Algorithm for Batch Loading Variations Metadata from MARC

Algorithm - Major Processing Steps
MARC Record Groupings
Collective titles black and white lists
Form list

Algorithm - Major Processing Steps

  1. Contributor Record Creation
    • Check V2 Database for duplicate
    • Query Authority File
    • Create Record and Map from Name Authority Record and/or MARC bib record (Mapping Document)
  2. Work Record Creation
    • Identify using existing work-identification algorithm (Work Identification Algorithm)
    • Check V2 Database for duplicate
    • Query Authority File
    • Create Record and Map from Name/Title Authority Record and MARC bib record (Mapping Document)
  3. Container Record Creation
  4. Instantiation Record Creation (for those works created above)
    • Create Record and Map from Work Record and MARC bib record (Mapping Document)
    • Attach to Container Record
  5. Container Record Mapping (second pass, for those with no instantiations attached, to account for instantiations not created)
    • Map from MARC bib record (attach Contributors directly to container) (Mapping Document)
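
The "check for duplicate, query authority file, create record" sequence in step 1 can be sketched as follows. This is only an illustration of the ordering: the V2 database and the authority file are stood in for by plain dictionaries, and the function name, record layout, and outcome labels are hypothetical, not part of Variations.

```python
def get_or_create_contributor(name, database, authority_file, bib_fields):
    """Return (record, outcome) for a contributor name.

    database and authority_file are plain dicts keyed by name; they are
    hypothetical stand-ins for the V2 database and the authority file.
    """
    # Step 1: check the database for a duplicate before creating anything.
    if name in database:
        return database[name], "duplicate"

    # Step 2: query the authority file.
    authority = authority_file.get(name)

    # Step 3: create the record, mapping from the authority record when one
    # exists and from the MARC bib record otherwise ("from scratch").
    if authority is not None:
        record = {**authority, "name": name, "source": "authority"}
        outcome = "authority"
    else:
        record = {**bib_fields, "name": name, "source": "bib"}
        outcome = "scratch"

    database[name] = record
    return record, outcome
```

The same three-step order applies to work records in step 2, with a name/title authority record in place of the name authority record.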

Full Mapping Table

...

Selective vocabulary of collective titles (derived from report*)

List enhanced with

groupings

...

Data sought and high-level question(s)

  1. Contributors created without importing an authority record
    • What will contributor records created "from scratch" look like?
  2. Works identified by algorithm
    • How has the logic of the algorithm improved over past iterations?
    • What is the success rate for identifying true works?
  3. Works created without importing an authority record
    • What will work records created "from scratch" look like?
  4. Works with no contributors
    • How can these works be retrieved in a search?
    • How well do they map from the MARC records?
    • Is there any way to logically match contributors by other means?
  5. Containers with multiple instantiations generated from one V2 query
    • How effective is the logic used to create these instantiations?
    • What is the success (accuracy) rate for matching instantiations to content based on a collective title "root" match?
  6. Containers with no instantiations
    • How effective is the mapping process when there are no instantiations attached?
    • How can these containers be retrieved in a search?

MARC Record Groupings

  • Group 1: no 700 |t fields
    • Group 1a: 100/240/245
    • Group 1b: 100/245 (no 240)
    • Group 1c: 245 (no 100/240)
  • Group 2: one 700 |t field
  • Group 3: two 700 |t fields
  • Group 4: three or more 700 |t fields
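
The groupings above amount to counting the 700 fields that carry a |t subfield and, when there are none, looking at which of 100 and 240 are present. A minimal sketch, assuming each record is given as a list of (tag, subfields) pairs and that every record has a 245; the representation and the function name are hypothetical, and the case of a 240 without a 100 is not covered by the groupings, so it is reported as an error here.

```python
def marc_group(fields):
    """Return the grouping label ("1a", "1b", "1c", "2", "3" or "4").

    fields is a list of (tag, subfields) pairs, where subfields is a dict
    keyed by subfield code, e.g. ("700", {"a": "Name", "t": "Title"}).
    """
    # Count 700 fields that have a |t (title) subfield.
    count_700t = sum(1 for tag, subs in fields if tag == "700" and "t" in subs)

    if count_700t >= 3:
        return "4"
    if count_700t == 2:
        return "3"
    if count_700t == 1:
        return "2"

    # Group 1: no 700 |t; subdivide by presence of 100 and 240.
    tags = {tag for tag, _ in fields}
    has_100 = "100" in tags
    has_240 = "240" in tags
    if has_100 and has_240:
        return "1a"
    if has_100:
        return "1b"
    if not has_240:
        return "1c"
    # A 240 without a 100 is not described by the groupings.
    raise ValueError("record has a 240 but no 100; not covered by the groupings")
```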

Collective titles "black" and "white" lists

List includes selected terms from http://www.library.yale.edu/cataloging/music/musicat.htm#uniformtitles

Does not account for arbitrary ranges/sets of specific works (e.g. Sonatas, |m piano, |n no. 2-5)

List A - (Collective Titles "black list") Automatically exclude as a work title...

  • Cantatas
  • Chamber music
  • Chansons 
  • Choral music
  • Electronic music 
  • Fantasien
  • Harpsichord music
  • Instrumental music
  • Lute music 
  • Keyboard music 
  • Madrigals 
  • Masses 
  • Motets
  • Musicals
  • Orchestra music 
  • Organ music
  • Overtures
  • Piano music 
  • Selections
  • Sinfoniettas
  • String quartet music
  • Symphonic poems 
  • Symphonies
  • Symphonies, string orchestra
  • Violin, harpsichord music 
  • Violin, piano music 
  • Violoncello, piano music
  • Vocal music
  • Works
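
A check against List A might look like the sketch below. The page does not say how titles are compared, so the matching rule here (whole-title match, ignoring case, extra whitespace, and a trailing period) is an assumption, as are the names used.

```python
# Terms copied from List A above.
COLLECTIVE_TITLE_BLACK_LIST = frozenset(
    term.lower()
    for term in [
        "Cantatas", "Chamber music", "Chansons", "Choral music",
        "Electronic music", "Fantasien", "Harpsichord music",
        "Instrumental music", "Lute music", "Keyboard music", "Madrigals",
        "Masses", "Motets", "Musicals", "Orchestra music", "Organ music",
        "Overtures", "Piano music", "Selections", "Sinfoniettas",
        "String quartet music", "Symphonic poems", "Symphonies",
        "Symphonies, string orchestra", "Violin, harpsichord music",
        "Violin, piano music", "Violoncello, piano music", "Vocal music",
        "Works",
    ]
)


def is_black_listed(title):
    """Return True if the whole title matches a List A term.

    Assumption: comparison ignores case, repeated whitespace and a
    trailing period; partial matches do not count.
    """
    normalized = " ".join(title.split()).rstrip(".").lower()
    return normalized in COLLECTIVE_TITLE_BLACK_LIST
```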

...

  • Cantatas
  • Fantasien
  • Madrigals
  • Motets
  • Overtures
  • Symphonies

Form list

List B - (Forms) - Exclude as a work title under the following circumstances:

  1. If no |m, |n, |p or |r -- Exclude
  2. If |m but no |n, |p or |r -- Exclude in Groups 2-4 if derived from a 240. Do not exclude and flag for human review if derived from 240 (in Group 1a) or from a 700 |t (in Groups 2-4).
  3. If |m and/or [|n, |p, |r] -- Do not exclude
  4. Note: if singular form (no "s" at end of word, unless one of the exceptions noted below *) -- Do not exclude
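
The four rules can be read as a small decision function. The sketch below is one interpretation, not the project's code: rule 3 is taken to apply whenever at least one of |n, |p, |r is present (so that it does not overlap rule 2), the singular-form test is left to the caller because the exceptions list is not reproduced here, and the function name and return labels are hypothetical.

```python
def form_title_decision(subfields, source_tag, group, is_singular=False):
    """Decide what to do with a title that matches List B.

    subfields:   iterable of subfield codes present, e.g. {"a", "m"}
    source_tag:  "240" or "700" (the field the title was derived from)
    group:       grouping label, e.g. "1a", "2", "3", "4"
    is_singular: True if the caller judged the form to be singular (rule 4)
    Returns "exclude", "review" (keep, flagged for human review) or "keep".
    """
    codes = set(subfields)

    # Rule 4: singular forms are not excluded.
    if is_singular:
        return "keep"

    # Rule 3 (as interpreted here): any of |n, |p, |r means do not exclude.
    if codes & {"n", "p", "r"}:
        return "keep"

    # Rule 1: none of |m, |n, |p, |r.
    if "m" not in codes:
        return "exclude"

    # Rule 2: |m only; the outcome depends on the source field and group.
    if source_tag == "240" and group in {"2", "3", "4"}:
        return "exclude"
    if source_tag == "240" and group == "1a":
        return "review"
    if source_tag == "700" and group in {"2", "3", "4"}:
        return "review"
    raise ValueError("combination of source field and group not covered by rule 2")
```

For example, a 240 reading "Sonatas, |m piano" would be excluded in a Group 3 record but kept and flagged for review in a Group 1a record.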

...