


Code Block
            <cc:dataReference source="mods" />
            <cc:defaultDataReference action="text">[unknown]</cc:defaultDataReference>

The title data for each item is stored in the [title] placeholder. See cc:idFormat, below, for more information.
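As a rough illustration of how a bracketed placeholder such as [title] might be filled, the Java sketch below substitutes [name] tokens in a template string with values from a map. The class name PlaceholderFiller and the choice to store map keys without the brackets are assumptions for illustration, not the IngestTool implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills [name] placeholders in a template from a map of values.
class PlaceholderFiller {
    // Matches a bracketed placeholder such as [title] and captures the name inside.
    private static final Pattern TOKEN = Pattern.compile("\\[([A-Za-z]+)\\]");

    static String fill(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown placeholders are left untouched so the gap stays visible.
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Leaving unresolved placeholders in place, rather than replacing them with an empty string, makes a missing value easy to spot in generated output.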

  • /cc:collectionConfiguration/cc:items/cc:title – This defines where to find the title for any particular item.
    • /cc:collectionConfiguration/cc:items/cc:title/cc:dataReference – This is the same as /cc:collectionConfiguration/cc:items/cc:itemID/cc:dataReference, but deals with the title instead of the ID.
    • /cc:collectionConfiguration/cc:items/cc:title/cc:defaultDataReference – This is the same as /cc:collectionConfiguration/cc:items/cc:itemID/cc:defaultDataReference, but deals with the title instead of the ID.
    • /cc:collectionConfiguration/cc:items/cc:title/cc:defaultDataReference/@action – Must be one of the following:
      • text – the specified content will be used as the title when the title is not found by the dataReference specifications.
      • skip – the item will be skipped when the title is not found by the dataReference specifications.
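The text/skip fallback described above can be sketched as follows. This is a hypothetical Java outline, assuming the values found by each cc:dataReference have already been looked up and are passed in the order they were tried; TitleResolver and its methods are illustrative names, not part of dlib.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the cc:title fallback rules; not the dlib implementation.
class TitleResolver {
    // The two allowed values of cc:defaultDataReference/@action.
    enum Action { TEXT, SKIP }

    static Action parseAction(String attr) {
        switch (attr) {
            case "text": return Action.TEXT;
            case "skip": return Action.SKIP;
            default:
                throw new IllegalArgumentException("action must be 'text' or 'skip', got: " + attr);
        }
    }

    /**
     * candidates: values found by each cc:dataReference, in the order tried
     * (null or blank means "not found"). Returns the first usable value, the
     * default text for action=text, or an empty Optional when the item is skipped.
     */
    static Optional<String> resolve(List<String> candidates, Action action, String defaultText) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return Optional.of(candidate);
            }
        }
        if (action == Action.TEXT) {
            return Optional.of(defaultText);
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional for skip lets the caller drop the item without carrying a separate flag.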


Code Block
        <cc:metadataItem type="mods" authoritative="true" level="item">
            <cc:source location="localfs">C:\Documents and Settings\erpeters\My 
            <cc:idFormat regexp="^([^\-]*)\-(.*)\-([^\-]*)$">
            <cc:datalookup key="title">/mods:modsCollection/mods:mods/mods:titleInfo[not(@type)]/mods:title</cc:datalookup>

The ID number is retrieved using a hard-coded XPath in dlib.metadata.Mods and then parsed with the regexp above. The values of the matched groups are saved in a hashtable under the key [institution] for group 1, [score] for group 2, and [copy] for group 3. A warning is issued if the number of matched groups does not equal the number of mappings. These mappings are then used for data lookup, derivative name generation, and other (currently undefined) purposes. This works now but has not yet been developed into the full vision I had for it.
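The ID parsing step might look roughly like this in Java. The regexp is the cc:idFormat value from the configuration above; the class IdParser, the key names without brackets, and collecting warnings in a list (rather than logging them) are assumptions. A LinkedHashMap stands in for the hashtable only so the keys keep their group order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of cc:idFormat parsing; not the dlib.metadata.Mods code.
class IdParser {
    // The cc:idFormat regexp from the configuration, escaped for a Java string literal.
    static final String ID_REGEXP = "^([^\\-]*)\\-(.*)\\-([^\\-]*)$";
    // Mapping order: group 1 -> institution, group 2 -> score, group 3 -> copy.
    static final List<String> KEYS = Arrays.asList("institution", "score", "copy");

    static Map<String, String> parse(String id, String regexp, List<String> keys, List<String> warnings) {
        Matcher m = Pattern.compile(regexp).matcher(id);
        Map<String, String> values = new LinkedHashMap<>();
        // groupCount() reports the pattern's capturing groups, even before matching.
        if (m.groupCount() != keys.size()) {
            warnings.add("regexp has " + m.groupCount() + " groups but " + keys.size() + " mappings");
        }
        if (!m.matches()) {
            warnings.add("ID does not match idFormat: " + id);
            return values;
        }
        // Store only the groups that have a mapping (and mappings that have a group).
        int n = Math.min(m.groupCount(), keys.size());
        for (int i = 1; i <= n; i++) {
            values.put(keys.get(i - 1), m.group(i));
        }
        return values;
    }
}
```

For a hypothetical ID such as inu-dfb-123-2, the result is institution = inu, score = dfb-123, and copy = 2: the first and last groups cannot contain a hyphen, so any extra hyphens end up in the greedy middle group.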

I found that the metadata is not going to be consistent across collections: even in the collections we are examining specifically with the infrastructure project in mind, subtle variations occur. For example, many collections use a single /mods:modsCollection/mods:mods/mods:titleInfo/mods:title entry as the title, so this is the default cc:datalookup for title in the MODS. However, with the addition of IN Harmony, there are multiple ...mods:title entries, and the main title is the one whose mods:titleInfo has no type attribute. While this variant title XPath could be adopted as the new default, it is very likely that collections will use different metadata formats as time goes on. Therefore I created a mechanism and template for overriding some of the XPath expressions used for data lookup.
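A minimal sketch of the override idea, assuming a collection's overrides are read into a map keyed by the cc:datalookup key attribute and that MODS uses its standard namespace URI, http://www.loc.gov/mods/v3. ModsLookup and its methods are hypothetical; the two XPath expressions are the ones quoted above, and the parsing and XPath calls are the standard javax.xml APIs.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: per-collection XPath overrides with a built-in default.
class ModsLookup {
    static final String MODS_NS = "http://www.loc.gov/mods/v3";

    // Built-in default XPath per lookup key.
    static final Map<String, String> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put("title", "/mods:modsCollection/mods:mods/mods:titleInfo/mods:title");
    }

    // Uses the collection's override for the key if one exists, else the default.
    static String xpathFor(String key, Map<String, String> overrides) {
        String override = overrides.get(key);
        return (override != null) ? override : DEFAULTS.get(key);
    }

    static String lookup(String modsXml, String key, Map<String, String> overrides) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the mods: prefixed steps to match
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(modsXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "mods".equals(prefix) ? MODS_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return MODS_NS.equals(uri) ? "mods" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return MODS_NS.equals(uri)
                        ? Collections.singletonList("mods").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        // String evaluation returns the text of the first matching node in document order.
        return xpath.evaluate(xpathFor(key, overrides), doc);
    }
}
```

Because XPath string evaluation returns the first matching node in document order, the default expression picks whichever title happens to come first, while the override selects the title whose titleInfo lacks a type attribute.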