
Outstanding VWWP TEI Header Issues

As of 11/8/2011, most of these issues have been resolved, with the exception of reconciling headers (as documented in the VWWP Encoding Log).



Does the TEI header encoding as mapped from MARC provide enough information for the site to sort titles without initial articles?

Randall confirmed that, to deal with the legacy content, the site will need to guess the sort title from the actual title text, so we'll take this approach in the short term for this project. This may not be an effective long-term strategy, however: once sites start including titles in multiple languages, this kind of automated sorting becomes difficult if not impossible. The TEI in Libraries Best Practices Group is planning to add a recommendation for encoding a filing title to its best practices document. Update: We will rely on the XTF sorting for now, but will add another title field in fileDesc with a type of filing, per the Best Practices:

filing (used for a version of the title with initial articles removed, to be used for sorting titles alphabetically but not for display)
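As an illustration of what "guessing the sort title from the actual title text" could look like, here is a minimal Python sketch. It is not the XTF implementation; the function name guess_filing_title and the English-only article list are assumptions made for this example, and the short list is exactly why the approach breaks down for multilingual titles.

```python
import re

# Assumption for this sketch: English articles only. Titles in other
# languages would need their own lists, which is where guessing gets hard.
INITIAL_ARTICLES = ("a", "an", "the")

# Match one leading article followed by whitespace, case-insensitively.
_ARTICLE_RE = re.compile(
    r"^(?:%s)\s+" % "|".join(INITIAL_ARTICLES), re.IGNORECASE
)


def guess_filing_title(title: str) -> str:
    """Return the title with a single leading English article removed."""
    return _ARTICLE_RE.sub("", title.strip(), count=1)


if __name__ == "__main__":
    print(guess_filing_title("The Why I Ams:"))  # Why I Ams:
    print(guess_filing_title("Another Story"))   # Another Story
```

Because the article must be followed by whitespace, a title such as "Another Story" is left alone instead of losing its first two letters.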

Are the dates in the TEI header mapped from MARC "clean" enough to provide the date searching/faceted browsing implied by the wireframes?

Currently, the MARCXML->TEI header stylesheet simply removes all characters that aren't digits from the date, so "1901." and "[1901?]" both become 1901. I'll assume that's OK. We do run into a problem, however, with dates where the cataloger was unable to supply a full year, for example "[188-?]" (which actually appears in one of our newly digitized books, AFN1996BB). The current stylesheet puts this in the TEI as 188, which obviously doesn't provide the user experience the wireframes suggest. However, it's unclear what the user experience should be in these cases: what date should be shown to the user, and what TEI encoding should be under the hood to support that display? We'll go ahead and create the TEI shells for the initial five sample volumes for test encoding, but Michelle needs to decide, upon her return to work, what the user interaction on the site should be in these cases before we produce TEI shells for all recently digitized volumes.
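The date behavior described above can be reproduced in a few lines of Python. This is only a sketch of the digit-stripping rule, not the stylesheet itself; the helper names digits_only and is_full_year are hypothetical. The second function shows one possible way to flag values that need a decision, and does not prescribe what the site should display for them.

```python
import re


def digits_only(marc_date: str) -> str:
    """Mimic the rule described above: drop every non-digit character."""
    return re.sub(r"\D", "", marc_date)


def is_full_year(cleaned: str) -> bool:
    """True only when the cleaned value is exactly a four-digit year."""
    return re.fullmatch(r"\d{4}", cleaned) is not None


if __name__ == "__main__":
    for raw in ("1901.", "[1901?]", "[188-?]"):
        cleaned = digits_only(raw)
        # "[188-?]" cleans to "188", which is not a full year.
        print(raw, cleaned, is_full_year(cleaned))
```

Running a check like is_full_year over the generated headers would list the volumes, such as AFN1996BB, whose dates still need a display decision.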

Are the author names in the TEI header mapped from MARC the right type to provide the name searching/faceted browsing implied by the wireframes?

The name forms in the wireframes are punctuated differently from the authoritative names in AACR2 form that appear in the MARC records. For example, in the wireframes an author's dates are in parentheses, whereas in the MARC records they are not, and these dates do not appear in all authors' headings as they do in the wireframes. I'm assuming for now that the intention is to just use the authoritative form of the name from the MARC record as-is, and that trying to match the syntax in the wireframes is not a high-value activity. (If this is necessary, more specification will be required, as other pieces of data accompany names in MARC records that are not included in the current wireframes.) We'll move forward with creating TEI shells for test encoding under this assumption, and will need Michelle to confirm this approach once she returns. During this investigation I also found that spacing in names was not being handled well, and that all MARC fields, instead of the appropriate subset, were being mapped into TEI. This has been fixed in the VWWP header stylesheet and the generic MARCXML->TEI header stylesheet. Update: Names should display however they appear in the source MARC record. Native data trumps the wireframes, which are mostly illustrative anyway, not exact representations.

Need to update headers in books for test encoding with changes to the template made after 3/2/10.

TEI shells for five books for test encoding were generated on 3/2/10; however, some changes to the header template were made after that date. These template changes need to be manually reconciled in the test encoding files. To be done by Michelle when she returns to work.

Multiple unqualified titles causing implementation problems in XTF

In some cases, there are multiple title children of /TEI/teiHeader/fileDesc/sourceDesc/biblFull/titleStmt. Example:

    <title>The Why I Ams:</title>
    <title>Why I Am a Communist <name>by William Morris;</name></title>
    <title>Why I Am an Expropriationist <name>by L.S. Bevington</name></title>
       <resp>by </resp>
       <name>Louisa Sarah Bevington</name>

Update: We will fix this per the workflow defined here: add another title field in fileDesc with a type of filing.
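To find headers affected by this problem, a check along the following lines could be run. This is a hedged sketch using Python's standard xml.etree.ElementTree. It assumes a titleStmt fragment with no XML namespace (a real TEI P5 file would normally need the TEI namespace in the lookups), and the function name unqualified_titles is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Sample fragment modeled on the example above; the wrapping element and
# the absence of a namespace are assumptions made for this sketch.
SAMPLE = """<titleStmt>
  <title>The Why I Ams:</title>
  <title>Why I Am a Communist <name>by William Morris;</name></title>
  <title>Why I Am an Expropriationist <name>by L.S. Bevington</name></title>
</titleStmt>"""


def unqualified_titles(title_stmt_xml: str) -> List[str]:
    """Return the text of each direct title child that has no type attribute."""
    root = ET.fromstring(title_stmt_xml)
    return [
        "".join(title.itertext()).strip()
        for title in root.findall("title")
        if title.get("type") is None
    ]


if __name__ == "__main__":
    titles = unqualified_titles(SAMPLE)
    if len(titles) > 1:
        print("Multiple unqualified titles:", titles)
```

A title carrying a type attribute, such as the planned filing title, is skipped, so adding it would not change the count reported here.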
