
In order to facilitate automatic generation of METS documents for text collections upon ingest into Fedora, we are mapping descriptive and structural information in TEI documents to METS.

General Issues

METS Area | Source
Physical Struct Map (page sequence) | TEI or file system
Logical Struct Map (internal structure) | TEI
Actual page numbering | TEI
Derivatives (different image sizes) | File system
Whole document representation (e.g., PDF) | File system or config file
Descriptive metadata | TEI Header and possibly other sources, such as MARC records

  • Need to finish the MODS mapping for descriptive metadata, including the mapping for issue-level objects (needs input from Jenn)
  • Need to find/create MODS-DC mapping for OAI purposes
  • Need to determine which structural elements in METS can be generated from the TEI itself and create mapping.
    • Jenn and Michelle will work on this mapping
  • Working to develop a process that uses structural information from both the TEI and the file system and checks them against each other (verification step).
    • This helps ensure that file naming is accurate and that the TEI and the image file names match, especially when the TEI and images are not automatically QC'd in batch. If both are available, we need guidelines for which is ingested first: the TEI or the corresponding images.
  • Ingest process needs to be able to accept an existing METS document and add information that can't be pre-generated from the TEI (e.g., image derivatives, PDFs).
    • Need to generate XSLT at ingest (or pre-ingest) that draws what we need from the TEI document but can be further manipulated to include information required from non-TEI sources.
      • XSLT needs to be configurable per collection (or a new XSLT written per collection; this is hard to generalize)
  • Draft workflow section for preparing text collections for Fedora ingest
    • Determine structural needs (what objects are necessary, which ones will contain what metadata)
      • Consider applications that will be using the collection (e.g., search, OAI, METS Navigator)
    • Customize (if necessary) MODS, DC, and METS mappings
    • Generate MODS and DC records
    • Generate METS files using MODS, DC, and TEI
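
The verification step described above (checking TEI structure against the file system) could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the TEI is in the P5 namespace and that each <pb> element names its page image in a facs attribute; the function names and sample file names are hypothetical and not part of any existing ingest tool.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: TEI P5 namespace


def tei_page_ids(tei_xml):
    """Return page identifiers from <pb facs="..."> in document order (extension dropped)."""
    root = ET.fromstring(tei_xml)
    # Assumption: each <pb> carries a facs attribute naming its page image.
    return [PurePath(pb.get("facs")).stem
            for pb in root.iter(f"{{{TEI_NS}}}pb") if pb.get("facs")]


def compare_pages(tei_xml, image_files):
    """Cross-check TEI page references against image file names."""
    tei_ids = tei_page_ids(tei_xml)
    # Derivatives of one page share a stem, so collapse them with a set.
    file_ids = sorted({PurePath(name).stem for name in image_files})
    return {
        "missing_images": [p for p in tei_ids if p not in file_ids],
        "unreferenced_images": [f for f in file_ids if f not in tei_ids],
    }


# Hypothetical sample data, for illustration only.
sample_tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<pb n="1" facs="page-0001.tif"/><p>First page.</p>
<pb n="2" facs="page-0002.tif"/><p>Second page.</p>
<pb n="3" facs="page-0003.tif"/><p>Third page.</p>
</body></text></TEI>"""

report = compare_pages(sample_tei, ["page-0001.tif", "page-0002.tif", "page-0004.tif"])
print(report)
```

A non-empty list in either key would flag the item for manual review before ingest, regardless of whether the TEI or the images arrive first.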

For the IMH ...

  • See the Journal Content Model page for METS structure
  • We will have Fedora objects representing articles, as well as issues, but the text can be stored at either level. Need to ask David what would be easiest for XTF to handle.
  • METS Documents: for Fedora and METS Navigator
    • METS Navigator: generate a METS document for the issue with pointers to separate METS documents for each article (to support article-level pagination and issue-level navigation); these METS documents are pre-generated
    • Fedora: possibly for managing all the related components of the collection (still an open question); auto-generated
  • There are page objects at two different levels: front- and back-matter pages are children of the issue-level object, while other pages are children of article-level objects. (See Ryan's note on the Journal Content Model page.)
  • This will need to be addressed in the fileGrp and structMap sections, but should it be duplicated in the issue and the article? How exactly?
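
One possible arrangement for the issue-level METS document, offered only as a sketch and not as a decision on the open question above: front- and back-matter pages are listed directly in the issue's fileGrp and structMap, while each article is represented by a <div> holding an <mptr> to that article's own METS document, so article pages are not duplicated at the issue level. The function name, TYPE values, labels, and file names below are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_issue_mets(label, front_pages, articles, back_pages):
    """Build an issue-level METS tree; articles is a list of (title, METS href) pairs."""
    mets = ET.Element(m("mets"))
    file_sec = ET.SubElement(mets, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"), USE="page images")
    struct_map = ET.SubElement(mets, m("structMap"), TYPE="logical")
    issue = ET.SubElement(struct_map, m("div"), TYPE="issue", LABEL=label)

    def add_pages(section_type, hrefs):
        # Pages owned by the issue itself: register each file, then point to it.
        section = ET.SubElement(issue, m("div"), TYPE=section_type)
        for href in hrefs:
            file_id = f"FILE{len(file_grp) + 1:04d}"
            file_el = ET.SubElement(file_grp, m("file"), ID=file_id)
            ET.SubElement(file_el, m("FLocat"),
                          {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
            page = ET.SubElement(section, m("div"), TYPE="page")
            ET.SubElement(page, m("fptr"), FILEID=file_id)

    add_pages("front", front_pages)
    for title, mets_href in articles:
        # Article pages live in the article's own METS; only point to it here.
        article = ET.SubElement(issue, m("div"), TYPE="article", LABEL=title)
        ET.SubElement(article, m("mptr"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": mets_href})
    add_pages("back", back_pages)
    return mets


# Hypothetical sample data, for illustration only.
issue_mets = build_issue_mets(
    "Sample issue",
    ["front-0001.tif"],
    [("Article A", "article-a.mets.xml"), ("Article B", "article-b.mets.xml")],
    ["back-0001.tif", "back-0002.tif"],
)
print(ET.tostring(issue_mets, encoding="unicode"))
```

Whether METS Navigator and XTF can follow <mptr> links in this way would still need to be confirmed; the sketch only shows that the two page levels can be expressed without duplicating article pages in the issue.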

Preliminary Mappings

Descriptive Metadata <dmdSec>

  • Map TEI Header to MODS (for repository and OAI)
  • Map MODS to QDC (for OAI and repository)
  • Pointer to TEI Header <mdRef> (for all the other stuff in the header, not captured in MODS)
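
As a rough illustration of the first and third items above, the sketch below builds two <dmdSec> elements: one wrapping a minimal MODS record derived from the TEI Header, and one using <mdRef> with MDTYPE="TEIHDR" to point back to the TEI file. It assumes TEI P5 namespacing and maps only the title (titleStmt/title to mods:titleInfo/mods:title); the function name, section IDs, and file name are hypothetical, and the real MODS mapping would cover far more fields.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"
TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: TEI P5 namespace
for prefix, uri in (("mets", METS_NS), ("mods", MODS_NS), ("xlink", XLINK_NS)):
    ET.register_namespace(prefix, uri)


def build_dmd_sections(tei_xml, tei_href):
    """Return a METS root holding a wrapped MODS record and a pointer to the TEI Header."""
    tei = ET.fromstring(tei_xml)
    title_el = tei.find(f".//{{{TEI_NS}}}titleStmt/{{{TEI_NS}}}title")
    title = (title_el.text or "").strip() if title_el is not None else ""

    mets = ET.Element(f"{{{METS_NS}}}mets")

    # dmdSec 1: MODS embedded via mdWrap/xmlData (title only in this sketch).
    dmd_mods = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD_MODS")
    wrap = ET.SubElement(dmd_mods, f"{{{METS_NS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # dmdSec 2: pointer to the TEI Header for everything not captured in MODS.
    dmd_tei = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD_TEIHDR")
    ET.SubElement(dmd_tei, f"{{{METS_NS}}}mdRef",
                  {"LOCTYPE": "URL", "MDTYPE": "TEIHDR",
                   f"{{{XLINK_NS}}}href": tei_href})
    return mets


# Hypothetical sample data, for illustration only.
sample_header = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>
<titleStmt><title>Sample Title</title></titleStmt>
</fileDesc></teiHeader></TEI>"""

dmd = build_dmd_sections(sample_header, "sample.tei.xml")
print(ET.tostring(dmd, encoding="unicode"))
```

The MODS to QDC step is left out here; it could be a second transformation applied to the wrapped MODS record.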

Administrative Metadata <amdSec>

  • The TEI does not contain a copyright statement for the source, only one for the electronic file. How do we handle this in MODS versus <rightsMD>?