Lead developer: David Jiao
E-text project manager: Michelle Dalmau
Projects in critical need of check: Indiana Magazine of History, Indiana Authors and Their Books
Script due: 11/3/2007
Documentation: DLP Etext QC Tool
When projects are outsourced to vendors for both digitization and text encoding, the vendors follow strict file naming and ID designation conventions according to our contractual agreement and specialized encoding guidelines.
We need to make sure that the page breaks (<pb>) carrying id attributes (which contain the file name identifier) in the TEI document match the facsimile TIFF images in both number and file name. This check needs to be run as part of the automatic quality control process conducted by the Digital Media Specialist when files are downloaded from vendor drop boxes.
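As a rough illustration of the check described above, the following Python sketch collects the id values of all <pb> elements and compares them with a list of TIFF file names. It is not the DLP tool itself. It assumes that each id equals the TIFF file name without its extension, and the function names (pb_ids, compare_pages) and the sample ids are hypothetical. Real vendor files that declare a DTD or use named entities may need a different parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def pb_ids(tei_xml):
    """Return the id values of all <pb> elements, in document order."""
    root = ET.fromstring(tei_xml)
    ids = []
    for el in root.iter():
        # Compare on the local name so namespaced documents also work.
        if el.tag.rsplit("}", 1)[-1] == "pb":
            page_id = el.get("id")
            if page_id:
                ids.append(page_id)
    return ids


def compare_pages(ids, tiff_names):
    """Report mismatches between <pb> ids and TIFF file names.

    Assumption: a page break id is the TIFF file name minus its extension.
    """
    stems = {
        Path(name).stem
        for name in tiff_names
        if Path(name).suffix.lower() in (".tif", ".tiff")
    }
    counts = Counter(ids)
    return {
        "count_matches": len(ids) == len(stems),
        "ids_without_image": sorted(set(ids) - stems),
        "images_without_id": sorted(stems - set(ids)),
        "duplicate_ids": sorted(i for i, n in counts.items() if n > 1),
    }


# Hypothetical sample data, not taken from an actual IMH file.
sample = '<text><body><pb id="imh-001"/><p>x</p><pb id="imh-002"/></body></text>'
report = compare_pages(pb_ids(sample), ["imh-001.tif", "imh-003.tif"])
```

In practice the TIFF names would come from a directory listing of the vendor delivery, and any non-empty list in the report would flag the file for review.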
We also need the ability to re-generate identification numbers should vendors:
The Indiana Magazine of History (IMH) project is in critical need of this check. We hope to develop a process that is also applicable to other e-text projects that fit this use case. However, we will first need to address the specific needs of the IMH.
Approaches for version 1 of this tool discussed by David Jiao and Michelle Dalmau include:
For this case, sequential identification numbers for the id attribute of the page break tag (<pb id="">) should be generated automatically after the facsimile TIFF page images have been created.
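One possible way to generate the ids is sketched below in Python. It is only an illustration under stated assumptions: the n-th <pb> in document order is taken to correspond to the n-th TIFF when the file names are sorted, the file names are zero-padded so that a plain sort gives page order, and the id is the file name without its extension. The function name assign_pb_ids and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def assign_pb_ids(tei_xml, tiff_names):
    """Fill the id attribute of each <pb> from the sorted TIFF file names."""
    root = ET.fromstring(tei_xml)
    page_breaks = [el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "pb"]
    # Assumption: zero-padded names, so lexical order equals page order.
    stems = sorted(Path(name).stem for name in tiff_names)
    if len(page_breaks) != len(stems):
        # Refuse to guess when the counts disagree; a person should look.
        raise ValueError(
            f"{len(page_breaks)} page breaks but {len(stems)} TIFF images"
        )
    for pb, stem in zip(page_breaks, stems):
        pb.set("id", stem)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input: two empty page break ids and two page images.
updated = assign_pb_ids(
    '<text><pb id=""/><p>a</p><pb id=""/></text>',
    ["imh-002.tif", "imh-001.tif"],
)
```

Raising an error on a count mismatch, rather than numbering as far as possible, keeps the script from silently shifting every later page onto the wrong image.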