For our various encoding projects (EAD and TEI), we need an extra, semantic layer of QC that Schematron cannot support.
We badly need a tool that will minimize or eliminate the need to correct semantic encoding by hand, one page at a time. For
one vendor-encoded project in particular, the place name encoding is ATROCIOUS. We estimated that fixing every place name
instance, page by page, would take 55 weeks at 40 hours a week, or nearly $20,000, which we obviously cannot afford.
Rather than "eyeball" every page, it would be nice if we could generate an XSLT, for example, that looks for encoded place names and grabs the surrounding words for context so we can more easily home in on problem place names. Each place name instance should point back to the original TEI document for easy fixing.
This XSLT, or perhaps a more sophisticated QC tool, is something we could use on any encoding project to check any kind of tag use. The tool needs to be highly configurable to check tags in context, with a mechanism to jump easily to the exact place in the file for further inspection and correction.
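The context-report idea above can be sketched quickly. The document imagines an XSLT, but the same logic is shown here in Python for illustration; the element name (`placeName`, unnamespaced), the five-word context window, and the sample document are all assumptions, not part of any existing tool:

```python
"""Sketch of a context report for encoded place names.

Assumes TEI-style <placeName> elements with no namespace; the tag
name and window size are configurable guesses, not a fixed API.
"""
import re
import xml.etree.ElementTree as ET

def place_name_contexts(xml_text, tag="placeName", window=5):
    """Yield (place name, surrounding words) for each tagged instance."""
    root = ET.fromstring(xml_text)
    # Flatten the document to plain text so we can slice a window of
    # words on either side of each place name for reviewer context.
    full_text = " ".join("".join(root.itertext()).split())
    for elem in root.iter(tag):
        name = "".join(elem.itertext()).strip()
        # Find the name in the flattened text (first occurrence only
        # in this sketch; a real tool would track exact positions).
        match = re.search(re.escape(name), full_text)
        if not match:
            yield name, ""
            continue
        before = full_text[:match.start()].split()[-window:]
        after = full_text[match.end():].split()[:window]
        yield name, " ".join(before + [name] + after)

# Hypothetical sample input, standing in for a vendor-encoded page.
sample = """<div>
  <p>He moved from <placeName>Vincennes</placeName> to the new
  capital at <placeName>Indianapolis</placeName> in 1825.</p>
</div>"""

for name, context in place_name_contexts(sample):
    print(f"{name}: ...{context}...")
```

A production version would also record the source file and location of each hit so the report can point back to the original TEI document, as described above.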
Indiana Magazine of History
Another aspect that would speed up the clean-up process, but potentially complicate the XSLT, is to retrieve only place names in Indiana. Melanie compiled an authoritative list that can be used to check against. At the very least (and most), if we can clean up the Indiana-related place names we will be in good shape.
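Checking extracted names against the authoritative list could look something like the following sketch; the list contents here are hypothetical placeholders (the real data is the list Melanie compiled), and the matching is deliberately naive:

```python
"""Sketch of filtering extracted place names against an authority list.

The example lists are hypothetical; in practice the authority list
would be loaded from Melanie's compiled file.
"""
def flag_unrecognized(names, authority):
    """Return extracted names not on the authoritative Indiana list.

    Comparison is case-insensitive only; a real tool would need more
    normalization (abbreviations such as 'Ft.' vs 'Fort', variant
    spellings, county qualifiers, and so on).
    """
    canon = {n.strip().lower() for n in authority}
    return [n for n in names if n.strip().lower() not in canon]

# Hypothetical authority list (stand-in for the compiled Indiana list).
indiana_places = ["Indianapolis", "Vincennes", "Fort Wayne", "Bloomington"]

# Hypothetical extraction results, including a typo and an
# out-of-state name that should both be flagged for review.
extracted = ["Indianapolis", "Vincinnes", "Chicago", "Fort Wayne"]

print(flag_unrecognized(extracted, indiana_places))
# prints ['Vincinnes', 'Chicago']
```

Anything the filter flags is either a vendor typo or a non-Indiana name, so the review queue shrinks to exactly the cases a human needs to look at.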
Other Work in this Area
- Brown University's Scholarly Technology Group's Inspector Clouseau (Julia Flanders, Syd Bauman, et al.)
- King's College, London