Workflow for publishing full-text HTML for Studies in Digital Heritage

This is a new approach (June 5, 2017) that does not require InDesign.


  1. Convert Word docx > XML > HTML for full-text publishing in OJS, an approach commonly referred to as XML-first publishing.
  2. Use the standard format for scholarly publishing: the NLM/JATS DTD (which defines the XML elements).
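
To make the XML > HTML step concrete, here is a minimal sketch of mapping a few JATS elements to HTML. It is only an illustration: it uses Python's standard library rather than an XSLT stylesheet, the sample article is made up, and the names SAMPLE_JATS and jats_to_html are hypothetical, not part of any tool in this workflow.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written JATS fragment (not a real SDH article).
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>A Sample Title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph.</p>
    </sec>
  </body>
</article>"""


def jats_to_html(jats_xml):
    """Map a handful of JATS elements to HTML (toy stand-in for a stylesheet)."""
    article = ET.fromstring(jats_xml)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")

    # The article title becomes the page heading.
    title = article.find("./front/article-meta/title-group/article-title")
    if title is not None:
        ET.SubElement(body, "h1").text = title.text

    # Each top-level <sec> becomes a <section>; its <title> and <p> children
    # become <h2> and <p>. Inline markup inside them is ignored in this sketch.
    for sec in article.findall("./body/sec"):
        section = ET.SubElement(body, "section")
        for child in sec:
            if child.tag == "title":
                ET.SubElement(section, "h2").text = child.text
            elif child.tag == "p":
                ET.SubElement(section, "p").text = child.text

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(jats_to_html(SAMPLE_JATS))
```

A real stylesheet handles far more elements (figures, references, footnotes, inline formatting); the point here is only that each JATS element has a defined HTML counterpart.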


Required software

  1. Python
  2. Java
  3. bash
  4. meTypeset (docx > XML)
    Purpose-built tool to convert from Microsoft Word .docx format to NLM/JATS-XML for scholarly/scientific article typesetting.
    1. Usage: " docx <input> <output_folder> [options]"
    2. Installed on an OS X machine; this required additionally installing the lxml and lxslt Python packages.
  5. Oxygen (XML > xHTML)
    GitHub tools by NCBI/NLM provide a pathway for standardized XSLT transformations.
    1. Usage: transform XML with the jats-html.xsl transformation
    2. Oxygen "transforms" XML using the format defined by the XSL stylesheet
    3. Requires valid XML using NLM/JATS DTD, e.g.,
  6. CSS options
    1. jats-preview.css
    2. sdh-article.css
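
The two command-line-facing pieces above can be sketched as a small Python helper: one function builds the meTypeset call from the usage pattern in item 4, and another runs a basic sanity check on the resulting XML before it is opened in Oxygen. This is a sketch under assumptions: the executable name "meTypeset.py" is assumed and should be adjusted to the local install, the function names are hypothetical, and the check covers only well-formedness and the presence of the front and body elements this workflow expects. It is not DTD validation.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: meTypeset is invoked as "meTypeset.py"; change to match the local install.
METYPESET = "meTypeset.py"


def metypeset_command(docx_path, output_folder, options=()):
    """Build the argument list for: docx <input> <output_folder> [options]."""
    return [METYPESET, "docx", str(docx_path), str(output_folder), *options]


def check_jats(xml_text):
    """Return a list of problems; an empty list means the basic checks passed.

    Checks well-formedness and a minimal structure only (NOT DTD validation).
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    if root.tag != "article":
        problems.append(f"root element is <{root.tag}>, expected <article>")
    # Full-text publishing needs both metadata (front) and content (body).
    for name in ("front", "body"):
        if root.find(name) is None:
            problems.append(f"missing <{name}>")
    return problems


if __name__ == "__main__":
    # Hypothetical file and folder names, for illustration only.
    print(metypeset_command("article.docx", "output"))
    print(check_jats("<article><front/><body/></article>"))
```

The command list can be passed to subprocess.run(..., check=True) to run the conversion; full validation against the NLM/JATS DTD would still happen in Oxygen.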

Note: Using InDesign to export to XML requires thorough knowledge of XML, with the additional burden of configuring XML DTDs within the InDesign environment. 


Revising the meTypeset and jats-html.xsl code to fully account for the elements and style used by Studies in Digital Heritage will take approximately half a week to a full week. Some of these changes adapt the JATS standard to the SDH context; others fine-tune the CSS originally devised for SDH. Redoing SDH's template with all the elements defined as Word "styles" will also help (this latter work could be done by a student).

Both major pieces of this workflow have solid histories and are emerging as best-practice standards for digital publishing and preservation. Both are also included in a PKP/OJS project to incorporate a Word to XML to HTML plugin in OJS 3 (see below). meTypeset, for example, is a fork of the TEI OxGarage set of transformations and uses TEI as an interim step in its processing. JATS (Journal Article Tag Suite) is an application of NISO Z39.96-2015, which defines a set of XML elements and attributes for tagging journal articles. JATS is a continuation of the NLM Archiving and Interchange DTD work begun in 2002 by NCBI.

Eventually, this workflow would work best if the "meTypeset" package were hosted on library servers with a light interface to eliminate command-line requirements.


Eve, Martin Paul. Building a real XML-first (XML-in) workflow for scholarly typesetting; published on July 20, 2015

Garnett A, Alperin JP, Willinsky J. The Public Knowledge Project XML Publishing Service and meTypeset: Don't call it "Yet Another Word-to-JATS Conversion Kit". In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2015. Available from:

O'Connor C, Haenel S, Gnanapiragasam A, et al. Building an Automated XML-Based Journal Production Workflow. In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2015. Available from:

Most recent JATS-Con proceedings available here


Other Notes

1. There is a plugin in development for OJS 3 that may process docx in the future. Some relevant links to track it.

2. Guide and wiki for OJS 3