Schematron
Schematron provides XML content authors with a convenient way to check the quality of their XML data, ensuring that a file is not only syntactically correct but also meets the usage guidelines established for a project. It is often used in conjunction with DTD/Schema validation.
Schematron contains 6 basic elements:
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
- contains optional <title>
- zero or more <ns prefix="[]" uri="[]" /> declaring the namespaces and prefixes used for the XPaths
- One or more <pattern> elements, which contain logical groupings of associated rules
- One or more <rule> elements, in which the context attribute is an XPath expression identifying the nodes in the tree to which the tests apply.
- One or more <assert> elements, in which the test attribute is an XPath expression, and which contain rich text expressing the statement being asserted in plain language. The text is output when the test is false.
- One or more <report> elements, in which the test attribute is an XPath expression, and which contain rich text expressing the fact to be reported in plain language. The text is output when the test is true.
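Put together, the six elements nest roughly as in the sketch below. It is a minimal illustration only, not a schema from any DLP repository: the pattern id, the choice of TEI title as context, and both tests are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Minimal example schema (illustrative only)</title>
  <!-- Binds the tei prefix used in the XPath expressions below -->
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <!-- A pattern groups related rules; the id is a made-up name -->
  <pattern id="example-title-checks">
    <!-- context selects the nodes that the tests below are applied to -->
    <rule context="tei:titleStmt/tei:title">
      <!-- assert: the message is output when the test is false -->
      <assert test="normalize-space(.) != ''">The title must not be empty.</assert>
      <!-- report: the message is output when the test is true -->
      <report test="string-length(normalize-space(.)) &gt; 100">The title is longer than 100 characters.</report>
    </rule>
  </pattern>
</schema>
```

Note the opposite polarity of the two test elements: an assert states what should hold and speaks up when it does not, while a report speaks up when its condition is met.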
The Schematron Workflow
Getting Started: Install the Schematron plugin
The plugin requires Oxygen version 7.1 or higher (NOTE: There are reports of this plugin not working on some platforms running Oxygen 8.1). Oxygen can be downloaded from: http://www.oxygenxml.com and is also freely available to all IU affiliates via IUWare.
- Close Oxygen
- Download the file: Validator-jdk14.zip
- Unzip the file. This will create a folder called Validator somewhere on your computer. Copy the entire folder into the "plugins" directory of your Oxygen Installation.
On Windows: C:\Program Files\oxygen\plugins\Previewer or C:\Program Files\oxygen\plugins\Validator
On Mac: /Applications/oxygen/plugins/Previewer or /Applications/oxygen/plugins/Validator
Note: If you encounter a problem installing the plugins, double-check the DLP System Documentation page for the most recent version of the plugins and plugin documentation.
Using the Schematron plugin to validate your document
If you are creating a new Schematron validator, you will need to make sure a repository has been created on Algernon, and follow the steps below to create a validator file before you can use the plugin. Contact David Jiao for help with repository setup.
- Open an instance of the document to be validated (e.g. an issue of the Indiana Magazine of History).
- Right click (or ctrl-click) anywhere within the main window and select 'Plugins' and 'XTF Validator'.
- In the 'Choose Repository' window, select the appropriate option and hit 'OK'.
- Save the resulting report as an .html file and open it to see any errors in the source document.
- Make any changes to the source document, save and repeat.
Creating a new validator
Encode markup usage guidelines in a Schematron file (.sch). This file should contain a combination of patterns, rules, assertions, and reports that examine and validate the use of tags in instance files. (Note: We have had some problems with reports, so use assertions whenever possible.)
For example, to test that all abbr tags in a TEI file include an expan attribute, a Schematron file might include the following:
<sch:pattern id="editorial">
  <sch:title>Transcription Checks</sch:title>
  <sch:rule context="/TEI.2/text/body//abbr">
    <sch:assert icon="#dLevel-Warning" test="@expan">Abbr elements should have an @expan attribute.</sch:assert>
  </sch:rule>
</sch:pattern>
If the source markup is schema-based, then the schema namespace (e.g., TEI P5) needs to be declared in both the schematron root element and the resulting schematron.xsl:
<!--Example of TEI schema referenced in schematron SCH file-->
<schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0">
<!--Example of TEI schema referenced in schematron XSL file-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sch="http://purl.oclc.org/dsdl/schematron" version="2.0" xmlns:tei="http://www.tei-c.org/ns/1.0">
The namespace prefix will need to be added to any elements declared in the schematron SCH file:
<pattern id="p3">
  <title>TEI Header/fileDesc</title>
  <p>This pattern tests elements in the TEI Header/fileDesc.</p>
  <rule id="r3" context="tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title">
    <assert role="M" test="normalize-space(.) = 'The I Witness'">titleStmt/title must be "The I Witness"</assert>
    <report test="normalize-space(.) = 'The I Witness'">It works!</report>
  </rule>
</pattern>
See the Schematron web site for more instructions on creating Schematron documents. Also see the sample documents attached, including IMH_issue_schematron.sch and INAuthorsEncyclopediaSchematron.sch.
Note about these transformations
These transformations use a namespace that looks incorrect (http://www/dlib.indiana.edu), but it is what the plugin and Xubmit expect. Keep this namespace as it is, and customized checks for new repositories should work both on Xubmit and in Oxygen. This should be corrected once the plugin code can be modified directly on the server.
Transform your Schematron document into an XSLT validator
You will need to do this each time you make changes to your Schematron file.
- Download schematron-report-xml.xsl and skeleton1-5.xsl; save files to the same directory
- Open your Schematron document
- From the Document menu, choose 'Transformation', and 'Configure Transformation Scenario'
Configure Transformation Scenario
- In the window that appears, click the 'New' button.
- In the 'Edit Scenario' window, make sure the Schematron file is the XML URL and use the folder icon next to the XSL URL field to locate the schematron-report-xml.xsl file on your computer. Make sure the 'Transformer' setting is at least Saxon-HE 9.5.1.5.
- Select 'Output' on the top menu and make sure that 'Save As' is selected.
- Use the folder icon next to the 'Save As' field to instruct the processor to save the resulting XSLT in your working folder as "schematron.xsl". (Each time you repeat the transformation, you will overwrite the previous schematron.xsl.)
- Hit 'OK' in both windows to exit Configure Transformation Scenario. If you use this same transformation scenario again, Oxygen will ask whether you want to overwrite the existing schematron.xsl file. You do.
6. If your source document relies on a schema (e.g., TEI P5), you'll need to make sure the appropriate schema namespace is declared in the root of the schematron.xsl file.
7. Connect to the current location for Schematron validation in Xubmit, navigate to /opt/xubmit/repositories, then to the appropriate repository, and then to the schematron file within it.
8. Upload the "schematron.xsl" file.
9. Once this is done, you are ready to use the Schematron plugin (see above) to validate your document or test your Schematron assertions. (Hint: Test your assertions one at a time by 'breaking' something in the source file that the assertion should catch and using the plugin to validate.)
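As an illustration of the 'breaking' hint, the fragment below is a throwaway test instance aimed at the titleStmt/title assertion shown earlier on this page. It is a sketch only: the wrong title text is made up, and the fragment is trimmed to the elements the rule context needs, so it is not a complete or valid TEI document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Trimmed test fragment, not a complete or valid TEI document -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- Deliberately broken: the assertion expects exactly "The I Witness" -->
        <title>The Eye Witness</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>
```

Assuming that pattern is part of your validator, the assert test is false for this title, so the report should list 'titleStmt/title must be "The I Witness"'. Restoring the correct title should make that message disappear and the "It works!" report appear instead.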
(Optional) Command line to Transform your Schematron document into an XSLT validator
The Schematron document can also be transformed from the command line instead of in Oxygen. This is not recommended for ordinary users. It has been tested with saxonb-9.0:
java net.sf.saxon.Transform -ext:on -l:on -s:schematron.sch -xsl:schematron-report-xml.xsl -o:schematron.xsl
Or, if you have saxonb-xslt installed, simply run:
saxonb-xslt -ext:on -l:on -s:schematron.sch -xsl:schematron-report-xml.xsl -o:schematron.xsl
Validate an XML file with Schematron in Oxygen
You can generate the HTML report directly in Oxygen through another series of stylesheets. This is the sequence:
1. Download schematron-report-xml.xsl and skeleton1-5.xsl; save them in the same directory.
2. If necessary, follow the steps listed above to create a new Schematron document; otherwise, make any edits to the pre-existing Schematron document.
3. Transform the Schematron document (.sch) with schematron-report-xml.xsl; save the result as schematron.xsl.
4. Transform an XML instance document with resolvens.xsl; save the result as an XML file.
   - resolvens.xsl only needs to be used for collections with certain namespaces.
   - If processing the file with resolvens.xsl causes errors that inhibit steps 5-7, try skipping this step (step 4).
5. Transform the new XML file with schematron.xsl; save the result as an XML file.
6. Transform the new XML file with schematron-xml2html-report.xsl; save the result as an HTML file.
7. Open the HTML file in a browser.
If you need to make more changes to the Schematron document, repeat steps 3-7.
When you are satisfied with the validation results, upload the edited Schematron document and schematron.xsl to the correct location for the specific repository.
Schematron Hints
- You can only match an element once in a pattern: a node is tested only by the first rule in the pattern whose context matches it. For example, if I write one rule where context="div" and one where context="div[@type='filingLetter']", they need to go in separate patterns or the second one will be ignored.
- If you are telling it to look for an attribute and it doesn't notice when that attribute isn't present, check the DTD to make sure there isn't a default value for the attribute.
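The first hint can be sketched as follows. Both assertions are hypothetical checks invented for the illustration; only the two context expressions come from the hint itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- Pattern 1: a general check applied to every div -->
  <pattern id="div-general">
    <rule context="div">
      <assert test="@type">Every div should have a type attribute.</assert>
    </rule>
  </pattern>
  <!-- Pattern 2: a separate pattern, so filingLetter divs are tested again here -->
  <pattern id="div-filing-letter">
    <rule context="div[@type='filingLetter']">
      <assert test="head">A filingLetter div should contain a head.</assert>
    </rule>
  </pattern>
</schema>
```

If both rules sat in one pattern with the context="div" rule first, every div would be claimed by that rule and the filingLetter rule would never fire. Putting the more specific rule first in a single pattern is another option, but then filingLetter divs would skip the general check, which is why separate patterns are usually the safer choice.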
Schematron and the DLP
The DLP is using Schematron in a number of ways, but primarily through Xubmit, our XML document ingest, validation, and versioning tool.
Projects using Schematron include:
- EAD Working Group (see documentation)
- Chymistry of Isaac Newton (see documentation)
- Aquifer Project (see documentation)
- Indiana Magazine of History (see documentation)
- Indiana Authors (see documentation)
The Newton project currently uses the ISO version file and the XSLT 2.0 transformations, which are housed in /newton/tools/xubmit/schematron.
Schematron Resources
Online resources
Schematron website: http://www.schematron.com/
Dave Pawson's tutorial: http://www.dpawson.co.uk/schematron/
Additional tutorials listed here: http://www.xfront.com/schematron/index.html
The Women Writers Project teaches a 3-day workshop on TEI customization, of which half a day is spent on Schematron. Slides from the Schematron portion are available online:
http://www.wwp.brown.edu/encoding/workshops/cust_2010-08/presentations/schematron.xml
or
http://www.wwp.brown.edu/encoding/workshops/cust_2010-08/presentations/html/schematron_00.xhtml
Courses
Mulberry Technologies Schematron Course: http://www.mulberrytech.com/services/classes/c-schematron-intro.html