Schematron

Schematron provides XML content authors with a convenient way to check the quality of their XML data, to ensure that in addition to being syntactically correct, the file meets guidelines for use established in a project. It is often used in conjunction with DTD/Schema validation.

Schematron contains 6 basic elements:

<schema xmlns="http://purl.oclc.org/dsdl/schematron">

  • An optional <title> element
  • Zero or more <ns prefix="[]" uri="[]" /> elements declaring the namespaces and prefixes used in the XPath expressions
  • One or more <pattern> elements, which contain logical groupings of associated rules
  • One or more <rule> elements, in which the context attribute is an XPath expression identifying the nodes in the tree to which the tests apply
  • One or more <assert> elements, in which the test attribute is an XPath expression and the content is rich text expressing the statement being asserted in plain language
  • One or more <report> elements, in which the test attribute is an XPath expression and the content is rich text expressing the fact to be reported in plain language
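
As a rough illustration of how these elements nest, here is a minimal skeleton. The schema namespace is the one shown above; the ex prefix, its URI, the element names, and the rule content are hypothetical placeholders, not taken from any DLP validator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative skeleton only: the "ex" prefix, its URI, and the rules are hypothetical. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Example project guidelines</title>
  <!-- Declares a prefix that the XPath expressions below can use -->
  <ns prefix="ex" uri="http://example.org/ns"/>
  <pattern>
    <!-- context selects the nodes to be tested -->
    <rule context="ex:div">
      <!-- assert: the message is emitted when the test is false -->
      <assert test="@type">A div must have a type attribute.</assert>
      <!-- report: the message is emitted when the test is true -->
      <report test="count(ex:head) &gt; 1">This div has more than one head.</report>
    </rule>
  </pattern>
</schema>
```

Note the difference in polarity: an assert states what should be true and fires when it is not, while a report fires when its test succeeds.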

The Schematron Workflow

Getting Started: Install the Schematron plugin

The plugin requires Oxygen version 7.1 or higher (NOTE: There are reports of this plugin not working on some platforms running Oxygen 8.1). Oxygen can be downloaded from: http://www.oxygenxml.com and is also freely available to all IU affiliates via IUWare.

  1. Close Oxygen
  2. Download the file: Validator-jdk14.zip
  3. Unzip the file. This will create a folder called Validator somewhere on your computer. Copy the entire folder into the "plugins" directory of your Oxygen Installation.

Note: If you encounter a problem installing the plugins, double-check the DLP System Documentation page for the most recent version of the plugins and plugin documentation.

Using the Schematron plugin to validate your document

If you are creating a new Schematron validator, you will need to make sure a repository has been created on Algernon, and follow the steps below to create a validator file before you can use the plugin. Contact David Jiao for help with repository setup.

  1. Open an instance of the document to be validated (e.g. an issue of the Indiana Magazine of History).
  2. Right click (or ctrl-click) anywhere within the main window and select 'Plugins' and 'XTF Validator'.
  3. In the 'Choose Repository' window, select the appropriate option and hit 'OK'.
  4. Save the resulting report as an .html file and open it to see any errors in the source document.
  5. Make any changes to the source document, save and repeat.

Creating a new validator

Encode markup usage guidelines in a Schematron file (.sch). This file should contain a combination of patterns, rules, assertions and reports that examine and validate the use of tags in instance files. (Note: We have had some problems with reports, so use assertions whenever possible.)

For example, to test that all abbr tags in a TEI file include an expan attribute, a Schematron file might include the following:
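
The original example is not preserved on this page, so the following is a minimal sketch of what such a file could look like. It assumes a source document without namespaces and uses the schema namespace listed earlier; the title and message wording are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes the TEI source uses no namespace. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Abbreviation checks</title>
  <pattern>
    <!-- Every abbr element in the document is tested -->
    <rule context="abbr">
      <!-- Fails (and prints the message) when the expan attribute is missing -->
      <assert test="@expan">Each abbr element must have an expan attribute.</assert>
    </rule>
  </pattern>
</schema>
```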

If the source markup is schema-based, then the schema namespace (e.g., TEI P5) needs to be declared in both the schematron root element and the resulting schematron.xsl:
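
The original example is not preserved here. As a sketch, a declaration for TEI P5 (whose namespace URI is http://www.tei-c.org/ns/1.0) might look like the following in the .sch file; the tei prefix is a conventional choice and an assumption on our part.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the "tei" prefix is an assumed, conventional choice. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
        xmlns:tei="http://www.tei-c.org/ns/1.0">
  <!-- Makes the tei: prefix available to XPath expressions in rules -->
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <!-- patterns and rules go here -->
</schema>
```

In the generated schematron.xsl, the corresponding declaration would presumably be the same xmlns:tei="http://www.tei-c.org/ns/1.0" attribute on the root xsl:stylesheet element.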

The namespace prefix will need to be added to any elements declared in the schematron SCH file:
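
The original example is not preserved here. A minimal sketch of a prefixed rule, assuming a tei prefix has been declared for the TEI P5 namespace, could read as follows; the specific check is hypothetical.

```xml
<!-- Sketch only: assumes the tei prefix is declared in the schema root. -->
<pattern>
  <!-- Element names in context and test carry the tei: prefix -->
  <rule context="tei:div">
    <!-- Attributes are normally in no namespace, so @type takes no prefix -->
    <assert test="@type">Each div must have a type attribute.</assert>
  </rule>
</pattern>
```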

See the Schematron web site for more instructions on creating Schematron documents. Also see the sample documents attached, including IMH_issue_schematron.sch and INAuthorsEncyclopediaSchematron.sch.

Note about these transformations

These transformations use a namespace that looks incorrect (http://www/dlib.indiana.edu), but it is what the plugin and Xubmit expect. Keep this namespace as it is, and customized checks for new repositories should work both on Xubmit and in Oxygen. This should be corrected once the plugin code can be modified directly on the server.

Configure Transformation Scenario

Before you use your first validator, you will need to download schematron-report-xml.xsl and configure a transformation scenario. Once this is done, you can use the same scenario for all of your Schematron instances.

  1. From the Document menu, choose 'Transformation', and 'Configure Transformation Scenario'.
  2. In the window that appears, click the 'New' button.
  3. In the 'Edit Scenario' window, give the scenario a name that you will remember (e.g. Create Schematron XSLT) and use the folder icon next to the XSL URL field to locate the schematron-report-xml.xsl file on your computer. Make sure the 'Transformer' setting is Saxon-HE 9.5.1.5.
  4. Select 'Output' on the top menu and make sure that 'Save As' is selected.
  5. Use the folder icon next to the 'Save As' field to instruct the processor to save the resulting XSLT in your working folder as "schematron.xsl". (Each time you repeat the transformation, you will overwrite the previous schematron.xsl.)
  6. Hit 'OK' in both windows to exit Configure Transformation Scenario.
  7. Download skeleton1-5.xsl and save it to the same directory as schematron-report-xml.xsl.

Transform your Schematron document into an XSLT validator

You will need to do this each time you make changes to your Schematron file.

  1. Open your Schematron document
  2. From the Document menu, choose 'Transformation', and 'Configure Transformation Scenario'. Make sure the appropriate scenario is selected and hit 'OK'.
  3. Again from the Document menu, choose 'Transformation' and 'Apply Transformation Scenario'. If you are repeating this step, Oxygen will ask you if you want to overwrite the existing schematron.xsl file. You do.
    1. If your source document relies on a schema (e.g., TEI P5), you'll need to make sure the appropriate schema namespace is declared in the root of the schematron.xsl file.
  4. Connect to the server that currently hosts Schematron validation for Xubmit, navigate to /opt/xubmit/repositories, and then to the appropriate repository and the schematron file within it.
  5. Upload the "schematron.xsl" file.
  6. Once this is done, you are ready to use the Schematron plugin (see above) to validate your document or test your Schematron assertions. (Hint: Test your assertions one at a time by 'breaking' something in the source file that the assertion should catch and using the plugin to validate.)

(Optional) Command line to Transform your Schematron document into an XSLT validator

Instead of using Oxygen, you can transform the Schematron document from the command line. Warning: this is not recommended for ordinary users. This has been tested with saxonb-9.0.

Or if you have saxonb-xslt installed, simply

Validate an XML file with Schematron in Oxygen

As an alternative to uploading schematron.xsl to gouda/algernon in the previous step, you can generate the HTML report directly in Oxygen through another series of stylesheets. This is the sequence:

XML instance document -> resolvens.xsl (if the file uses namespaces) -> schematron.xsl from the last step -> schematron-xml2html-report.xsl

This will generate a .html report.

Schematron Hints

  • You can only match an element once in a pattern: each node is tested against only the first rule in the pattern whose context matches it. For example, if I write one rule where context="div" and one where context="div[@type='filingLetter']", they need to go in separate patterns or the second one will be ignored.
  • If you are telling it to look for an attribute and it doesn't notice when that attribute isn't present, check the DTD to make sure there isn't a default value for the attribute.
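
To illustrate the first hint, here is a minimal sketch that places the two div rules in separate patterns so that both are applied. It assumes a source document without namespaces; the assertions themselves are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the assertions are hypothetical examples. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- General rule: applies to every div -->
    <rule context="div">
      <assert test="@type">Each div must have a type attribute.</assert>
    </rule>
  </pattern>
  <pattern>
    <!-- Specific rule: in its own pattern, so the general rule cannot shadow it -->
    <rule context="div[@type='filingLetter']">
      <assert test="head">A filingLetter div must contain a head.</assert>
    </rule>
  </pattern>
</schema>
```

An alternative that keeps both rules in a single pattern is to list the more specific rule first, since only the first matching rule is used for a given node; filingLetter divs would then skip the general rule.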

Schematron and the DLP

The DLP is using Schematron in a number of ways, but primarily through Xubmit, our XML documents ingest, validation and versioning tool.

Projects using Schematron include:

rs, the Newton project currently uses the ISO version file and the XSLT 2.0 transformations which are housed in /newton/tools/xubmit/schematron

Schematron Resources

Online resources

Schematron website: http://www.schematron.com/

Dave Pawson's tutorial: http://www.dpawson.co.uk/schematron/

Additional tutorials listed here: http://www.xfront.com/schematron/index.html

The Women Writers Project teaches a 3-day workshop on TEI customization, of which 1/2 day is spent on Schematron. Slides from the Schematron portion are available online
http://www.wwp.brown.edu/encoding/workshops/cust_2010-08/presentations/schematron.xml
or
http://www.wwp.brown.edu/encoding/workshops/cust_2010-08/presentations/html/schematron_00.xhtml

Courses

Mulberry Technologies Schematron Course: http://www.mulberrytech.com/services/classes/c-schematron-intro.html
