The eXtensible Text Framework (XTF), developed by the California Digital Library, is essentially a wrapper around Lucene that adds support for handling XML and standard digital library formats.

XTF has been adopted to deliver text-based collections at DLP. The IU Board of Trustees Minutes and IU Finding Aids are currently supported by XTF (see Collections delivered with XTF).

A test version of XTF is installed on rhyme (sample query: apartheid), and another test version is being used for Newton.

XTF has a clean architecture with three main modules:

  • Indexing (textIndexer)
    • A command-line tool that initiates Lucene indexing of files in a given directory.
    • Can use custom XSLT both to select which documents get indexed and to pre-process documents for indexing.
    • Automatically detects which documents have changed, so that index updates are incremental.
  • Query processing (crossQuery)
    • Can use custom XSLT both to transform the query and to render the result list.
    • Has a very simple native query language, but also supports SRU/CQL.
  • Document rendering (dynaXML)
    • Can use custom XSLT to transform document ID numbers into file locations, and to render the resultant files.
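
The incremental-update behavior of the indexing module can be illustrated with a minimal sketch. This is not XTF's code; it assumes changes are detected by comparing file modification times against those recorded at the previous run, and the function name and the `last_indexed` bookkeeping are hypothetical.

```python
import os


def find_changed_files(root, last_indexed):
    """Return the files under `root` that need (re)indexing.

    `last_indexed` maps a file path to the modification time recorded
    when that file was last indexed. This bookkeeping is an assumption
    made for illustration, not XTF's actual format.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A file is new if it has no recorded time, and modified
            # if its current time differs from the recorded one.
            if last_indexed.get(path) != os.path.getmtime(path):
                changed.append(path)
    return sorted(changed)
```

Only the paths returned would be passed to the indexer, and their recorded times updated afterwards; unchanged documents are skipped.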

It is unclear how powerful the query processing module is; we may need to extend it.
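
One way to probe the query module would be through the SRU/CQL support noted above. The sketch below builds a standard SRU searchRetrieve request URL; the parameter names come from the SRU 1.1 specification, while the base URL in the usage note is a placeholder and not a known XTF endpoint.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query.

    `base_url` is whatever address the SRU service is exposed at;
    it is supplied by the caller and not assumed here.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the CQL query (spaces, quotes, '=').
    return base_url + "?" + urlencode(params)
```

For example, `build_sru_url("http://example.org/sru", "apartheid")` would produce a request for the sample query mentioned earlier, with `http://example.org/sru` standing in for the real service address.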

The XTF developers have tied their implementation to a particular version of Lucene by making modifications to the Lucene code that have not been merged into the primary Lucene CVS. This is not a big problem, but we would want to keep a separate, external version of Lucene if we need searching capabilities that XTF cannot handle.

