The eXtensible Text Framework (XTF), developed by the California Digital Library, is essentially a wrapper around Lucene that adds support for handling XML and standard digital library formats.
XTF has been adopted to deliver text-based collections at the DLP. The IU Board of Trustees Minutes and IU Finding Aids are currently delivered through XTF (see Collections delivered with XTF).
A test version of XTF is installed on rhyme (sample query: apartheid), and another test version is being used for Newton.
XTF has a clean architecture built around three main modules:
- Indexing (textIndexer)
  - A command-line tool that initiates Lucene indexing of files in a given directory.
  - Can use custom XSLT both to select which documents get indexed and to pre-process documents for indexing.
  - Automatically detects which documents have been changed to perform incremental updates.
- Query processing (crossQuery)
  - Can use custom XSLT both to transform the query and to render the result list.
  - Has a very simple native query language, but also supports SRU/CQL.
- Document rendering (dynaXML)
  - Can use custom XSLT to transform document ID numbers into file locations, and to render the resultant files.
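Since crossQuery supports SRU/CQL, a client could in principle query it with a standard SRU searchRetrieve request. The following is a minimal Python sketch of building such a request URL. The endpoint `http://example.org/xtf/SRU` and the function name `build_sru_url` are hypothetical; the actual path depends on how XTF is deployed. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) come from the SRU 1.1 specification, not from XTF itself, and which CQL indexes (such as `dc.title`) are available depends on the server's configuration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real path depends on the local XTF deployment.
SRU_BASE_URL = "http://example.org/xtf/SRU"


def build_sru_url(cql_query, start_record=1, maximum_records=10,
                  base_url=SRU_BASE_URL):
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",       # standard SRU operation name
        "version": "1.1",                    # SRU protocol version
        "query": cql_query,                  # CQL query string
        "startRecord": start_record,         # 1-based offset into the result set
        "maximumRecords": maximum_records,   # page size
    }
    # urlencode percent-encodes spaces, quotes, and '=' inside the CQL query.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # A bare term, matching the sample query used on the test server.
    print(build_sru_url("apartheid"))
    # A fielded query; the dc.title index is an assumption about the server.
    print(build_sru_url('dc.title = "trustees"', maximum_records=5))
```

Keeping the query construction separate from the HTTP call makes it easy to check the encoded CQL before pointing the client at a real server.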
It is unclear how powerful the query processing module (crossQuery) is; we may need to extend it.
The XTF developers have tied their implementation to a particular version of Lucene by making modifications to the Lucene code that have not been merged into the primary CVS. This is not a big problem, but we would want to keep a separate, external version of Lucene if we need search capabilities that XTF cannot handle.