For information about the current status of OAI in the DLP, see the OAI page.

There are two standard OAI providers for Fedora: the built-in OAI provider and PrOAI. PrOAI is the much more flexible of the two, and the DLP is going to use it for OAI purposes.

The PrOAI provider

In Fedora 2.1 and on, the old OAI system still works, but the new system based on PrOAI is much more configurable. Of course, this means it is also more difficult to configure. The PrOAI system periodically polls Fedora to update its "database" of deliverable records. This "database" is really a simple directory tree under proai.cacheBaseDir, backed by a few tables in the relational database. The structure of this directory tree (and of the associated tables) is much like the way regular Fedora objects are organized.

Sample URL: http://fedora-dev.dlib.indiana.edu:8080/oaiprovider/?verb=ListRecords&metadataPrefix=oai_dc
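A minimal sketch of building a request like the sample URL and pulling record identifiers out of a response, using only the Python standard library. The verb and metadataPrefix arguments and the response namespace come from the OAI-PMH 2.0 protocol; the helper names and the inline sample response are illustrative assumptions, not output from this server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request_url(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def record_identifiers(response_xml):
    """Extract the header identifier of each record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [el.text for el in root.findall(path, OAI_NS)]


# Hypothetical response, trimmed to the parts the parser looks at.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2010-10-04T00:00:00Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = build_request_url(
    "http://fedora-dev.dlib.indiana.edu:8080/oaiprovider/",
    "ListRecords",
    metadataPrefix="oai_dc",
)
identifiers = record_identifiers(SAMPLE_RESPONSE)
```

Other verbs (Identify, ListSets, GetRecord and so on) can be requested the same way by changing the verb and its arguments.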

Remaining issues (section last updated 10/04/2010)

Performance issues

Data questions

Setup tips

Make sure you copy the JDBC driver into the OAI provider's lib directory.

The sample JDBC URL in the proai.properties file is misleading; just use the same one you're using for Fedora (assuming you have the proai user set up there).

The cache and sessions directories don't seem to work properly with directories that are on "virtual" drives (from the DOS SUBST command). It is probably easiest to use a relative directory name, starting with

webapps\\oaiprovider

to put it with the rest of the app in Tomcat.

Improper settings in the proai.properties, or manually messing with the cache directory, can sometimes corrupt the database. If the DB points to cached files that don't exist, the query process will seem to go through (no errors), but the result list will be empty. In this case, it is best to drop all of the associated DB tables (they start with "rc") and delete the contents of the cache directory, forcing a full rebuild the next time the OAI provider is started.
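The reset described above can be sketched as two small helpers: one that picks out the tables to drop by their "rc" prefix, and one that empties the cache directory while keeping the directory itself. This is only a sketch under assumptions: the table names in the example are illustrative, the function names are made up here, and the generated statements still have to be reviewed and run against the real database by hand, presumably while the OAI provider is stopped.

```python
import shutil
import tempfile
from pathlib import Path


def drop_statements(table_names, prefix="rc"):
    """Return DROP TABLE statements for tables whose name starts with prefix."""
    return [
        f"DROP TABLE {name};"
        for name in table_names
        if name.lower().startswith(prefix)
    ]


def clear_directory(cache_dir):
    """Delete everything inside cache_dir but keep the directory itself."""
    for entry in Path(cache_dir).iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


# Illustrative table list; only the names with the "rc" prefix are selected.
statements = drop_statements(["rcItem", "rcRecord", "doRegistry"])

# Demonstrate the cache cleanup on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "records").mkdir()
    (Path(tmp) / "records" / "a.xml").write_text("<x/>")
    (Path(tmp) / "stray.txt").write_text("x")
    clear_directory(tmp)
    remaining = list(Path(tmp).iterdir())
```

Listing the statements first, rather than executing them directly, leaves a chance to check that no unrelated table happens to share the prefix.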

PrOAI will only attempt to index objects that appear to be OAI items. An object appears to be an OAI item if it:

  1. Has an RDF property matching the configured driver.fedora.itemID. The value of this property is used as the official "OAI ID".
  2. Disseminates one or more of the formats specified in the PrOAI configuration.

If you want to index/provide objects based on a specific argument to a disseminator (like getMetadata?format=mods), the method must restrict its arguments to a defined set of values, and the Resource Index indexing must be set to level 2.

Example RDF for an item:

<!-- oai namespace added -->
<rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
  xmlns:oai="http://www.openarchives.org/OAI/2.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="info:fedora/iudl:25959">
  <!-- oai itemID added -->
    <oai:itemID>oai:oai.dlib.indiana.edu:/inharm/sheetmusic/isl-xxx</oai:itemID>
    <fedora:isMemberOfCollection rdf:resource="info:fedora/iudl:23"></fedora:isMemberOfCollection>
  </rdf:Description>
</rdf:RDF>

Example RDF for the collection object the item belongs to, which defines the set:

<rdf:RDF xmlns:oai="http://www.openarchives.org/OAI/2.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/iudl:23">
    <!-- oai setSpec and setName added -->
    <oai:setSpec>isl</oai:setSpec>
    <oai:setName>InHarm - ISL collection</oai:setName>
  </rdf:Description>
</rdf:RDF>
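To check whether an object's RDF carries the properties PrOAI looks for, the RDF can be parsed and its OAI-namespace properties listed. The sketch below does this with the Python standard library; the function name is hypothetical, and the sample string is a trimmed, well-formed copy of the item example above.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def oai_properties(rdf_xml):
    """Map each described object URI to the OAI-namespace properties on it."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        props = {}
        for child in desc:
            # Keep only elements in the OAI namespace, keyed by local name.
            if child.tag.startswith(f"{{{OAI}}}"):
                props[child.tag.split("}", 1)[1]] = child.text
        result[about] = props
    return result


ITEM_RDF = """<rdf:RDF
  xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
  xmlns:oai="http://www.openarchives.org/OAI/2.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/iudl:25959">
    <oai:itemID>oai:oai.dlib.indiana.edu:/inharm/sheetmusic/isl-xxx</oai:itemID>
    <fedora:isMemberOfCollection rdf:resource="info:fedora/iudl:23"/>
  </rdf:Description>
</rdf:RDF>"""

item_props = oai_properties(ITEM_RDF)
```

An object with no itemID entry in the result would not be picked up as an OAI item, assuming driver.fedora.itemID is configured to point at the oai:itemID property.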

See OAI Data Provider Requirements for some more setup tips.

Note about Metadata validation

PrOAI can validate XML during the indexing process, but there is no documentation on how validation is configured, and several attempts at indexing with validation turned on failed. Validation is therefore turned off for now. Once we figure out how it is configured, we might want to turn it back on.

Indexing process

When proai.service.ProviderServlet starts,

  1. It loads the proai.properties file
  2. It creates a Responder based on the properties, which
    1. Creates a RecordCache, which
      1. Initializes the OAIDriver specified in the properties file (in this case the FedoraOAIDriver)
      2. Starts a thread that periodically runs OAIDriver.listRecords
        1. FedoraOAIDriver.listRecords passes off to
          1. ITQLQueryFactory.listRecords, which
            1. sends date-constrained queries to the Fedora Resource Index.
            2. creates a CombinerRecordIterator, which
              1. creates a FedoraRecord for each result item
      3. calls FedoraOAIDriver.writeRecordXML
        1. The name of this method has changed between versions of PrOAI, but it almost certainly ends up calling FedoraRecord.writeMetadata
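The polling part of the steps above can be sketched as a cache that remembers when it last polled and asks the driver only for records changed since then. This is a toy model, not PrOAI's code: the class and method names are invented for illustration, the driver is an in-memory stand-in for the Resource Index queries, and the poll is run synchronously here, whereas the list above has it running periodically in its own thread.

```python
from datetime import datetime, timezone


class InMemoryDriver:
    """Hypothetical stand-in for an OAI driver backed by a plain list."""

    def __init__(self, items):
        self.items = items  # list of (item_id, metadata_prefix, modified)

    def list_records(self, after, until):
        """Yield records modified in the window (after, until]."""
        for item_id, prefix, modified in self.items:
            if (after is None or modified > after) and modified <= until:
                yield item_id, prefix

    def write_record_xml(self, item_id, prefix):
        """Return placeholder XML for one record."""
        return f"<record id='{item_id}' format='{prefix}'/>"


class RecordCacheSketch:
    """Toy cache: stores record XML and the time of the last poll."""

    def __init__(self, driver):
        self.driver = driver
        self.records = {}
        self.last_poll = None  # None means the first poll has no lower bound

    def poll_once(self, now):
        """Fetch records changed since the last poll and store their XML."""
        for item_id, prefix in self.driver.list_records(self.last_poll, now):
            xml = self.driver.write_record_xml(item_id, prefix)
            self.records[(item_id, prefix)] = xml
        self.last_poll = now


def utc(day):
    """Helper for building example timestamps in January 2010."""
    return datetime(2010, 1, day, tzinfo=timezone.utc)


driver = InMemoryDriver([("oai:example:1", "oai_dc", utc(1))])
cache = RecordCacheSketch(driver)
cache.poll_once(utc(2))          # first poll picks up everything so far
driver.items.append(("oai:example:2", "oai_dc", utc(3)))
cache.poll_once(utc(4))          # second poll only sees the new record
```

Constraining each poll by date is what keeps later updates cheap: only objects modified since the previous poll need to be fetched again.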

Query process

Queries to http://localhost:8080/oaiprovider/? go through proai.service.ProviderServlet.

A sample query for ListRecords:

  1. proai.service.ProviderServlet gets verb=ListRecords&metadataPrefix=oai_dc
  2. ProviderServlet.doGet passes to Responder.listRecords
    1. which passes to SessionManager.list
      1. which spawns a new CacheSession (a thread), and returns its getResponseData (this method blocks until the thread has completed)
        1. The CacheSession fills output files (under the session directory, one file per response page) with the paths of the cached metadata files that meet the criteria
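The session behaviour described above can be illustrated with a thread that writes one file per response page, each listing the paths of matching cached files, and a getter that blocks until the thread has finished. All names, the page file naming and the page size are assumptions made for this sketch; it follows the description in the list rather than PrOAI's actual source.

```python
import tempfile
import threading
from pathlib import Path


class CacheSessionSketch(threading.Thread):
    """Toy session: writes page files in a background thread."""

    def __init__(self, session_dir, matching_paths, page_size):
        super().__init__()
        self.session_dir = Path(session_dir)
        self.matching_paths = list(matching_paths)
        self.page_size = page_size
        self.pages = []

    def run(self):
        # One output file per response page, one cached-file path per line.
        for start in range(0, len(self.matching_paths), self.page_size):
            chunk = self.matching_paths[start:start + self.page_size]
            page = self.session_dir / f"page-{len(self.pages)}.txt"
            page.write_text("\n".join(chunk) + "\n")
            self.pages.append(page)

    def get_response_data(self):
        """Block until the thread has completed, then return the page files."""
        self.join()
        return self.pages


def list_session(session_dir, matching_paths, page_size):
    """Start a session and return its page files once it is done."""
    session = CacheSessionSketch(session_dir, matching_paths, page_size)
    session.start()
    return session.get_response_data()


with tempfile.TemporaryDirectory() as tmp:
    pages = list_session(tmp, ["a.xml", "b.xml", "c.xml"], page_size=2)
    page_contents = [p.read_text().split() for p in pages]
```

Because the page files hold only paths, a session that points at cached files which no longer exist produces no error at this stage, which matches the empty-result symptom described under the setup tips.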

The Fedora-based OAI provider

In Fedora 2.0, OAI export is built in, and works without any additional configuration.

For example, see:

However, this system has two major drawbacks:

Moving Existing Collections into Fedora OAI

This page lists what needs to be fixed before moving the existing collections into Fedora OAI.