The facts as we know them:

  • SRU (Search/Retrieve via URL) is a URL-based system for communication between information retrieval systems. SRU is based on Z39.50, but removes a lot of the old baggage that no one was using. If any of the old stuff is needed, it can be implemented as a separate web service. This is just search and retrieve.
  • SRW (Search/Retrieve Web Service) is being renamed "SRU via SOAP".
  • CQL (Contextual Query Language) is the query language used by SRU.
  • All of the above used to fall under the set of standards called ZiNG (Z39.50 International: Next Generation), but they are now all being renamed SRU. (Although the SRU XML objects still use the SRW namespace.)

It is very easy to write a basic SRU implementation. SOAP toolkits can use a WSDL description to generate the protocol code. All that remains is to implement the single client or server SRU method, which translates your query format into a CQL query (client) or a CQL query into your database query (server).
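
For the plain URL flavour of SRU, the client side can be sketched without any SOAP toolkit. This is a minimal sketch, not a reference client: the base URL is made up, the function name build_search_url is ours, and we assume the server speaks SRU version 1.1 and accepts the standard searchRetrieve parameter names.

```python
from urllib.parse import urlencode


def build_search_url(base_url, cql_query, start_record=1,
                     maximum_records=10, record_schema=None):
    """Build an SRU searchRetrieve URL for an already-formed CQL query.

    base_url is a hypothetical endpoint; version "1.1" is an assumption
    about what the target server supports.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,        # SRU record positions are 1-based
        "maximumRecords": maximum_records,
    }
    if record_schema:
        params["recordSchema"] = record_schema
    # urlencode percent-escapes the quotes, spaces and "=" inside the CQL query
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("http://example.org/sru/mydb",
                           'dc.title = "moby dick"',
                           record_schema="dc"))
```

The only real work left for a client is producing the CQL string from whatever query format the local application uses.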


SRU allows the client to specify the format of the records in the result set (Dublin Core, MARCXML, etc.). All servers must support Dublin Core and the SRUDiagnostics format.

The query can be a list of attribute-value pairs, or a chunk of XML.

CQL makes a distinction between string search (exact match of a string) and keyword search (finding all words of a string somewhere in the document).
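
The difference is easiest to see in the query text itself. The sketch below assumes the CQL relations "exact" (whole-string match) and "all" (every word must appear) and uses dc.title as an example index; the helper name cql_clause is ours, and which indexes and relations a given server actually accepts is up to that server.

```python
def cql_clause(index, relation, term):
    """Return a single CQL search clause with the term safely quoted."""
    # Backslash-escape embedded backslashes and double quotes in the term
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


# String search: the title must be exactly this string
string_search = cql_clause("dc.title", "exact", "the cat in the hat")

# Keyword search: the title must contain both words, in any order
keyword_search = cql_clause("dc.title", "all", "cat hat")

if __name__ == "__main__":
    print(string_search)
    print(keyword_search)
```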

Bibliographic (MODS-like) searching

Bibliographic searching is currently at the proposal stage.

SRU Results

The result set can contain state-preserving information, like the original query or an IP address, for use with lightweight clients.

Records in the result set are encoded as strings, with angle brackets escaped. This doesn't work very well for completely browser-based clients using SRU, since you lose the ability to apply XPath statements to returned records. This problem will be addressed in a future version.
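
A client that is not limited to the browser can work around the escaping by parsing each record a second time. The sketch below assumes the response uses the SRW namespace (http://www.loc.gov/zing/srw/) and a recordData element holding the escaped record; the sample response is hand-written and heavily simplified, and extract_records is our own helper name.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written, simplified response: the record is an escaped string
SAMPLE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:records>
    <srw:record>
      <srw:recordPacking>string</srw:recordPacking>
      <srw:recordData>&lt;dc xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;&lt;dc:title&gt;Moby Dick&lt;/dc:title&gt;&lt;/dc&gt;</srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def extract_records(response_xml):
    """Return each record in the response as a parsed XML element."""
    root = ET.fromstring(response_xml)
    records = []
    for data in root.iter(f"{{{SRW_NS}}}recordData"):
        if len(data) > 0:
            # Record was sent as real XML: use the child element directly
            records.append(data[0])
        elif data.text and data.text.strip():
            # Record was sent as an escaped string: the first parse already
            # unescaped it, so parse that text again to get elements
            records.append(ET.fromstring(data.text.strip()))
    return records


if __name__ == "__main__":
    for record in extract_records(SAMPLE):
        print(record.findtext(".//dc:title", namespaces={"dc": DC_NS}))
```

After the second parse, XPath-style lookups work on the record as usual.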

Servers can (and should) retain result sets, so they can be referenced later, especially when the client is asking for multiple pages from a set. A good way to make the server keep a result set is to have the client "touch" it, which refreshes its expiry time.
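
The "touch" idea can be sketched as a small cache with a time-to-live. This is an illustration only, not how any particular SRU server does it: the class name ResultSetCache, the 300-second default and the method names are all our own assumptions.

```python
import time


class ResultSetCache:
    """Keep result sets for ttl_seconds; every access refreshes the timer."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._sets = {}              # set_id -> (expires_at, records)

    def store(self, set_id, records):
        self._sets[set_id] = (self._clock() + self._ttl, records)

    def touch(self, set_id):
        """Refresh the expiry time; return False if the set is gone."""
        self._purge()
        if set_id not in self._sets:
            return False
        _, records = self._sets[set_id]
        self._sets[set_id] = (self._clock() + self._ttl, records)
        return True

    def page(self, set_id, start_record, count):
        """Return one page (start_record is 1-based), or None if expired."""
        if not self.touch(set_id):
            return None
        records = self._sets[set_id][1]
        first = start_record - 1
        return records[first:first + count]

    def _purge(self):
        # Drop every result set whose expiry time has passed
        now = self._clock()
        for set_id in [k for k, (exp, _) in self._sets.items() if exp <= now]:
            del self._sets[set_id]
```

Because page() touches the set before reading it, a client that keeps paging keeps the set alive; a client that goes quiet lets it expire.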

Misc SRU info

Authentication/encryption is not built into the protocol, and must be addressed at a higher level. (See general web services literature.)

Contacts: Ralph LeVan, Matthew Dovey

Comparison to other systems:

  • XQuery depends on knowing the structure of the data being searched, so it doesn't work well for general-purpose searches.
  • SDLIP is complex and doesn't have a query language
  • DASL is linked to SQL, but is pretty good
  • Z39.50 is complex and fragile

OCLC's SRW server

Ralph LeVan at OCLC maintains a Java-based SRU server package (still called SRW). Out of the box, it can connect to OCLC's Pears database and the Lucene indexes created by DSpace. We have modified it to work with our Lucene indexes, and hope to make this generic, so any Lucene indexes may be used.



  • doGet
    • SRWServletInfo reads the config files and sets up properties for all "known" databases.
    • passes off to processMethodRequest
      • SRWServletInfo.setSRWStuff parses the DB out of the request string
        • SRWDatabase initializes a database object
          • Instantiates the proper class (as listed in SRWServer.props)
          • Reads in properties from the DB's config file
      • looks up some DB details, and sets them as properties of msgContext
    • Builds an SRW SOAP query out of the URL query and invokes it with AxisEngine.invoke()
    • when it returns, strips the SOAP stuff out
  • doPost
    • very similar to doGet, resulting in an AxisEngine.invoke()


  • searchRetrieveOperation is invoked by Axis
  • eventually calls SRWDatabase.doRequest() on the proper database class

Databases derive from SRWDatabaseImpl. This class implements the doRequest() method. It is best not to override this method, as it takes care of caching result sets and other useful administrative stuff. It is best to only override the abstract methods.
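
The advice above is the usual template-method arrangement: the base class owns the request handling and caching, and a subclass only fills in the database-specific steps. The sketch below shows the pattern in Python rather than the package's Java; DatabaseBase, translate_query, run_query and InMemoryDatabase are hypothetical names of ours, not the real SRWDatabaseImpl API, and the "translation" is a toy.

```python
from abc import ABC, abstractmethod


class DatabaseBase(ABC):
    """Stand-in for a base class that owns request handling and caching."""

    def __init__(self):
        self._result_sets = {}   # CQL query -> full list of hits

    def do_request(self, cql_query, start_record=1, maximum_records=10):
        # Not meant to be overridden: caches the result set, then pages it
        if cql_query not in self._result_sets:
            native_query = self.translate_query(cql_query)
            self._result_sets[cql_query] = self.run_query(native_query)
        hits = self._result_sets[cql_query]
        first = start_record - 1
        return hits[first:first + maximum_records]

    @abstractmethod
    def translate_query(self, cql_query):
        """Turn a CQL query into the backend's own query form."""

    @abstractmethod
    def run_query(self, native_query):
        """Run the backend query and return all matching records."""


class InMemoryDatabase(DatabaseBase):
    """Toy backend: substring match over a list of titles."""

    def __init__(self, titles):
        super().__init__()
        self._titles = titles
        self.queries_run = 0

    def translate_query(self, cql_query):
        # Toy translation: treat the whole query as one quoted term
        return cql_query.strip('"').lower()

    def run_query(self, native_query):
        self.queries_run += 1
        return [t for t in self._titles if native_query in t.lower()]
```

Asking for a second page of the same query reuses the cached hits, so the subclass never has to think about result-set bookkeeping.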
