There are no published or popular faceted search extensions for SRU or CQL, so to accommodate the needs of DLP collections (IN Harmony, EVIADA, etc.) we must establish our own conventions or context sets.
For our first-pass implementation, to meet the immediate needs of our IN Harmony collection, we must be able to convey the following information in an SRU searchRetrieveResponse along with all of the details about the search:
- the facets that were calculated (probably mirroring all categories in the order requested)
- the facet values for each facet that had at least one hit in the search result set (constrained by the limit set in the request)
- the number of hits for each facet value in the result set
- the CQL query clause that must be ANDed with the original query in order to search for the hits for the given facet value (this can simply be derived as "\[facetField\]=\[facetValue\]")
- a normalized version of the requested facet parameters, to prevent the client from having to maintain that state, and to show what default paging parameters were applied by the server (in the event that they weren't specified).
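The derived clause described in the list above can be sketched as follows. This is a minimal illustration, not part of any SRU extension: the `FacetValue` class and `narrow_query` function are hypothetical names, and quoting the facet value (so that values containing spaces remain a single CQL term) is an assumption added on top of the plain "facetField=facetValue" form.

```python
from dataclasses import dataclass


@dataclass
class FacetValue:
    """One facet value reported for a search (hypothetical structure)."""
    field: str   # facet field name, e.g. "mods.genre"
    value: str   # facet value that had at least one hit
    hits: int    # number of hits for this value in the result set

    def clause(self) -> str:
        # Assumption: quote the value and backslash-escape embedded
        # backslashes and double quotes so it stays one CQL term.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{self.field}="{escaped}"'


def narrow_query(original_query: str, facet_value: FacetValue) -> str:
    """AND the facet clause onto the original query."""
    # Parenthesize the original query so any boolean operators inside it
    # are grouped before the facet clause is applied.
    return f"({original_query}) and {facet_value.clause()}"
```

For example, `narrow_query("dc.title=harmony", FacetValue("mods.genre", "sheet music", 12))` produces a query that restricts the original search to records with that genre.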
In SRU/W, custom extensions may be created using a limited syntax (SRU 1.2 extension information).
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="92f8121944edda94-b0a4024e-4a9a4669-95948bd8-6852e5fc1835a367f0344e84"><ac:plain-text-body><![CDATA[
The maxValueCount and offset are optional paging parameters. If they are excluded, the server decides the paging rules, which will be explicitly listed in the response.
mods.copyrightDate,10 mods.genre,10 mods.subject.topic,10 inharmony.instrument mods.recordContentSource,10
The first 10 (alphabetically) facet values (with hit counts) are requested for 4 of the 5 named fields. All facet values (with hit counts) will be returned for the remaining field (inharmony.instrument). The names of fields are most likely aliases and cannot contain spaces or commas, which serve as field and property delimiters respectively.
A request could likewise ask for all of the represented dc.format values, the first 10 represented dc.date values and the second page (of length 10) of the dc.subject values represented in the search results. (Sort order for paging is left up to the server, but is generally expected to be alphabetical. If sort order is a problem, the client always has the option to request all values and handle sorting and paging at its end.)
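As a rough illustration of the delimiter rules described above (spaces separate fields, commas separate a field's properties), a server might parse the facet parameter along these lines. The names `FacetRequest` and `parse_facet_parameter` are hypothetical, and the property order (field name, then maxValueCount, then offset) is an assumption, as is the `dc.subject,10,10` spelling of "second page of length 10" used in the usage note.

```python
from typing import List, NamedTuple, Optional


class FacetRequest(NamedTuple):
    field: str                      # field alias; no spaces or commas
    max_value_count: Optional[int]  # None means "server decides"
    offset: Optional[int]           # None means "server decides"


def parse_facet_parameter(raw: str) -> List[FacetRequest]:
    """Split a facet parameter into per-field requests."""
    requests = []
    for token in raw.split():       # whitespace delimits fields
        parts = token.split(",")    # commas delimit properties
        if not parts[0] or len(parts) > 3:
            raise ValueError(f"malformed facet field: {token!r}")
        # int() raises ValueError on empty or non-numeric properties.
        numbers = [int(p) for p in parts[1:]]
        max_count = numbers[0] if len(numbers) >= 1 else None
        offset = numbers[1] if len(numbers) == 2 else None
        requests.append(FacetRequest(parts[0], max_count, offset))
    return requests
```

Under these assumptions, `parse_facet_parameter("dc.format dc.date,10 dc.subject,10,10")` yields one request with no paging, one limited to 10 values, and one limited to 10 values starting at offset 10. The parsed form is also a natural basis for the normalized parameters echoed back in the response.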
Some of the specific collection needs of IN Harmony can't be implemented generally enough in the SRU/W server and must instead be implemented by the IN Harmony web application. If a reasonably general way of implementing the needed functionality can be devised, it would be ideal to move it to the SRU/W layer, but for complex cases that aren't reusable, implementing the feature in SRU/W sets a bad precedent.
The following are strong reasons to implement something at the SRU level:
Grouping of 5-year periods in the copyright date facet.
The web application should request all facet information for copyright date (this will rarely be even 100s of searches and shouldn't kill performance) and then consolidate by half-decade, creating queries by ORing the query fragments for each result.
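The consolidation step might look something like the sketch below. It assumes each copyright date facet value is a plain four-digit year and that the web application already holds a (year, hits, clause) tuple for each value; the function name `group_by_half_decade` and the "1900-1904" label format are illustrative only.

```python
def group_by_half_decade(year_facets):
    """Merge per-year facet values into 5-year groups.

    year_facets: iterable of (year, hits, clause) tuples, where clause is
    the CQL fragment the server supplied for that single year.
    Returns a list of (label, total_hits, combined_clause) sorted by label.
    """
    groups = {}
    for year, hits, clause in year_facets:
        start = int(year) // 5 * 5              # e.g. 1903 -> 1900
        label = f"{start}-{start + 4}"
        total, clauses = groups.get(label, (0, []))
        # Summing hits assumes each record has one copyright date;
        # otherwise the total may count a record more than once.
        groups[label] = (total + hits, clauses + [clause])
    return [
        (label, total, "(" + " or ".join(clauses) + ")")
        for label, (total, clauses) in sorted(groups.items())
    ]
```

The combined clause for a group can then be ANDed with the original query in the same way as a single facet value's clause.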
Knowing when to display a button to request all facets (i.e., knowing when there are more, undisplayed results)
The web application should request x + 1 facets for each field where x is the number to display. The (x + 1)th result shouldn't be displayed, but if it's present, a button should exist saying "Display all".
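A small sketch of that "request x + 1" check, assuming the web application receives the facet values for one field as a list in server order. The function name `split_for_display` is hypothetical.

```python
def split_for_display(facet_values, display_count):
    """Separate the values to show from the 'is there more?' signal.

    facet_values: values returned after requesting display_count + 1.
    Returns (visible_values, has_more).
    """
    visible = list(facet_values[:display_count])   # never show the extra one
    has_more = len(facet_values) > display_count   # extra value present?
    return visible, has_more
```

When `has_more` is true, the application would render the "Display all" button; otherwise every value for the field is already on screen.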