...

There are no published or popular faceted search extensions for SRU or CQL, so to accommodate the needs of our various DLP collections (IN Harmony, EVIADA, etc.) we must establish our own conventions or context sets.

...

For our first-pass implementation, to meet the immediate needs of our IN Harmony collection, we must be able to convey the following information in an SRU searchRetrieveRequest along with all of the details about the search:

...

  1. the facets that were calculated (probably mirroring all categories in the order requested)
  2. the facet values for each facet that had at least one hit in the search result set (constrained by the limit set in the request)
  3. the number of hits for each facet value in the result set
  4. the CQL query clause that must be ANDed with the original query in order to search for the hits for the given facet value (this can simply be derived as "[facetField]=[facetValue]")
  5. A normalized version of the requested facet parameters, to prevent the client from having to maintain that state, and to show what default paging parameters were applied by the server (in the event that they weren't specified).
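Item 4 above can be sketched in a few lines of Python. This is only an illustration: the function names `facet_clause` and `drill_down_query` are hypothetical, and quoting the facet value is an assumption made here so that values containing spaces stay a single CQL term (the derived form in the list is simply "[facetField]=[facetValue]").

```python
def facet_clause(facet_field: str, facet_value: str) -> str:
    """Build the clause "[facetField]=[facetValue]" for one facet value."""
    # Assumption: the value is wrapped in double quotes so that spaces are
    # taken literally; backslashes and double quotes inside it are escaped.
    escaped = facet_value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{facet_field}="{escaped}"'


def drill_down_query(original_query: str, facet_field: str, facet_value: str) -> str:
    """AND the facet clause with the original query."""
    # Parenthesize the original query so its own boolean operators
    # keep their grouping once the new clause is appended.
    return f"({original_query}) and {facet_clause(facet_field, facet_value)}"


print(drill_down_query("dc.title=barn", "dc.format", "35mm slide"))
```

The query `dc.title=barn` is a made-up example; any valid CQL query could take its place.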

In SRU/W, custom extensions may be created using a limited syntax (SRU 1.2 extension information).

...

Parameter Name

x-iudl-requestFacetInformation

Value syntax

[facetName1],[maxValueCount],[offset] [facetName2],[maxValueCount],[offset]

Options

The maxValueCount and offset are optional paging parameters. If they are omitted, the server decides the paging rules, which are listed explicitly in the response.

Example value

dc.format dc.date,10 dc.subject,10,10

Explanation

A request for all of the represented dc.format values, the first 10 represented dc.date values, and the second page (of length 10) of the dc.subject values represented in the search results. Sort order for paging is left up to the server, but is generally expected to be alphabetical. If sort order is a problem, the client always has the option to request all values and handle sorting and paging at its end. Field names are most likely aliases and cannot contain spaces or commas, which serve as field and property delimiters respectively.
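A minimal sketch of how a server or client might split an x-iudl-requestFacetInformation value into its parts, assuming only the value syntax given above (fields separated by whitespace, properties by commas). The names `FacetRequest` and `parse_facet_request` are hypothetical, and an omitted paging parameter is represented as `None` so that the server's defaults can be filled in later.

```python
from typing import List, NamedTuple, Optional


class FacetRequest(NamedTuple):
    name: str
    max_value_count: Optional[int]  # None: left for the server to decide
    offset: Optional[int]           # None: left for the server to decide


def parse_facet_request(value: str) -> List[FacetRequest]:
    """Parse "[facetName],[maxValueCount],[offset] ..." into FacetRequest tuples."""
    requests = []
    for field in value.split():      # whitespace delimits fields
        parts = field.split(",")     # commas delimit properties
        if not parts[0] or len(parts) > 3:
            raise ValueError(f"malformed facet field: {field!r}")
        # int() raises ValueError for an empty or non-numeric property
        max_count = int(parts[1]) if len(parts) > 1 else None
        offset = int(parts[2]) if len(parts) > 2 else None
        requests.append(FacetRequest(parts[0], max_count, offset))
    return requests


for request in parse_facet_request("dc.format dc.date,10 dc.subject,10,10"):
    print(request)
```

Because whitespace separates fields, a value written with a space after a comma would be read as two separate fields, so this sketch expects properties to be written without spaces.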

Example response

<extraResponseData xmlns:ns4="http://www.loc.gov/zing/srw/">
  <ns5:facetInformation xmlns="http://www.dlib.indiana.edu/xml/sruFacetedSearch/version1.0/"
          xmlns:ns5="http://www.dlib.indiana.edu/xml/sruFacetedSearch/version1.0/">
    <ns5:field name="dc.format">
      <ns5:value hits="14421">35mm slide</ns5:value>
      <ns5:value hits="14421">image/jpeg</ns5:value>
    </ns5:field>
    <ns5:field name="dc.date">
      <ns5:value hits="26">1938-09</ns5:value>
      <ns5:value hits="1">1938-09-01</ns5:value>
      <ns5:value hits="5">1938-09-03</ns5:value>
      <ns5:value hits="4">1938-09-04</ns5:value>
      <ns5:value hits="1">1938-09-09</ns5:value>
      <ns5:value hits="16">1938-09-10</ns5:value>
      <ns5:value hits="7">1938-09-17</ns5:value>
      <ns5:value hits="3">1938-09-18</ns5:value>
      <ns5:value hits="3">1938-09-21</ns5:value>
      <ns5:value hits="6">1938-09-22</ns5:value>
    </ns5:field>
    <ns5:field name="dc.subject">
      <ns5:value hits="1">Abraham Grapheus</ns5:value>
      <ns5:value hits="2">Abutments</ns5:value>
      <ns5:value hits="19">Acacia</ns5:value>
      <ns5:value hits="15">Acadia National Park (Me.)</ns5:value>
      <ns5:value hits="1">Acanthi</ns5:value>
      <ns5:value hits="2">Acanthopanax ricinifolius</ns5:value>
      <ns5:value hits="1">Acanthus mollis</ns5:value>
      <ns5:value hits="36">Accidents</ns5:value>
      <ns5:value hits="2">Accordions</ns5:value>
      <ns5:value hits="1">Accreted terrain</ns5:value>
    </ns5:field>
    <ns5:requestInfo>
      <ns5:originalRequest>dc.format dc.date,10 dc.subject,10,10</ns5:originalRequest>
      <ns5:resolvedRequest>
        <ns5:facet maxValues="-1" name="dc.format" offset="0"/>
        <ns5:facet maxValues="10" name="dc.date" offset="0"/>
        <ns5:facet maxValues="10" name="dc.subject" offset="10"/>
      </ns5:resolvedRequest>
    </ns5:requestInfo>
  </ns5:facetInformation>
</extraResponseData>
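A client could read a response like the one above with Python's standard `xml.etree.ElementTree` module. The following is a rough sketch, not the project's client code: the function name `parse_facet_information` is hypothetical, the sample is a trimmed copy of the example response, and the fragment is assumed to be rooted at extraResponseData.

```python
import xml.etree.ElementTree as ET

FACET_NS = "http://www.dlib.indiana.edu/xml/sruFacetedSearch/version1.0/"
NS = {"f": FACET_NS}  # "f" is a local alias; the prefix used in the XML is irrelevant

SAMPLE = """
<extraResponseData xmlns:ns4="http://www.loc.gov/zing/srw/">
  <ns5:facetInformation xmlns:ns5="http://www.dlib.indiana.edu/xml/sruFacetedSearch/version1.0/">
    <ns5:field name="dc.format">
      <ns5:value hits="14421">35mm slide</ns5:value>
      <ns5:value hits="14421">image/jpeg</ns5:value>
    </ns5:field>
    <ns5:requestInfo>
      <ns5:resolvedRequest>
        <ns5:facet maxValues="-1" name="dc.format" offset="0"/>
      </ns5:resolvedRequest>
    </ns5:requestInfo>
  </ns5:facetInformation>
</extraResponseData>
"""


def parse_facet_information(xml_text):
    """Return (facets, resolved) from an extraResponseData fragment.

    facets maps each field name to a list of (value, hits) pairs;
    resolved lists (name, maxValues, offset) as applied by the server.
    """
    root = ET.fromstring(xml_text)
    facets = {}
    for field in root.iterfind(".//f:facetInformation/f:field", NS):
        facets[field.get("name")] = [
            (value.text, int(value.get("hits")))
            for value in field.findall("f:value", NS)
        ]
    resolved = [
        (facet.get("name"), int(facet.get("maxValues")), int(facet.get("offset")))
        for facet in root.iterfind(".//f:resolvedRequest/f:facet", NS)
    ]
    return facets, resolved


facets, resolved = parse_facet_information(SAMPLE)
print(facets)
print(resolved)
```

Matching on the namespace URI rather than on the `ns5` prefix keeps the sketch working if the server serializes the same elements under a different prefix.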

Facet Calculation

...

Usage Notes

Some of the specific collection needs of IN Harmony can't be implemented generally enough in the SRU/W server and must instead be implemented by the IN Harmony web application. If a reasonably general way of implementing the needed functionality can be devised, it would be ideal to move it to the SRU/W layer, but for complex cases that aren't reusable, it is a bad precedent to implement the feature in SRU/W.

The following are strong reasons to implement something at the SRU level:

...

Needed Feature

Implementation suggestion

Grouping of 5-year periods in a copyright date facet.

The web application should request all facet information for copyright date (this will rarely be even hundreds of values and shouldn't kill performance) and then consolidate by half-decade, creating queries by ORing the query fragments for each result.

Knowing when to display a button to request all facets (i.e., knowing when there are more, undisplayed results)

The web application should request x + 1 facets for each field where x is the number to display. The (x + 1)th result shouldn't be displayed, but if it's present, a button should exist saying "Display all".
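Both suggestions above are simple enough to sketch on the web application side. The code below is an illustration under stated assumptions: the function names are hypothetical, `mods.copyrightDate` is an assumed field name, facet values are assumed to be four-digit years, and the sample counts are made up.

```python
def consolidate_by_half_decade(year_hits, field="mods.copyrightDate"):
    """Group (year, hits) pairs into 5-year buckets.

    Returns (label, hits, clause) tuples in chronological order, where
    clause ORs together the "[facetField]=[facetValue]" fragments.
    """
    buckets = {}
    for value, hits in year_hits:
        start = int(value) // 5 * 5  # e.g. 1852 -> 1850
        bucket = buckets.setdefault(start, {"hits": 0, "clauses": []})
        # Summing assumes each record carries a single copyright date.
        bucket["hits"] += hits
        bucket["clauses"].append(f'{field}="{value}"')
    return [
        (f"{start}-{start + 4}", b["hits"], " or ".join(b["clauses"]))
        for start, b in sorted(buckets.items())
    ]


def split_for_display(values, display_count):
    """Given the values returned for a request of display_count + 1,
    return the values to show and whether to offer "Display all"."""
    return values[:display_count], len(values) > display_count


# Illustrative data only.
for row in consolidate_by_half_decade([("1851", 1), ("1852", 2), ("1854", 1), ("1873", 1)]):
    print(row)
print(split_for_display(["Banjo", "Cello", "Flute"], 2))
```

When a consolidated clause is ANDed with the original query, it should be wrapped in parentheses so that the ORs bind before the AND.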