First, remove out-of-scope records.
Basic search implies Google-like functionality, so when "basic" appears in the following table, it means the element(s) are part of the basic search index. This index should also be available as an option on the advanced search page, labeled "keyword".
Levels of adoption, according to SWG's page:
Item being Processed |
Processing Notes |
XPath Query |
Level of Adoption |
Brief/Full Display |
Record Display Label |
Basic/Advanced Search Index |
In Advanced Search? |
Browse Facet |
---|---|---|---|---|---|---|---|---|
title |
For brief display, clicking on the title hotlinks to the URL. |
indexing: |
level A |
brief, full |
Title |
basic, advanced |
yes, selectable from drop-down |
no |
date [1] |
First, use the keyDate, if it exists. There should be one and only one keyDate. The w3cdtf encoding is strongly preferred. |
(in order) |
level A |
brief, full |
Date |
basic, advanced |
yes, choice of single date entry, range, and era/decade |
yes |
language |
UM processing: re-exposed records contain exploded language codes. If there is a @type="code", another sub-element is added with @type="text" that includes the exploded code. If @type="text" already exists, it is left alone. |
mods/language/languageTerm[@type='text'] |
level C |
full |
Language |
advanced |
yes, selectable from drop-down |
yes |
URL |
Two fields can contain clickable URLs: location/url and identifier[@type='uri']. For display, only the primary URL in location/url should be used, if available; otherwise fall back to the identifier. For brief display, clicking on the title links to the primary URL. For full display, the URL displays as-is. |
mods/location/url[@usage='primary display'] or mods/identifier[@type='uri'] |
level B |
[brief], full |
URL |
neither |
no |
no |
creator |
Separate name and namePart, affiliation, role or description with a comma space. |
mods/name/namePart|affiliation|description and role/roleTerm[@type='text'] |
level B |
full |
Related Names |
basic, advanced |
yes, selectable from drop-down |
no |
subject [2] |
Record display and browse facets are driven by the subject indexes. They should be generated from all subelements of subject, regardless of whether they appeared within a single subject container. Therefore, split pre-coordinated headings (e.g., United States - Social conditions - 1980- - Juvenile literature - Bibliography) into their component parts for indexing and browse display, but not for record display. |
mods/subject/geographic|hierarchicalGeographic|geographicCode |
level B |
full |
Subject |
basic, advanced |
yes, limiter by subject index type |
yes
physical description |
Sub-elements should be separated by a space semicolon space. |
mods/physicalDescription/* |
level C |
full |
Physical Description |
basic |
no |
no |
publisher and place |
placeTerm[@type='code'] should be exploded as described above for subject/geographicCode, roleTerm, and language. |
mods/originInfo/place/placeTerm[@type='text'] and publisher |
level B |
full |
Publisher |
basic (publisher), advanced (publisher and place) |
yes, selectable from drop-down |
no |
origin aspects |
Sub-elements should be separated by a space semicolon space. |
mods/originInfo/edition|issuance|frequency and mods/part/* |
level C |
full |
Publication Specifics |
basic, advanced |
no |
no |
resource type |
Ignore attributes. |
mods/typeOfResource |
level B |
full |
Resource Type |
basic, advanced |
yes, limiter by value |
no |
genre |
Ignore attributes. |
mods/genre |
level B |
full |
Genre |
basic, advanced |
yes, selectable from drop-down |
yes |
location |
Separate multiple instances of physicalLocation by a comma space. |
mods/location/physicalLocation |
level C |
full |
Physical Location |
advanced |
no |
no |
identifiers |
If @type="uri" is used for URL, exclude it here. |
mods/identifier |
level C |
full |
Identifier |
neither |
no |
no |
classification [3] |
Ignore attributes, for now. |
mods/classification |
level C |
full |
Classification |
neither |
no |
no |
table of contents |
Ignore attributes. |
mods/tableOfContents |
level C |
full |
Table of Contents |
basic |
no |
no |
abstract |
Ignore attributes. |
mods/abstract |
level C |
full |
Abstract |
basic, advanced |
yes, selectable from drop-down |
no |
note |
Ignore attributes. |
mods/note |
level C |
full |
Note |
basic |
no |
no |
audience |
Ignore attributes. |
mods/targetAudience |
level B |
full |
Audience |
basic, advanced |
no |
yes |
rights |
Ignore attributes. |
mods/accessCondition |
level B |
full |
Terms and Conditions of Use |
neither |
yes, limiter by value |
no |
related item |
Exclude the "dlfaqcoll" attribute here, because it is used for collection. |
mods/relatedItem/* |
level C |
full |
Related Item |
basic |
no |
no |
preview |
UM processing: re-exposed records contain a thumbnail image in the @access="preview" attribute. If the re-exposed records do not contain a preview image, the Thumbgrabber can be used to gather them. |
mods/location/url[@access='preview'] |
level A |
brief, full |
n/a |
n/a |
n/a |
n/a |
collection |
UM processing: re-exposed records contain the "dlfaqcoll" attribute that concatenates repository name and OAI setName into a readable collection phrase. |
mods/relatedItem/titleInfo[@authority='dlfaqcoll']/title |
level A |
brief, full |
Collection |
basic, advanced |
yes, limiter by collection |
yes |
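
As an illustration of how a few of the rows above could be applied to a single record, here is a minimal Python sketch using the standard library's ElementTree. It covers only the URL, identifiers, physical description, preview, and collection rows. The sample record, the `display_fields` function, and the `text_of` helper are hypothetical names invented for this example, and the fallback from location/url to identifier[@type='uri'] is one reading of the URL row, not a confirmed implementation.

```python
import xml.etree.ElementTree as ET

# MODS v3 namespace. The sample record is invented for illustration only.
NS = {"mods": "http://www.loc.gov/mods/v3"}

SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <location>
    <url access="preview">http://example.org/thumb/1.jpg</url>
    <url usage="primary display">http://example.org/item/1</url>
  </location>
  <identifier type="uri">http://example.org/id/1</identifier>
  <identifier type="local">abc123</identifier>
  <physicalDescription>
    <extent>1 photograph</extent>
    <form>print</form>
  </physicalDescription>
  <relatedItem>
    <titleInfo authority="dlfaqcoll"><title>Example Repository: Maps</title></titleInfo>
  </relatedItem>
</mods>"""


def text_of(elem):
    """Return stripped element text, or an empty string if the element is missing."""
    return (elem.text or "").strip() if elem is not None else ""


def display_fields(record):
    """Extract a few display fields from a <mods> element (hypothetical helper)."""
    primary = record.find("mods:location/mods:url[@usage='primary display']", NS)
    uri_id = record.find("mods:identifier[@type='uri']", NS)

    # URL row: prefer the primary display URL, else fall back to the URI identifier.
    url = text_of(primary) or text_of(uri_id)
    url_from_identifier = not text_of(primary) and bool(text_of(uri_id))

    # Identifiers row: drop the URI identifier only if it was used as the URL.
    identifiers = [
        text_of(i)
        for i in record.findall("mods:identifier", NS)
        if not (url_from_identifier and i is uri_id)
    ]

    # Physical description row: sub-elements joined by space semicolon space.
    parts = [text_of(p) for p in record.findall("mods:physicalDescription/*", NS)]
    physical = " ; ".join(p for p in parts if p)

    return {
        "url": url,
        "identifiers": identifiers,
        "physical_description": physical,
        "preview": text_of(record.find("mods:location/mods:url[@access='preview']", NS)),
        "collection": text_of(record.find(
            "mods:relatedItem/mods:titleInfo[@authority='dlfaqcoll']/mods:title", NS)),
    }


if __name__ == "__main__":
    for label, value in display_fields(ET.fromstring(SAMPLE)).items():
        print(label, "=", value)
```

The same pattern (one XPath per table row, one small joining rule per display label) could be extended to the other rows, but the separators and fallbacks should be checked against the processing notes for each element.
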
1 We would recommend including the following in this methodology:
2 Investigate supplementing the time browse facet that contains mods/subject/temporal with data from date elements. Also, investigate using @authority to determine if certain controlled vocabularies (e.g., LCSH) can help us create more consistent subject indexes. If clustering is a possibility, this will also aid this effort.
3 Look into whether classification can supplement genre or subject. For instance, High Level Browse at UM can be used to map classification numbers to a set of topics.
4 Date processing rules per the MWG and the SWG:
for both indexing and sorting:
other indexing rules:
sorting:
display:
MODS fields not used for data processing, although they may be used for other things, are:
Remaining questions: