Current Identifier Standards

In the DLP, we use semantically-rich identifiers (see differences between identifiers with and without semantics below). There are many types of identifiers in use:

Misc Identifier Notes

We may eventually assign a DOI for each item, but we don't have a need to do that yet.

Some older collections use the shortItemID instead of the fullItemID in their OAI records. For example, in Hohenberger, we have Hoh007.013.0016 instead of lilly/hohenberger/Hoh007.013.0016. We will have to support the old forms of the ID.

The old NOTIS system generated IDs consisting of three letters and four numbers (like VAA1234). To easily connect with IUCAT, we have continued using IDs of this form for IU holdings that do not belong to a special collection. Items that use these IDs always include them as the "identifier" portion of the filename. These identifiers are unique across all DLP holdings. We would hope they are unique across the IU library system, but we have no way of knowing if other groups have adopted conflicting standards that emulate the old NOTIS system.

To generate new NOTIS-form identifiers, use the script at which keeps track of numbers in /usr/local/variations/lastFileName.txt. It is possible to assign a batch of numbers by simply increasing the value in lastFileName.txt, though you should be careful to look at /usr/local/variations/lastFileNameLog.txt to see whether any IDs were generated for other users around the time you were modifying the file.

Semantics in identifiers

Semantic IDs have "meaning" in at least some portion of the ID.



Opaqueness in identifiers

Opaque IDs have no apparent meaning.



NOID (Nice Opaque IDentifier)

NOID avoids the use of vowels, to prevent the unintentional creation of words (which may be misleading, as in a Variations ID like bad6666). It also avoids the use of lowercase 'l', to minimize confusion with the number 1.

The noid program allows creation of an unlimited number of "minters", which can generate IDs of differing types. IDs may consist of:

If a minter is set to produce IDs with a fixed number of digits, it may be set to generate the IDs in a random order, otherwise they are generated sequentially.

Identifier resolution systems

PURL: Persistent URL. Simply a URL redirect, which an institution plans to maintain in a working state indefinitely. PURLs can always be resolved by a web browser.

ARK: Archival Resource Key. An identifier with the form ark:/NAME_GENERATING_AUTHORITY/NAME. More commonly, a URL of the form http://NAME_MAPPING_SERVICE/ark:/NAME_GENERATING_AUTHORITY/NAME. The name mapping service is replacable, and there is a system for looking up a new service if an old one is non-functional. Eventually, ARK hopes to drop everything before the "ark:", and have the name mapping service by dynamic, but it is included for now so that web browsers can support name resolution. The URL can be appended with "?" to retrieve metadata about the object, and with "??" to retrieve a commitment statement. Note: Even though CDL primarily relys on ARK, they use PURL for items they do not control, because they don't want to adhere to an ARK persistence statement for these items.

DOI/handle: Digital Object Identifier. An identifier of the form NAME_GENERATING_AUTHORITY/NAME, but sometimes seen written as a URI (with "doi:"). DOIs are usually associated with a resolver via a URL (like "").

URI: Uniform Resource Identifier. A string, followed by a colon, followed by a string. The initial string should be registered with IANA. Web browsers internally support resolution of a subset of URIs. All URLs are URIs. ARKs and DOIs have not been recognized as standard URIs yet. Of course, the URL form of an ARK is a URI. ARKs may also be referenced through the "info:" URI.

There is much similarity between PURLs and Handles. MIT has published a useful, but slightly biased, comparison (biased because the author had much experience with handles and no prior experience with PURLs).