This page is finalized. Any modifications should be approved by the infrastructure group.

As we create new collections, it is useful to keep the filenames assigned to digital objects consistent. Each collection will impose its own restrictions on filenames, but following these requirements will ensure basic consistency across collections and make later processing of the files much easier.

Files must be named according to these requirements to be ingested into the Fedora repository. These requirements must be used for all new DLP collections. Older collections that do not meet the requirements will be renamed when it becomes convenient. Note that these requirements apply to "digital objects" only. Other supporting materials (documentation, HTML pages, etc.) may be named according to different standards. If supporting materials are to be stored in the repository, they may be collected into a tar or zip file whose name meets these requirements.

The need for standardized filenames

The needs to be fulfilled by enforcing requirements on filenames are (in order of importance):

  1. Ease of identification. During the life of a digital file, it moves through various locations. These moves may be due to manual processing or automatic processing. A file may stay in a given location for an arbitrary length of time, during which human memory of its origin fades. At all points in the life of the file, humans and automatic processes must be able to easily identify which digital object the file belongs to and the position it occupies within that object. As a side effect of this, the filename will facilitate the process of locating metadata about the file.
  2. Ease of automatic processing. Automatic processing systems must be able to make basic assumptions about the filenames they will process. Developers should not have to write complex code to handle special characters. (As a security matter, though, it is good practice to verify that a filename conforms to these requirements before starting an automatic process.)

Requirements for filenames

Absolute requirements

All files must conform to the following requirements:

Best practices

The following "best practices" should be followed whenever possible. If one of these practices is not followed, the change should be well documented, with a description of the reasons for not following the practice.

Exception to the requirements

Files that are stored within a closed system (e.g., Fedora, Variations, DLXS) do not need to follow these requirements if the system's internal processing dictates another scheme. For example, Fedora must manage its own filenames to ensure proper version control. However, it is recommended that systems written by the DLP follow the requirements when possible, and closed systems should provide a method for converting internal filenames to "external" names that meet the requirements.

About NOTIS-style identifiers

The old NOTIS system generated IDs consisting of three letters and four digits (like VAA1234). To connect easily with IUCAT, we have continued using IDs of this form for IU holdings that do not belong to a special collection. Items that use these IDs always include them as the "identifier" portion of the filename. These identifiers are unique across all DLP holdings. We hope they are also unique across the IU library system, but we cannot know whether other groups have adopted conflicting standards that emulate the old NOTIS system.
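As a rough illustration of the "three letters and four digits" form, the check below is a minimal sketch in Python. The function name is hypothetical, and accepting both upper and lower case is an assumption based on the examples on this page (VAA1234 in the text, aeg9051 in the Variations sample filenames); it is not a statement of how any DLP system actually validates IDs.

```python
import re

# Three letters followed by four digits, as described above.
# Assumption: either case is accepted, since this page shows both
# "VAA1234" and the lowercase "aeg9051".
NOTIS_STYLE_ID = re.compile(r"[A-Za-z]{3}[0-9]{4}")


def is_notis_style_id(identifier: str) -> bool:
    """Return True if the whole string is three letters plus four digits."""
    return NOTIS_STYLE_ID.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("VAA1234", "aeg9051", "P15754"):
        print(candidate, is_notis_style_id(candidate))
```

Note that a check like this only confirms the shape of an ID. It cannot tell whether the ID is unique or whether it actually corresponds to an IUCAT record.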

Sample filenames

| Collection | Filename | Notes |
|---|---|---|
| IN Harmony | ihs-SHMU_01_13-01-05.tif | ihs stands for Indiana Historical Society, SHMU_01_13 is an identifier local to the Historical Society, 01 indicates the first (physical) copy of the item was used for digitization, and 05 indicates this file is the 5th page of the item. |
| IN Harmony | ihs-SHMU_01_13-01-05-full.jpg | Same as above, but this is a derivative file at the "full" size. |
| Variations | aeg9051c.wav | A NOTIS-style identifier. The sequence number is actually an alphabetic character, in this case a "c" indicating that the file is third in a set of files for this item. |
| Variations | aeg9051c-192k.mov | Same as above, but a derivative file encoded at 192kbps. |
| Hoagy Carmichael photos | ATM-MC2-3-11-30-p1-screen.jpg | A sample derivative file. |
| Cushman | P15754.tif | Files in the Cushman collection are all of one type, and they don't consist of pages, so the filename only consists of the local identifier. |
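To show how a predictable filename lets an automatic process recover the identifier and position of a file, here is a minimal Python sketch that parses only the two Variations samples above. The pattern is inferred from those two examples alone, so treat it as an assumption: the class and function names are hypothetical, and the accepted characters for the derivative label and extension are guesses rather than rules stated on this page.

```python
import re
from typing import NamedTuple, Optional


class VariationsName(NamedTuple):
    identifier: str            # NOTIS-style ID, e.g. "aeg9051"
    sequence: int              # 1-based position: "a" is 1, "c" is 3
    derivative: Optional[str]  # e.g. "192k"; None when no label is present
    extension: str             # e.g. "wav"


# Assumed layout, inferred from the samples:
#   <3 letters><4 digits><sequence letter>[-<derivative>].<extension>
_VARIATIONS_NAME = re.compile(
    r"(?P<identifier>[a-z]{3}[0-9]{4})"
    r"(?P<letter>[a-z])"
    r"(?:-(?P<derivative>[A-Za-z0-9]+))?"
    r"\.(?P<extension>[A-Za-z0-9]+)"
)


def parse_variations_name(filename: str) -> Optional[VariationsName]:
    """Split a Variations-style filename into its parts, or return None."""
    match = _VARIATIONS_NAME.fullmatch(filename)
    if match is None:
        return None
    # Convert the alphabetic sequence character to a 1-based number.
    sequence = ord(match.group("letter")) - ord("a") + 1
    return VariationsName(
        identifier=match.group("identifier"),
        sequence=sequence,
        derivative=match.group("derivative"),
        extension=match.group("extension"),
    )


if __name__ == "__main__":
    print(parse_variations_name("aeg9051c.wav"))
    print(parse_variations_name("aeg9051c-192k.mov"))
```

Returning None for names that do not match (for example P15754.tif, which follows the Cushman scheme) lets a caller reject or reroute unexpected files before any processing starts.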

Filename standards currently in use