RLG "Trusted Digital Repository" Checklist

The most important document related to preservation is RLG's Audit Checklist for the Certification of Trusted Digital Repositories.

The following items on RLG's checklist are of immediate interest. For the most part, they relate to our use of HPSS:

Possible threats to the preservation repository

Other preservation-related items

OAIS is the primary model for building preservation systems.

RLG's older document, Trusted digital repositories: Attributes and Responsibilities.

The October 2005 RLG DigiNews describes requirements for "certification" of a digital repository. Many of these requirements are related to preservation.

There is a paper from Los Alamos about integrating OAI and OpenURL with an OAIS model.

There is a paper on bottom-up preservation issues that may also be useful.

The architecture of the LOCKSS system may be a useful guide, but LOCKSS currently relies on each participating institution "owning" a copy of the item. This approach is most useful for electronic journals.

Portico is a preservation archive for e-journals, based on Documentum. They perform extensive validation before accepting data into their repository.

The Florida Center for Library Automation is developing an archive system. This may provide some ideas, and they may eventually release code that we can use.

Library Trends vol. 54, no. 1 (2005) is a theme issue entitled "Digital Preservation: Finding Balance", with many interesting articles.

Xena is a program that can take many file formats and convert them to a (supposedly open-format, long-term stable) XML format. It has been used with DSpace.

In the IU computer science department, Thomas Reichherzer and Geoffrey Brown are working on tools to aid "preservation via emulation".

A paper on preservation issues from the LOCKSS group, and the related Saving Bits Forever presentation.

Another paper on preservation issues.

Building an integrity checker

If all goes well, an integrity checker will never report a problem. But a non-existent or non-running integrity checker would exhibit the same behavior. So we must set up tests that ensure the integrity checker is regularly exercised. One simple way would be to put an object in the repository and purposefully corrupt it, then measure how long it takes for the integrity checker to notice.
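The canary idea above can be sketched roughly as follows. This is only an illustration, not a design for our repository: it assumes objects are plain files on a local filesystem (not HPSS), that fixity is recorded as SHA-256 checksums in an in-memory manifest, and that the checker can be called directly instead of running on its own schedule. All function names (build_manifest, check_integrity, corrupt_canary, exercise_checker) are hypothetical.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record the expected checksum of every object (hypothetical manifest format)."""
    return {path: sha256_of(path) for path in paths}


def check_integrity(manifest):
    """Return the sorted list of objects that are missing or no longer match."""
    damaged = []
    for path, expected in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != expected:
            damaged.append(path)
    return sorted(damaged)


def corrupt_canary(path):
    """Flip every bit of the first byte so the checksum no longer matches."""
    with open(path, "r+b") as f:
        first = f.read(1)
        if not first:
            raise ValueError("canary object must not be empty")
        f.seek(0)
        f.write(bytes([first[0] ^ 0xFF]))


def exercise_checker(manifest, canary_path, checker=check_integrity):
    """Corrupt the canary, run the checker, and return the detection time.

    Returns the elapsed seconds if the checker flagged the canary, or None
    if it did not (which is what a broken or absent checker looks like).
    """
    corrupt_canary(canary_path)
    started = time.monotonic()
    flagged = checker(manifest)
    elapsed = time.monotonic() - started
    return elapsed if canary_path in flagged else None
```

In a real deployment the checker would run on its own schedule, so the test would more likely corrupt the canary, record the time, and then watch the checker's reports until the canary appears. The canary should be a dedicated test object, never real content, and a result of None should be treated as an alarm about the checker itself.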

Open questions

  1. Can we eventually store lots of small files (page images) directly in HPSS via the filesystem interface without aggregation, or will it simply take too long to retrieve these files?
  2. Should we store copies of our metadata in HPSS, or are the regular server backups good enough for this?