Next steps for moving towards preservation

  1. Start documenting policy decisions. This page, its siblings, and its children are good starting points.
    1. As we make decisions, determine which documents they belong in and which person or group should manage each document. Many documents will be managed day to day by the Repository Manager but reviewed regularly by the Repository Preservation Board.
  2. Look over documentation from other preservation repositories. Possibilities include:
  3. Determine who should be invited to join a Repository Preservation Board. This group needs broad representation across the libraries and the major users of the repository, but needs to be small enough to make decisions effectively. This group should meet regularly to ensure that policies meet the needs of the repository's users, and ensure that policies are being followed.
  4. As collections are added to the repository, validate whether they meet the current policies. Provide some central way to track this...
  5. Move towards a better Fedora/HPSS connection, so we can take advantage of preservation tools that are developed by the Fedora community.
  6. Set up an initial meeting of a Repository Preservation Board.

RLG "Trusted Digital Repository" checklist

A new version of this checklist, called TRAC (Trustworthy Repositories Audit & Certification), has been released.

The most important document related to preservation is RLG's Audit Checklist for the Certification of Trusted Digital Repositories. While this checklist is a good place to start working on our preservation system, we won't treat it as a mandate.

Melanie Schlosser has started a workspace to organize the items on the checklist and document how the current DLP activities relate to them.

Items on RLG's checklist of immediate interest. For the most part, these items relate to our use of HPSS:

Possible threats to the preservation repository

Other preservation-related items

List of file formats and their expected longevity from the Library of Congress.

A paper on using Fedora as a preservation system

OAIS (Open Archival Information System) is the primary reference model for building preservation systems.

RLG's older document, Trusted Digital Repositories: Attributes and Responsibilities.

The October 2005 RLG DigiNews describes requirements for "certification" of a digital repository. Many of these requirements are related to preservation.

There is a paper from Los Alamos about integrating OAI and OpenURL with an OAIS model.

There is a paper on bottom-up preservation issues that may also be useful.

The architecture of the LOCKSS system may be a useful guide, but LOCKSS currently relies on each participating institution "owning" a copy of the item. This makes it most useful for electronic journals.

Portico is a preservation archive for e-journals, based on Documentum. They perform extensive validation before accepting data into their repository.

The Florida Center for Library Automation is developing an archive system. This may provide some ideas, and they may eventually release code that we can use.

Library Trends vol. 54, no. 1 (2005) is a theme issue entitled "Digital Preservation: Finding Balance", with many interesting articles.

Xena is a program that can take many file formats and convert them to a (supposedly open-format, long-term stable) XML format. It has been used with DSpace.

In the IU computer science department, Thomas Reichherzer and Geoffrey Brown are working on tools to aid "preservation via emulation".

A paper on preservation issues from the LOCKSS group, and the related Saving Bits Forever presentation.

Another paper on preservation issues.

Building an integrity checker

If all goes well, an integrity checker will never report a problem. But a non-existent or non-running integrity checker would exhibit the same behavior. So we must set up tests that ensure the integrity checker is regularly exercised. One simple way would be to put an object in the repository and purposefully corrupt it, then measure how long it takes for the integrity checker to notice.
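The canary test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on a plain local directory rather than on Fedora or HPSS, it uses SHA-256 checksums held in an in-memory manifest, and every name in it (`build_manifest`, `check_integrity`, `corrupt_canary`, `canary.bin`) is hypothetical rather than part of any existing tool.

```python
import hashlib
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a checksum for every file under root (assumed trusted state)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_integrity(root: Path, manifest: dict) -> list:
    """Return the names of files that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems


def corrupt_canary(path: Path) -> None:
    """Deliberately damage a test object by flipping the bits of one byte."""
    data = bytearray(path.read_bytes())
    if data:
        data[0] ^= 0xFF
    else:
        data.append(0)  # an empty file is corrupted by making it non-empty
    path.write_bytes(bytes(data))


def demo() -> None:
    """Corrupt a canary object and time how long detection takes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "object-001.tif").write_bytes(b"stand-in for a page image")
        (root / "canary.bin").write_bytes(b"known canary content")
        manifest = build_manifest(root)

        started = time.monotonic()
        corrupt_canary(root / "canary.bin")
        problems = check_integrity(root, manifest)
        elapsed = time.monotonic() - started
        print("flagged:", problems, "after %.3f seconds" % elapsed)


if __name__ == "__main__":
    demo()
```

In this sketch the checker runs immediately after the corruption, so the elapsed time is trivially small. In a real deployment the checker would run on its own schedule, and the interval between corrupting the canary and the first report naming it is the figure worth tracking. A canary that is never reported means the checker is not running or not working.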

It may be useful to have two levels of integrity checking:

Other thoughts

Open questions

  1. Can we eventually store lots of small files (page images) directly in HPSS via the filesystem interface without aggregation, or will it simply take too long to retrieve these files?
  2. Should we store copies of our metadata in HPSS, or are the regular server backups good enough for this?
  3. How do we manage preservation packages for materials that will be accessed through Variations? We don't want to store duplicate copies of the derivative files, because they are quite large. Perhaps Fedora can store these as Redirect datastreams, and just keep the appropriate metadata in the actual repository directories.
  4. What is the best way to provide "proof" that something hasn't been altered since it was originally digitized?