Hardware requirements

Current service arrangement

Thalia:

Rhyme:

Feta:

Erato:

Brie:

Urania:

Euterpe:

Clio:

Gigue:

Gavotte:

Algernon:

Melpomene:

Not currently provided:

Proposed future arrangement

This arrangement reorganizes the current layout, centralizing functionality onto fewer machines. We would have to move to it gradually.

Thalia, production Fedora:

Rhyme, development Fedora:

Feta, production webapps:

Brie, development webapp/database:

Bleu, off-line processing:

????, database:

Algernon:

Gigue:

Melpomene, Clio, and Euterpe will disappear. Euterpe's file serving capabilities may move to LIT.

The most pressing need is to purchase replacements for Gavotte, Erato, and Urania. We also need to develop a short-term backup solution while we wait for the UITS system-wide backup to be implemented.

Storage needs (as of 2006)

Rough estimates of throughput (master files only):

Selected stats for current collections:

Rough guideline: if we create three derivative sizes for each image, allow 1GB per thousand images/pages.
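The guideline above is simple enough to apply by hand, but a small helper makes it easy to rerun as collection counts change. This is only a sketch: the function and constant names are made up for illustration, and the only figure taken from this page is the 1GB-per-thousand rule.

```python
# Rule of thumb from this page: three derivative sizes per image,
# about 1GB of derivatives per thousand images/pages.
GB_PER_THOUSAND_IMAGES = 1.0


def derivative_storage_gb(image_count, gb_per_thousand=GB_PER_THOUSAND_IMAGES):
    """Estimate derivative storage in GB for a number of images/pages."""
    return image_count / 1000 * gb_per_thousand


# Example: a hypothetical collection of 100,000 page images.
print(derivative_storage_gb(100_000))  # 100.0
```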

A very rough estimate: 100GB to store derivatives for all currently deployed collections (excluding audio/video), or 1.5TB if we store all the masters as well. This probably underestimates our needs, since we have many collections in the pipeline. Including the files in euterpe\digitize, and assuming a master-to-derivative size ratio of roughly 10:1, we may need more like 200GB for derivatives and 4TB total.

We should definitely err on the high side when estimating storage needs. For image collections delivered through DLXS (like Wright American Fiction), the master files must be stored locally, and derivatives are generated on the fly.

Note that if we move our derivatives to JPEG2000, storage needs may change, but only slightly.

We also need a "drop box" for bulk content acquisitions, like the 500GB of content we're getting for Newton.

Cost estimates (as of 2006)

We can get 5TB of disk in an Apple Xserve RAID for $20k.
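To relate this price to the storage estimates above, here is a rough sketch of the arithmetic. The capacity and price are the figures quoted on this page. The `headroom` factor is a hypothetical knob for erring on the high side, and the calculation treats the full 5TB as usable, which is an assumption.

```python
import math

UNIT_CAPACITY_TB = 5.0   # capacity quoted on this page
UNIT_COST_USD = 20_000   # price quoted on this page (2006)


def units_needed(required_tb, headroom=1.0):
    """Whole 5TB units needed for required_tb, scaled by a headroom factor."""
    return math.ceil(required_tb * headroom / UNIT_CAPACITY_TB)


def disk_cost_usd(required_tb, headroom=1.0):
    """Purchase cost of enough units to hold required_tb with headroom."""
    return units_needed(required_tb, headroom) * UNIT_COST_USD


print(disk_cost_usd(4.0))                # 20000: one unit covers the 4TB estimate
print(disk_cost_usd(4.0, headroom=1.5))  # 40000: 50% headroom needs a second unit
```

At the quoted price this works out to $4k per TB, so the high-end 4TB estimate fits in a single unit only if we leave almost no room for growth.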

A powerful 4-processor (dual core) machine from Sun with 32GB of RAM is available for $50k.

We could completely replace the current tape system with a newer, bigger one for $54k.

Virtualization

The UIS group is moving toward a virtualized system, where they have a big pool of server power that can be dynamically allocated. We may consider using it for some of our needs instead of purchasing new machines for everything.

Schedule:

Open questions

  1. Are we relying on HPSS for storage of masters, or is it feasible to do something else ourselves?
    1. We can probably handle the raw cost outlay of storing them ourselves. It's the extra workload that would be a problem. But keeping HPSS introduces its own workload.
    2. Keeping local copies of the masters would allow us to salvage lost/broken items, making our preservation system more robust. The preservation system could periodically download a single aggregate from HPSS and compare all of its contents to the local copies, alerting someone if there are any differences.
  2. Do we need specific replacements for Algernon and Gigue? Or can their functions be moved onto other machines?
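The periodic check suggested under question 1 could look roughly like the sketch below. It assumes the aggregate has already been retrieved from HPSS and unpacked into a directory that mirrors the layout of the local masters. The retrieval step is not shown, and all function names here are hypothetical. SHA-256 is an arbitrary choice of checksum.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_aggregate(aggregate_root, local_root):
    """Compare every file in an unpacked aggregate to its local copy.

    Returns a dict with two sorted lists of relative paths:
    files missing locally, and files whose contents differ.
    """
    aggregate_root = Path(aggregate_root)
    local_root = Path(local_root)
    missing, differs = [], []
    for item in sorted(aggregate_root.rglob("*")):
        if not item.is_file():
            continue
        rel = item.relative_to(aggregate_root)
        local_copy = local_root / rel
        if not local_copy.is_file():
            missing.append(rel.as_posix())
        elif file_digest(item) != file_digest(local_copy):
            differs.append(rel.as_posix())
    return {"missing_locally": missing, "content_differs": differs}
```

If either list in the result is non-empty, the preservation system would alert someone. How that alert is sent (email, ticket, log) is left open here.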