...

In order to run a large repository (more than 100k Fedora Objects) while using the ResourceIndex, you must have a 64-bit version of Java running on a 64-bit OS.

Rhyme setup

We will initially use rhyme.dlib.indiana.edu. Its core specs are: dual 3GHz CPUs, 6GB RAM, and 420GB of usable disk space.

Fedora connects to the Oracle database on ora-iudlu-dev (urania). When we move Fedora to production, we will need to switch to a production database.

...

If we want to convert from Managed to External content, we can just purge and re-create the datastreams. Of course, this would lose any version information.

Space issues

Current stats for other collections:

  • Hohenberger: 2143 images, each with master, thumbnail, screen, and large JPG. Masters take 13GB. Derivatives take 1.4GB. Full Fedora storage (derivatives, metadata, and resource index) takes 2.5GB.
  • DIDO: 40,000 images, each with master, thumbnail, screen, and large images. Derivatives take 15GB.
  • US Steel: 2200 images with master, thumbnail, and screen. Masters take 20GB.
  • Cushman: 14,500 images. Derivatives (including notebooks) 3.2GB.
  • Victorian Women Writers: (text only)
  • Wright American Fiction: 400,000 pages. Derivatives are generated on the fly. Master images and text files are 111GB.
  • Sheet Music (DeVincent): 50,000 pages. Screen and full/large for all pages; thumbnails for some. Derivatives 11GB.
  • Sheet Music (Starr): Derivatives 600MB.
  • Jane Johnson: Derivatives 300MB.
  • Letopis
  • Hoagy
  • FLI
  • Eviada
  • Newton
  • IN Harmony
  • Camva
  • IU Archives: 600 images. Does not have large size yet. Derivatives 113MB.

Rough guideline: If we create three derivative sizes for each image, allow 1GB per thousand images/pages.

An incredibly rough estimate: 100GB to store derivatives for all current collections.
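
As a sanity check on the 1GB-per-thousand guideline, the sketch below compares it with the collections above that list both an item count and a derivative size. The figures are copied from the list; the function and variable names are made up for illustration. The collections are not strictly like-for-like (Cushman includes notebooks, IU Archives has no large size yet, DeVincent has thumbnails for only some pages), so treat the output as indicative only.

```python
# Figures copied from the collection list above: (item count, derivative storage in GB).
# Only collections that report both numbers are included.
COLLECTIONS = {
    "Hohenberger": (2143, 1.4),
    "DIDO": (40000, 15.0),
    "Cushman": (14500, 3.2),
    "Sheet Music (DeVincent)": (50000, 11.0),
    "IU Archives": (600, 0.113),
}

GUIDELINE_GB_PER_THOUSAND = 1.0  # the rough guideline stated above


def gb_per_thousand(items, derivative_gb):
    """Measured derivative storage per thousand images/pages."""
    return derivative_gb / items * 1000


def guideline_estimate_gb(items):
    """Storage the rough guideline would predict for this many items."""
    return items / 1000 * GUIDELINE_GB_PER_THOUSAND


for name, (items, gb) in COLLECTIONS.items():
    print(
        f"{name}: measured {gb_per_thousand(items, gb):.2f} GB per thousand, "
        f"guideline predicts {guideline_estimate_gb(items):.1f} GB, actual {gb} GB"
    )
```

For every collection in this subset the measured figure comes out below 1GB per thousand, which suggests the guideline errs on the generous side.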

Note that if we move our derivatives to JPEG2000, storage needs may change, but only slightly.

System limits

The Fedora project has done some performance testing on a repository with 1 million objects.

...

How many items can share a PID prefix? A PID can be up to 64 characters long, so if we use the prefix "iudl:", we have 59 characters left for the numeric part, which is far more than we need. We could even add a collection code, like "iudl:hohenberger-1214".
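
To make the headroom concrete, here is a minimal sketch that builds PIDs in the style suggested above and checks them against the 64-character PID length. The helper names `make_pid` and `chars_available` are hypothetical, and the sketch checks length only; it does not validate which characters Fedora allows in a PID.

```python
MAX_PID_LENGTH = 64  # PID length limit quoted in the text above


def make_pid(prefix, number, collection=None):
    """Build a PID such as 'iudl:1214' or 'iudl:hohenberger-1214' (hypothetical helper)."""
    local_id = f"{collection}-{number}" if collection else str(number)
    pid = f"{prefix}:{local_id}"
    if len(pid) > MAX_PID_LENGTH:
        raise ValueError(f"PID is {len(pid)} characters, limit is {MAX_PID_LENGTH}: {pid}")
    return pid


def chars_available(prefix, collection=None):
    """Characters left for the numeric part after the prefix and optional collection code."""
    fixed = f"{prefix}:" + (f"{collection}-" if collection else "")
    return MAX_PID_LENGTH - len(fixed)


print(make_pid("iudl", 1214, collection="hohenberger"))
print(chars_available("iudl"))
print(chars_available("iudl", collection="hohenberger"))
```

Even with a collection code as long as "hohenberger", dozens of characters remain for the number, so the prefix scheme is not a practical limit on object counts.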

...