We want to run more than one repository, at least one for cataloging/testing use and one for production use. Will Cowan thinks it may be useful to keep one centralized repository for the master metadata and periodically export that data to one or more production repositories.

Fedora will eventually have built-in support for federated repositories.

Tentative plan:

There are two Fedora repositories, "development" and "production". The production repository always contains the master copy of the data. The contents of the development repository are never considered authoritative; they are only there for testing purposes. Specialized projects such as Evia, as well as individual developers, may also maintain their own development repositories.

All features are deployed on the development repository for testing. If the development repository does not contain data relevant to the feature being tested, data may be copied from the production repository, or new test objects may be created.

All data is ingested into the "development" repository while a collection is first being developed.

As each collection is finished, its contents will be exported from the development repository and imported into the production repository.

  • Periodically, an incremental update will be run to synchronize the production repository with the development content.
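An incremental update could work by selecting only the objects that changed since the previous sync. The Python sketch below illustrates that selection step under stated assumptions: an inventory mapping each object's PID to its collection and last-modified time is available for the development repository, and the time of the last successful sync is recorded. `select_changed`, `Inventory`, and the sample PIDs and collection names are hypothetical; the sketch does not call any Fedora API and leaves the actual export and import to whatever tools are used.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory of the development repository:
# PID -> (collection name, last-modified timestamp, timezone-aware).
Inventory = Dict[str, Tuple[str, datetime]]


def select_changed(
    dev: Inventory,
    last_sync: datetime,
    collection: Optional[str] = None,
) -> List[str]:
    """Return the PIDs modified in the development repository since last_sync.

    With `collection` set, only objects in that collection are considered
    (a per-collection update); otherwise the whole repository is scanned.
    """
    return sorted(
        pid
        for pid, (coll, modified) in dev.items()
        if modified > last_sync and (collection is None or coll == collection)
    )


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    last_sync = datetime(2000, 1, 2, tzinfo=timezone.utc)
    dev: Inventory = {
        "test:1": ("collectionA", datetime(2000, 1, 1, tzinfo=timezone.utc)),
        "test:2": ("collectionA", datetime(2000, 1, 3, tzinfo=timezone.utc)),
        "test:3": ("collectionB", datetime(2000, 1, 4, tzinfo=timezone.utc)),
    }
    print(select_changed(dev, last_sync))                 # ['test:2', 'test:3']
    print(select_changed(dev, last_sync, "collectionA"))  # ['test:2']
```

The optional `collection` argument shows that choosing between per-collection and whole-repository updates (open question 4) only changes the filter. Comparing timestamps alone would not detect objects deleted from the development repository; those would need separate handling.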

Open questions:

  1. Do we need three Fedora instances: development, cataloging (master), and production?
    • Issues related to data stored on the instances:
      • There won't be much "in progress" data, and it will primarily be for collections that haven't been published yet.
      • Is there a need to separate cataloging from production? We don't want "in progress" items ending up on users' screens, but the catalogers need to view their items in the context of a real system.
    • Issues related to features:
      • The development server may go down more than the production server, as new features are tested.
      • We want to make sure new features for one collection don't break anything for other collections. New features should always be tested on the dev Fedora before being deployed to the other instances.
      • Catalogers need a stable platform to work on.
      • Digitizers/submitters (people using ImageProc or requesting upload of data) need a stable platform to work on.
      • As we develop the Cataloging Tool and gain a broader user base, people will expect their data to show up in the production system immediately, so the cataloging system should be the production system. (But we need a way to determine when an item has enough data to be made visible.)
  2. Are we preserving the data in the development repository or the production repository?
    • Variations treats the cataloging server as the master copy of the data, although some of the data in that server may be "in progress".
    1. What kind of load will the preservation system put on our server? Is this load better suited to the dev server or the production server?
    2. Does the dev server have enough space to store everything?
  3. Is it possible to "update" objects on the production server and maintain version history for these updates only?
  4. Is it better to do incremental updates per collection? Or over the entire repository? Does it hurt to have "junk" data in the production repository?
  5. Will large ingests have an impact on the performance of the production Fedora?
  6. Do we need to separate statistics for our preservation integrity service from regular use statistics? If so, how? By IP address?
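If statistics are separated by IP address, one option is to classify each logged request by its client address. This minimal sketch assumes the preservation integrity service always runs from a known, fixed address range and that request logs record the client IP. `INTEGRITY_NETS` and `split_by_source` are hypothetical names, and 192.0.2.0/24 is a reserved documentation range standing in for the real addresses.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Hypothetical address range for the preservation integrity service;
# 192.0.2.0/24 is a documentation-only network used as a placeholder.
INTEGRITY_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def split_by_source(client_ips: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split request client IPs into (integrity-service, regular-use) lists."""
    integrity: List[str] = []
    regular: List[str] = []
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        # An address counts as integrity-service traffic if it falls in
        # any configured network; everything else is regular use.
        if any(addr in net for net in INTEGRITY_NETS):
            integrity.append(ip)
        else:
            regular.append(ip)
    return integrity, regular


if __name__ == "__main__":
    print(split_by_source(["192.0.2.10", "198.51.100.7"]))
```

Addresses outside the configured ranges are counted as regular use, so the list of ranges would need updating if the integrity service moves to other machines.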