This page holds notes on the current Fedora configuration, as well as misc information that must be understood when
Current test setup
Must run fedora-convert-demos to put the correct hostname in the demo objects.
Current McKoi username & password: fedora
Running on port 8080
For Fedora 2.0, MUST INSTALL the patch available at http://scripta.lib.virginia.edu/bugs/show_bug.cgi?id=83 (the attachments are near the top of the page, and they download with a CGI extension that must be changed to the correct filetype)
Demo object XML is in My Documents\fedora-2.0-src\dist\client\demo\foxml (there is a parallel directory for the METS versions, but it's unlikely that we will use these)
Fedora runs on its own (modified?) instance of Tomcat. It is currently not advisable to run anything besides Fedora on this version of Tomcat, because it has been tuned to give some performance enhancements for Fedora use. Be very careful when selecting ports so they don't conflict with another Tomcat that may be running on the same machine. If you change the port on which Fedora runs, it will automatically reconfigure the Fedora Tomcat, since this is really the service that's running on that port. Certain types of changes to the Tomcat config are overwritten by Fedora, so it is unlikely that we could use this copy of Tomcat for anything else.
When ingesting objects, use the admin password, not the database password.
The fedora server must be restarted for any configuration changes to take effect.
The documentation makes it seem fairly easy to move data from one repository to another: just tell the new Fedora instance to ingest all of the data from the old instance. No idea how long this would take, though.
Object records must be in XML form (METS or FOXML) to be ingested.
In the sample web interface, "View the Item Index" means "View the datastreams".
Fedora provides a lot of undocumented services. See the <fedora-home>/server/tomcat/webapps/fedora/WEB-INF/web.xml file for a full listing. The more interesting ones are:
- report: information on objects that were recently created/modified
- risearch: search the resource index (Kowari)
- getObjectHistory/<pid>: list timestamps of changes to the object
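As a rough illustration of how these services are addressed, here is a minimal Python sketch that builds their URLs. It assumes the services live directly under the /fedora context path on port 8080 (as the OAI URL elsewhere in these notes suggests); the host name, the helper name `service_url`, and the PID `demo:5` are placeholders, not something taken from web.xml.

```python
from urllib.parse import quote

# Assumed base URL: the host is a placeholder; port 8080 and the /fedora
# context path are the ones used elsewhere in these notes.
BASE_URL = "http://localhost:8080/fedora"


def service_url(service, pid=None, base_url=BASE_URL):
    """Build the URL for one of the servlets listed in web.xml.

    `service` is the servlet path (e.g. "report"); `pid` is appended as an
    extra path segment for services that take one, such as getObjectHistory.
    """
    url = f"{base_url.rstrip('/')}/{service}"
    if pid is not None:
        # Keep the colon in the PID readable; percent-encode anything else.
        url += "/" + quote(pid, safe=":")
    return url


if __name__ == "__main__":
    print(service_url("report"))
    print(service_url("risearch"))
    # "demo:5" is a hypothetical PID used only for illustration.
    print(service_url("getObjectHistory", pid="demo:5"))
```

Check the servlet mappings in web.xml before relying on any of these paths; the mapping there is the authority, not this sketch.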
More documentation of API-A-LITE can be found at fedora-2.0-src/dist/userdocs/client/browser/webexp.html
Once created, behavior definitions cannot be changed. Behavior mechanisms can only be changed marginally.
Bugs can be reported to Fedora's Bugzilla
user: fedora-bugreport at comm.nsdl.org
OAI export works automatically.
For example, see:
- A list of items
- [OAI-DC record for one item| http://mallow.dlib.indiana.edu:8080/fedora/oai?verb=GetRecord&identifier=oai:indiana.edu:iuDlp:275&metadataPrefix=oai_dc]
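The GetRecord URL above follows the standard OAI-PMH pattern of a base URL plus a `verb` and its arguments. A small Python sketch for building such requests is below; the function name `oai_url` is made up for this example, and the base URL and identifier are simply the ones from the link above.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in these notes.
OAI_BASE = "http://mallow.dlib.indiana.edu:8080/fedora/oai"


def oai_url(verb, base_url=OAI_BASE, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    # safe=":" leaves the colons in OAI identifiers unescaped, which matches
    # the form of the URL shown in these notes.
    return f"{base_url}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    # One record in Dublin Core, as in the link above.
    print(oai_url("GetRecord",
                  identifier="oai:indiana.edu:iuDlp:275",
                  metadataPrefix="oai_dc"))
    # A listing of all records in the same format.
    print(oai_url("ListRecords", metadataPrefix="oai_dc"))
```

The same helper covers the other OAI-PMH verbs (Identify, ListIdentifiers, ListMetadataFormats, ListSets) by changing the first argument.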
The XML records that represent Fedora objects are stored in Fedora's objects directory (fedora2_0_objects by default). Underneath this directory, they are organized by a crazy date/time directory structure. Even though they don't have an XML extension, the files are really XML.
Objects that are loaded as "Fedora managed" content have their datastreams stored in the datastreams directory (fedora2_0_datastreams) using the same crazy directory structure. The file content is unchanged from the file that was loaded, but the filename is changed to reflect the PID and datastream ID.
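To get a feel for the objects directory without knowing the date/time layout, one option is simply to walk the whole tree. The Python sketch below does that and reports which files parse as XML. The default directory name is the one mentioned above; the function name `scan_objects_dir` is made up for this example, and nothing here depends on how Fedora names the subdirectories.

```python
import os
import sys
import xml.etree.ElementTree as ET


def scan_objects_dir(root):
    """Yield (path, is_xml) for every file below `root`.

    A file counts as XML if it parses as well-formed XML, regardless of
    its extension (the object files have none).
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                ET.parse(path)
                is_xml = True
            except ET.ParseError:
                is_xml = False
            yield path, is_xml


if __name__ == "__main__":
    # Pass the objects directory as the first argument, or run from the
    # directory that contains fedora2_0_objects.
    root = sys.argv[1] if len(sys.argv) > 1 else "fedora2_0_objects"
    results = list(scan_objects_dir(root))
    xml_count = sum(1 for _path, is_xml in results if is_xml)
    print(f"{len(results)} files found, {xml_count} parse as XML")
```

Running it against the datastreams directory should show mostly non-XML files, since managed content is stored unchanged.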
The database coordinates all of these objects and datastreams using a fairly straightforward table setup.
If we want to convert from Managed to External content, we can just purge and re-create the datastreams. Of course, this would lose any version information.
Contrary to what the help system says, you can use DATASTREAM references in a mechanism even though you have defined the mechanism as being "Multi-Server Service".
For a mechanism that simply returns a datastream, put the stream name in parentheses (SCREEN), and reference it as a DATASTREAM parameter passed by URL-REF. For a mechanism that performs some operation, enter the URL of the service, adding any data references in parentheses. If the datastream is simply a short piece of text, you may be able to pass it as a VALUE, but typically you will want to pass it as a URL-REF, in which case the service will actually do the work of retrieving the object.
When creating a mechanism, you can pass three types of parameters to the target web service:
- DATASTREAM values refer to datastreams of the object(s) the mechanism will be bound to.
- DEFAULT values are defined within the mechanism. They are always passed by VALUE, since the
data you want entered in the URL is the data you entered. (You could have pasted this information
directly into the URL, but that would make it much more difficult to read.)
- USER values must be passed from elsewhere, as part of the URL that calls the dissemination. For
an example, see the URL that is created when you use one of the demo image manipulation methods.
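The way the three parameter types end up in the service URL can be pictured with a small Python sketch. This is only an illustration of the idea, not Fedora's own code: the function `resolve_template`, the resize service URL, and the parameter names WIDTH and FORMAT are all hypothetical, and the percent-encoding of embedded values is a choice made for this example rather than documented Fedora behavior.

```python
import re
from urllib.parse import quote

# A parameter reference is a name in parentheses, e.g. (SCREEN).
PLACEHOLDER = re.compile(r"\(([A-Za-z0-9_]+)\)")


def resolve_template(template, datastream_urls, defaults, user_values):
    """Fill in the parenthesized parameters of a mechanism URL template.

    datastream_urls: DATASTREAM parameters passed by URL-REF (name -> URL)
    defaults:        DEFAULT parameters defined in the mechanism (by VALUE)
    user_values:     USER parameters taken from the dissemination URL
    """
    def value_for(name):
        for source in (datastream_urls, user_values, defaults):
            if name in source:
                return source[name]
        raise KeyError(f"no value supplied for parameter {name!r}")

    # A mechanism that simply returns a datastream: the template is just
    # the reference, so the result is the datastream URL itself.
    whole = PLACEHOLDER.fullmatch(template.strip())
    if whole:
        return value_for(whole.group(1))

    # Otherwise the values are embedded in the service URL, so encode them.
    return PLACEHOLDER.sub(
        lambda m: quote(value_for(m.group(1)), safe=""), template
    )


if __name__ == "__main__":
    # Placeholder datastream URL, for illustration only.
    streams = {"SCREEN": "http://example.org/datastreams/SCREEN"}
    print(resolve_template("(SCREEN)", streams, {}, {}))
    print(resolve_template(
        "http://example.org/resize?url=(SCREEN)&width=(WIDTH)&fmt=(FORMAT)",
        streams,
        defaults={"FORMAT": "jpg"},
        user_values={"WIDTH": "200"},
    ))
```

In the second call the service receives the datastream's URL and fetches the content itself, which is the URL-REF behavior described above.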
We are going to initially use rhyme.dlib.indiana.edu. Its core stats are: Dual CPU 3GHz each, 6G RAM, 420G usable disk space.
Current stats for other collections:
- Hohenberger: 2143 images, each with master, thumbnail, and screen JPG. Masters take 13G.
- DIDO: 40,000 images, each with master, thumbnail, and screen images. Current storage for all takes about 22G.
- US Steel: 2200 images with master, thumbnail, and screen. Masters take 20G.
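A quick back-of-the-envelope check of these figures against the 420G of usable disk can be scripted as below. The numbers are the ones listed above; treating "G" as 1024 MB is an assumption, and since the Hohenberger and US Steel figures cover masters only, the total is a lower bound.

```python
DISK_GB = 420  # usable disk space quoted for the server

# Storage figures from the list above (masters only, except DIDO).
STORAGE_GB = {"Hohenberger": 13, "DIDO": 22, "US Steel": 20}

# Image counts for the collections whose figure is masters only.
MASTER_COUNTS = {"Hohenberger": 2143, "US Steel": 2200}


def total_storage_gb(storage=STORAGE_GB):
    """Sum the listed storage figures, in G."""
    return sum(storage.values())


def average_master_mb(name):
    """Average master size in MB, assuming 1 G = 1024 MB."""
    return STORAGE_GB[name] * 1024 / MASTER_COUNTS[name]


if __name__ == "__main__":
    total = total_storage_gb()
    print(f"Listed storage: {total}G of {DISK_GB}G "
          f"({100 * total / DISK_GB:.1f}% of usable disk)")
    for name in MASTER_COUNTS:
        print(f"{name}: about {average_master_mb(name):.1f} MB per master")
```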
The Fedora project has done some performance testing on a repository with 1 million objects.
Other system limits
How many items can share a PID prefix? A PID can be up to 64 characters long, so if we use the prefix "iudlp:", we have plenty of room for numerical IDs. We could even add a collection code, like "iudlp:hohenberger-1214".
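To put a number on "plenty", here is a short Python sketch. It assumes the 64 in the note above is a limit on the total PID length in characters, prefix and colon included, and it uses a deliberately conservative character set; both are assumptions to check against the Fedora PID documentation. The helper names are made up for this example.

```python
import re

# Assumption: the limit applies to the whole PID, prefix and colon included.
MAX_PID_LENGTH = 64

# Assumption: a conservative character set (letters, digits, ".", "-" in the
# prefix; also "_" and "~" in the local part). Fedora may allow more.
PID_PATTERN = re.compile(r"[A-Za-z0-9.-]+:[A-Za-z0-9._~-]+")


def make_pid(prefix, local_id):
    """Join a prefix and a local ID into a PID, checking length and syntax."""
    pid = f"{prefix}:{local_id}"
    if len(pid) > MAX_PID_LENGTH:
        raise ValueError(f"PID is {len(pid)} characters, limit is {MAX_PID_LENGTH}")
    if not PID_PATTERN.fullmatch(pid):
        raise ValueError(f"PID contains unexpected characters: {pid!r}")
    return pid


def numeric_capacity(prefix):
    """Number of distinct purely numeric local IDs that fit after the prefix."""
    digits = MAX_PID_LENGTH - len(prefix) - 1  # minus one for the colon
    return 10 ** digits


if __name__ == "__main__":
    print(make_pid("iudlp", "hohenberger-1214"))
    print(f"numeric IDs available under 'iudlp:': {numeric_capacity('iudlp'):.0e}")
```

Even with a collection code in the local part, the remaining room for a numeric suffix is far larger than any of the collections listed in these notes.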
We will likely want to run more than one repository, at least one for cataloging/testing use and one for production use. Will thinks it may be useful to keep one centralized repository for the master metadata and periodically export that data to one or more production repositories.
If we do split up the repositories like this, will we want to also have duplicate copies of the media files? Or should all media files be stored outside the repositories, on a separately managed filesystem (or set of filesystems)?
Moving data between repositories can be an issue if relationships are present.