This page holds notes on the current Fedora configuration, as well as miscellaneous information that must be understood when configuring Fedora.
In order to run a large repository (more than 100k Fedora Objects) while using the ResourceIndex, you must have a 64-bit version of Java running on a 64-bit OS.
Our development machine is rhyme.dlib.indiana.edu, aka fedora-dev.dlib.indiana.edu. Its core stats are: Dual CPU 3GHz each, 6G RAM, 420G usable disk space.
Since Fedora isn't very tolerant of losing its database connection, one cron job stops it before the database is shut down for backups, and another cron job starts it afterwards.
To start Fedora on Rhyme
- Log in to your rhyme account
- Log in to the Fedora account:
su - fedora
- Type in the password (same as the Fedora administrator password)
- Ensure that Fedora is not running:
- Start Fedora:
- Log out
Our production machine is thalia.dlib.indiana.edu, aka fedora.dlib.indiana.edu. It has 16G of RAM and four dual core processors.
It connects to the Oracle database on ora-iudlu (erato).
The fedora.sh script has been modified to increase the available Java heap space (-Xmx4096m). Likewise, the JAVA_OPTS variable has been set to increase the Java heap for the "helper" Tomcat.
The Fedora "migration guides" typically tell you to back up your data before upgrading. While this is a useful practice, it isn't always practical.
Windows note: A similar process can be followed on Windows, but (since Windows doesn't support symlinks) it is easier to specify a FEDORA_DEV directory, and copy the various versions of the code into there.
Fedora 2.1 setup
We are using the default security setting ssl-authenticate-apim. This gives us basic SSL encryption for administrative tasks, but leaves the server open (no authentication) for basic access tasks. For this setting (and the parallel non-SSL setting) the doMediateDatastreams parameter must be set to false. For SSL to work correctly, the fedoraRedirectPort must be open on the machine.
- If you want to connect through SSL, make sure you use the https protocol and the redirect port (usually 9443).
- If you're starting from a blank repository and ingesting items from elsewhere, you must first ingest:
- All "util:*" objects
Fedora 2.0 setup
Fedora 2.0 is much easier to set up than 2.1. The official installation instructions should be adequate.
However, you MUST INSTALL the patch available at http://scripta.lib.virginia.edu/bugs/show_bug.cgi?id=83 (the attachments are near the top of the page, and they download with a CGI extension that must be changed to the correct filetype).
Current test setup (on mallow)
Must run fedora-convert-demos to put the correct hostname in the demo objects.
fedora-stop mckoi-stop username password
The log files output by Fedora only include useful information when they are set to the "finest" level, but this level creates incredibly large logs.
For now, we treat all Fedora-generated logs as disposable, useful only for debugging. When we want to track "real" use, we will have to route everything through the Apache/Tomcat connector. Fedora 2.2 should include more organized logging output, and we may switch to that system when it is available.
Ingest speedup – reduce garbage collection
From the Fedora mailing list comes this suggestion...
On Thu, 2006-01-26 at 16:07, Edwin Shin wrote:
> I don't recall the rationale for the gc on each commit, but it precedes
> 2.1b. If you're interested you could try setting the system property
> "fedora.GCOnCommit" to "false" (add it to fedora.sh).
Thanks a lot for that hint. We tried changing the setting and our speed
jumped from 3.1 obj/sec to 10.4 obj/sec (average over 5000 ingested
If this is a safe setting, then we'll keep it. We're using a dedicated
machine for Fedora, so we don't need to minimize the amount of RAM used
by the JVM for Fedora.
Should anyone else want to try this, insert the line
-Dfedora.GCOnCommit=false \ in the exec-call under "# start Tomcat" in fedora.sh.
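To put the quoted rates in perspective, the short calculation below estimates how long an ingest would take at each rate. The function name `ingest_hours` is hypothetical, and the 100,000-object count is borrowed from the large-repository threshold mentioned at the top of this page, not from the mailing list post. It also assumes the rate stays steady over the whole ingest, which may not hold in practice.

```python
def ingest_hours(object_count, objects_per_second):
    """Hours needed to ingest object_count objects at a steady rate."""
    return object_count / objects_per_second / 3600


# Rates quoted in the mailing list post; the object count is an assumption.
before = ingest_hours(100_000, 3.1)   # garbage collection on every commit
after = ingest_hours(100_000, 10.4)   # fedora.GCOnCommit=false
print(f"before: {before:.1f} h, after: {after:.1f} h, speedup: {10.4 / 3.1:.1f}x")
# prints "before: 9.0 h, after: 2.7 h, speedup: 3.4x"
```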
Fedora runs on its own (modified?) instance of Tomcat. It is currently not advisable to run anything besides Fedora on this version of Tomcat, because it has been tuned to give some performance enhancements for Fedora use. Be very careful when selecting ports so they don't conflict with another Tomcat that may be running on the same machine. If you change the port on which Fedora runs, it will automatically reconfigure the Fedora Tomcat, since this is really the service that's running on that port. Certain types of changes to the Tomcat config are overwritten by Fedora, so it is unlikely that we could use this copy of Tomcat for anything else.
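Before choosing a port, it can help to check whether something is already listening on it. The sketch below is a generic TCP check, not a Fedora utility; the function name `port_in_use` and its defaults are hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

Running `port_in_use(9443)` on the server, for example, would show whether another service has already claimed the usual SSL redirect port. A port that is free now can still be taken later, so this is a sanity check rather than a guarantee.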
When making a change to an XSL file, there is no simple way to reset the cache, unless the behavior mechanism explicitly uses the clear-stylesheet-cache option. The only thing you can do is restart Fedora (which restarts Tomcat).
Bugs can be reported to Fedora's Bugzilla
user: fedora-bugreport at comm.nsdl.org
The XML records that represent Fedora objects are stored in Fedora's objects directory (fedora2_0_objects by default). Underneath this directory, they are organized by a crazy date/time directory structure. Even though they don't have an XML extension, the files are really XML.
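Because the files have no extension, a script has to identify them by content rather than by name. A minimal sketch, assuming only that each object record is a well-formed XML file somewhere under the objects directory; the function name `find_object_files` is hypothetical, and nothing is assumed about how the date/time subdirectories are laid out.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_object_files(objects_dir):
    """Yield (path, root tag) for every file under objects_dir that parses as XML."""
    for path in sorted(Path(objects_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # not well-formed XML, so not treated as an object record
        yield path, root.tag
```

Parsing every file is slow on a large repository, so this is better suited to spot checks than to routine use.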
If we want to convert from Managed to External content, we can just purge and re-create the datastreams. Of course, this would lose any version information.
The Fedora project has done some performance testing on a repository with 1 million objects.
How many items can share a PID prefix? A PID can be up to 64 characters long, so if we use the prefix "iudl:", we have plenty of room left for the numeric part. We could even add a collection code, like "iudl:hohenberger-1214".
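As a rough illustration of the length budget, the sketch below builds a PID from a prefix, an optional collection code, and a number, and reports how many characters remain for the numeric part. The names `make_pid`, `digits_available`, and `MAX_PID_LENGTH` are hypothetical. The sketch assumes that 64 is a limit on the total length of the PID, prefix included, and it does not check Fedora's rules for which characters a PID may contain.

```python
MAX_PID_LENGTH = 64  # assumed limit on the whole PID, prefix included


def make_pid(prefix, number, collection=None):
    """Build a PID such as 'iudl:1214' or 'iudl:hohenberger-1214'."""
    local = f"{collection}-{number}" if collection else str(number)
    pid = f"{prefix}:{local}"
    if len(pid) > MAX_PID_LENGTH:
        raise ValueError(f"PID is {len(pid)} characters, limit is {MAX_PID_LENGTH}")
    return pid


def digits_available(prefix, collection=None):
    """Number of characters left for the numeric part of the PID."""
    used = len(prefix) + 1  # prefix plus the colon
    if collection:
        used += len(collection) + 1  # collection code plus the hyphen
    return MAX_PID_LENGTH - used
```

Under these assumptions, `digits_available("iudl")` returns 59 and `digits_available("iudl", "hohenberger")` returns 47, so neither form comes close to exhausting the numeric space.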
While it is difficult to determine exactly what the real use will be, we have tested the Fedora-based Slocum Puzzles webapp with 50 simultaneous users making continuous requests. The server slowed down, but was still giving response pages within a reasonable amount of time (<5 seconds). The bottleneck seemed to be the speed at which the purlResolver app could serve images out of Fedora. With improvements to this system (possibly copying the thumbnails to a static location), we should be able to increase the performance.
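A test of this kind can be sketched with a thread pool. This is not the tool used for the Slocum Puzzles test; `run_load_test` and its `fetch` argument are hypothetical names, and `fetch` stands for any function that makes one request against the webapp.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(fetch, users=50, requests_per_user=10):
    """Call fetch() from `users` threads at once; return every call's duration in seconds."""

    def one_user(user_index):
        durations = []
        for _ in range(requests_per_user):
            started = time.perf_counter()
            fetch()
            durations.append(time.perf_counter() - started)
        return durations

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    # Flatten the per-user lists into one list of timings
    return [d for durations in per_user for d in durations]
```

In practice `fetch` could wrap a call such as `urllib.request.urlopen(url).read()` for a page of the webapp, and `max(durations)` can then be compared against the 5-second threshold mentioned above.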
The Kowari-based resource index requires over 54 GB of virtual memory.
Purging a repository
(from Nikolai Schwertner, via the Fedora mailing list)
The best way to purge all objects from a Fedora repository is to reset the
repository. Here are the steps:
- Stop the Fedora instance.
- Drop the Fedora database, and create a blank Fedora database with the same permission/privileges OR empty the tables using an external SQL tool
- Delete the files and subdirectories from the Fedora objects, datastreams, temp, and resourceIndex directories.
- Start the Fedora server.
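The file-deletion step above could be scripted roughly as follows. This is a sketch only: the function name `empty_directories` is hypothetical, the caller supplies the actual directory paths for the installation, and it assumes Fedora is already stopped and the database has been reset separately.

```python
import shutil
from pathlib import Path


def empty_directories(directories):
    """Delete everything inside each directory, keeping the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for directory in map(Path, directories):
        if not directory.is_dir():
            continue  # skip paths that do not exist on this install
        for entry in directory.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed += 1
    return removed
```

The deletion is irreversible, so it is worth printing the resolved paths and confirming them before calling the function against a real repository.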
We will likely want to run more than one repository, at least one for cataloging/testing use and one for production use. Will thinks it may be useful to keep one centralized repository for the master metadata and periodically export that data to one or more production repositories.
If we do split up the repositories like this, will we want to also have duplicate copies of the media files? Or should all media files be stored outside the repositories, on a separately managed filesystem (or set of filesystems)?
Moving data between repositories can be an issue if relationships are present.