
Disseminators and Datastreams

Using Fedora 2.1.1, there is a significant performance difference between retrieving the contents of a datastream directly vs. accessing the datastream through a disseminator.

For example, when I retrieve a thumbnail image (40 KB) directly via a REST call to the THUMBNAIL datastream, it takes around 20 ms. When I access the same datastream through a getThumbnail disseminator (again using REST), it takes around 500 ms, roughly 25 times longer. It also seems that when many concurrent calls are made to the disseminator, Fedora processes them sequentially rather than handling them in parallel threads.

Until this problem is resolved, requests that need high performance should go directly to the datastream rather than through a disseminator.
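
The comparison above can be reproduced with a small timing script. The following is only a sketch: the host, port, object PID (demo:1) and behavior definition PID (demo:bdef) in the two URLs are hypothetical placeholders, and the URL layout itself is an assumption, so substitute whatever REST URLs your installation actually uses. The helper names (fetch, median_latency_ms, concurrent_wall_ms, report) are made up for this example.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URLs: replace host, port, PIDs and path layout with your own.
DATASTREAM_URL = "http://localhost:8080/fedora/get/demo:1/THUMBNAIL"
DISSEMINATOR_URL = "http://localhost:8080/fedora/get/demo:1/demo:bdef/getThumbnail"


def fetch(url):
    """Download the full response body and return its length in bytes."""
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())


def median_latency_ms(call, repeats=20):
    """Median wall-clock time of call() in milliseconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def concurrent_wall_ms(call, workers=8):
    """Wall-clock time in milliseconds for `workers` simultaneous calls."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raises any exception from the worker
    return (time.perf_counter() - start) * 1000.0


def report(name, url, workers=8):
    """Print single-request latency and the cost of a concurrent batch."""
    single = median_latency_ms(lambda: fetch(url))
    batch = concurrent_wall_ms(lambda: fetch(url), workers=workers)
    # A ratio near `workers` suggests the server handled the calls one at a
    # time; a ratio near 1 suggests they were handled in parallel.
    print(f"{name}: median {single:.0f} ms, "
          f"{workers} concurrent {batch:.0f} ms, ratio {batch / single:.1f}")
```

Calling report("datastream", DATASTREAM_URL) and report("disseminator", DISSEMINATOR_URL) against a running server prints one line for each access path. The ratio is only a rough indicator, since client-side thread startup and network effects also contribute to the batch time.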

Ingest speedup – reduce garbage collection

From the Fedora mailing list comes this suggestion...

On Thu, 2006-01-26 at 16:07, Edwin Shin wrote:
> I don't recall the rationale for the gc on each commit, but it precedes
> 2.1b. If you're interested you could try setting the system property
> "fedora.GCOnCommit" to "false" (add it to fedora.sh).

Thanks a lot for that hint. We tried changing the setting and our speed
jumped from 3.1 obj/sec to 10.4 obj/sec (average over 5000 ingested
objects).

If this is a safe setting, then we'll keep it. We're using a dedicated
machine for Fedora, so we don't need to minimize the amount of RAM used
by the JVM for Fedora.

Should anyone else want to try this, insert the line
-Dfedora.GCOnCommit=false \
in the exec-call under "# start Tomcat" in fedora.sh.
