Disseminators and Datastreams
Using Fedora 2.1.1, there is a significant performance difference between retrieving the contents of a datastream directly vs. accessing the datastream through a disseminator.
For example, when I retrieve a thumbnail image (40 KB) directly via a REST call to the THUMBNAIL datastream, it takes around 20 ms. When I access the same datastream through a getThumbnail disseminator (again using REST), it takes around 500 ms. It also seems that when many calls to the disseminator arrive concurrently, Fedora handles them sequentially rather than in parallel threads.
Until this problem is resolved, requests that need high performance should go directly to the datastream rather than through a disseminator.
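To reproduce the comparison on your own installation, something like the following rough sketch may help. It assumes curl is available and that the REST URLs take the form /get/PID/DSID for a datastream and /get/PID/bDefPID/method for a disseminator, which is how I understand the Fedora 2.x REST interface; check this against your own setup. The host, port, PID (demo:5) and bDef PID (demo:1) are placeholders, and the helper function names are made up for this example.

```shell
#!/bin/sh
# Placeholder values (assumptions): change these to match your repository.
BASE="http://localhost:8080/fedora"
PID="demo:5"
BDEF="demo:1"

# Build the two URLs being compared (assumed URL patterns, see above).
datastream_url()   { printf '%s/get/%s/THUMBNAIL\n' "$BASE" "$PID"; }
disseminator_url() { printf '%s/get/%s/%s/getThumbnail\n' "$BASE" "$PID" "$BDEF"; }

# Print the total time in seconds for one GET; the response body is discarded.
time_get() {
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# Start N requests at once, wait for all of them, and print the elapsed
# wall-clock time in whole seconds (date +%s has one-second resolution).
time_parallel() {
    url=$1
    n=$2
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s -o /dev/null "$url" &
        i=$((i + 1))
    done
    wait
    echo $(( $(date +%s) - start ))
}

# Example calls (commented out so that loading this file sends no requests):
#   time_get "$(datastream_url)"
#   time_get "$(disseminator_url)"
#   time_parallel "$(disseminator_url)" 20
```

If the disseminator calls really are handled one after another, the elapsed time for time_parallel should grow roughly in proportion to the number of requests. If they run in parallel, it should stay close to the time of a single request.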
Ingest speedup – reduce garbage collection
From the Fedora mailing list comes this suggestion...
On Thu, 2006-01-26 at 16:07, Edwin Shin wrote:
> I don't recall the rationale for the gc on each commit, but it precedes
> 2.1b. If you're interested you could try setting the system property
> "fedora.GCOnCommit" to "false" (add it to fedora.sh).
Thanks a lot for that hint. We tried changing the setting and our speed
jumped from 3.1 obj/sec to 10.4 obj/sec (average over 5000 ingested objects).
If this is a safe setting, then we'll keep it. We're using a dedicated
machine for Fedora, so we don't need to minimize the amount of RAM used
by the JVM for Fedora.
Should anyone else want to try this, insert the line
in the exec-call under "# start Tomcat" in fedora.sh.
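The line to insert is not shown in the message above. Java system properties are passed to the JVM as -Dname=value options, so it is presumably of the form built in the sketch below. This is an assumption based on the property name and value quoted earlier, not the poster's exact text. The helper jvm_prop and the variable GC_OPT are made up for this example and do not appear in fedora.sh.

```shell
#!/bin/sh
# Format a Java system property as a JVM command-line option (-Dname=value).
jvm_prop() {
    printf '%s%s=%s\n' '-D' "$1" "$2"
}

# Assumption: this is the option to add to the java exec call in fedora.sh.
GC_OPT=$(jvm_prop fedora.GCOnCommit false)
echo "$GC_OPT"

# In fedora.sh the option would sit alongside the other -D options in the
# exec call under "# start Tomcat". The rest of that call stays unchanged
# and is not reproduced here.
```

Since the reason for the garbage collection on each commit is not known, it seems wise to measure ingest speed and watch JVM memory use before and after the change.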