Fedora Federation/Projection

Configuration Properties

Properties supported by Fedora 4

Properties supported by ModeShape but not by Fedora 4

Fedora Federation/Projection Practice

Set Up the Fedora File System Connector


1.  Pull down the latest version of the source code (https://github.com/fcrepo4/fcrepo4).
2.  Update fcrepo-configs/src/main/resources/config/minimal-default/repository.json so that its externalSources section looks like the following:

"externalSources" : {
    "filesystem" : {
        "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
        "directoryPath" : "/mnt/working_data",
        "projections" : [ "default:/federated => /" ],
        "contentBasedSha1" : "false",
        "readOnly" : "true",
        "extraPropertiesStorage" : "json",
        "cacheTtlSeconds" : 5
    }
}


3.  Build everything: mvn clean install -DskipTests -Dcheckstyle.skip=true
4.  In fcrepo-webapp, run "mvn clean jetty:run", or deploy the resulting WAR file to a Tomcat container.
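A single typo in repository.json (for example a missing comma or a misspelled "false") can stop the repository from starting, so it may help to check the file before building. The Python sketch below is a hypothetical helper, not part of Fedora or ModeShape: check_external_sources, parse_projection and check_file are illustrative names, and the required-key list and the true/false check are assumptions of this sketch rather than the connector's own validation rules. It relies on the ModeShape projection syntax "workspace:/repositoryPath => /externalPath" used in step 2.

```python
import json
import re
import sys
from pathlib import Path

# ModeShape projection syntax: "<workspace>:<repository path> => <external path>"
PROJECTION_RE = re.compile(r"^\s*([^:\s]+):(/\S*)\s*=>\s*(/\S*)\s*$")

# Assumption of this sketch: keys every file system source should define.
REQUIRED_KEYS = ("classname", "directoryPath", "projections")
# Assumption of this sketch: keys that should hold true/false (string or JSON boolean).
BOOLEAN_KEYS = ("contentBasedSha1", "readOnly")


def parse_projection(expr):
    """Split a projection into (workspace, repository_path, external_path)."""
    match = PROJECTION_RE.match(expr)
    if match is None:
        raise ValueError(f"malformed projection: {expr!r}")
    return match.groups()


def check_external_sources(config):
    """Return a list of problems found in config["externalSources"]."""
    problems = []
    for name, source in config.get("externalSources", {}).items():
        for key in REQUIRED_KEYS:
            if key not in source:
                problems.append(f"{name}: missing {key}")
        for key in BOOLEAN_KEYS:
            value = source.get(key)
            # str(True).lower() == "true", so JSON booleans pass as well as strings.
            if value is not None and str(value).lower() not in ("true", "false"):
                problems.append(f"{name}: {key} should be true/false, got {value!r}")
        for expr in source.get("projections", []):
            try:
                parse_projection(expr)
            except ValueError as err:
                problems.append(f"{name}: {err}")
    return problems


def check_file(path):
    """Parse repository.json and check it; json.JSONDecodeError signals a syntax error."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return check_external_sources(config)


if __name__ == "__main__":
    # Usage: python check_config.py path/to/repository.json [...]
    for path in sys.argv[1:]:
        for problem in check_file(path):
            print(f"{path}: {problem}")
```

For example, running python check_config.py fcrepo-configs/src/main/resources/config/minimal-default/repository.json (check_config.py being a hypothetical file name) prints nothing when no problems are found, and raises json.JSONDecodeError if the file is not valid JSON.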

Experiment Server configuration

Federated file system structure

The Fedora 4 repository has 23 top-level nodes. One of them is the federated node, which has 24 top-level children, as follows:

  1. big_files: 42 big files (.mov or .mp4) and 3 sub-directories, each containing 10 to 20 image files (total size of big_files is over 900 GB)
  2. groups_of_1000: 1000 sub-directories, each containing 1000 small files (total size of groups_of_1000 is 198 GB)
  3. million_files: 1 million files and no sub-directories (total size 198 GB)
  4. temp: 2 big files and 1 sub-directory
  5. Smallfile-1 to smallfile_20: each has 10 to 100 sub-directories, and each sub-directory contains 100 small files
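To reproduce a scaled-down version of the groups_of_1000 layout on another machine, a generator along these lines could be used. groups_of_1000 holds 1000 × 1000 = 1,000,000 files in 198 GB, so the average file is roughly 198 KB. The function name make_groups, the directory and file names (group_NNNN/file_NNNN.bin) and the random file contents are assumptions of this sketch, not a description of the original data.

```python
import os
import tempfile
from pathlib import Path


def make_groups(root, n_dirs, files_per_dir, file_size):
    """Create root/group_NNNN/file_NNNN.bin with random bytes; return the file count."""
    root = Path(root)
    count = 0
    for d in range(n_dirs):
        group = root / f"group_{d:04d}"
        group.mkdir(parents=True, exist_ok=True)
        for f in range(files_per_dir):
            # Fresh random bytes per file, so files are almost certainly distinct.
            (group / f"file_{f:04d}.bin").write_bytes(os.urandom(file_size))
            count += 1
    return count


if __name__ == "__main__":
    # Small demo: 10 directories x 10 files of ~198 KB each, in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        written = make_groups(tmp, n_dirs=10, files_per_dir=10, file_size=198_000)
        print(f"wrote {written} files under {tmp}")
```

Calling make_groups(path, 1000, 1000, 198_000) would approximate the full groups_of_1000 structure, writing about 198 GB, so the directory passed in should be on a volume with that much free space.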

Experiments on federated file system

  1. Before ingesting anything into the Fedora repository, I timed a GET on http://birch.dlib.indiana.edu:8080/fedora4f/rest/federated/groups_of_1000; it took around 11s to 16s. This is the baseline.
  2. Then I ingested some XML files and image files into the repository. I also ingested the whole groups_of_1000 directory into the repository as binary1, so http://birch.dlib.indiana.edu:8080/fedora4/rest/binary1 has 1000 sub-folders, each with 1000 small files, just like groups_of_1000. The top level of the repository now has 23 nodes (1 federated node and 22 internal nodes), and over 1 million files have been ingested in total.
  3. After the ingest, a GET on groups_of_1000 took 15s to 23s, while a GET on binary1, the internal node with the same structure, took only 0.12s to 3s.
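The page does not say how the GET times were measured. As one hypothetical way to repeat the comparison, the Python sketch below times several GET requests with the standard library's urllib.request and reports the range in the same "min to max" form as the results table. The function names time_get and summarize are illustrative, the opener parameter exists only so the HTTP call can be swapped out, and the sketch sends no Accept header, so the response format is whatever the server returns by default.

```python
import time
import urllib.request


def time_get(url, repeats=5, opener=urllib.request.urlopen):
    """Issue `repeats` GET requests to `url` and return the elapsed seconds of each."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with opener(url) as response:
            response.read()  # include the time to download the full body
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Format timings as a 'min to max' range, like the results table."""
    return f"{min(timings):.2f}s to {max(timings):.2f}s"
```

For example, summarize(time_get("http://birch.dlib.indiana.edu:8080/fedora4f/rest/federated/groups_of_1000")) and the same call on the binary1 URL would give two ranges comparable to one row of the table.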

Results:

Repository state                    | GET federated directory | GET internal directory with same structure
Empty                               | 11s to 16s              | --
After ingesting 1 million files     | 15s to 23s              | 0.12s to 3s
After ingesting 1.5 million files   | 15s to 31s              | 0.5s to 15s

Another Option – HPSS connector

GPFS

We could use GPFS to map HPSS to a mountable file system. However, GPFS requires a kernel-level build, which means it must be rebuilt whenever the system is upgraded. GPFS performance in this setup is also uncertain.


HPSS connector

If we write our own HPSS connector, we need to investigate the following further:

These are open questions that need more time to investigate.

Some useful links