
Overview:

The IngestTool class only launches the worker in the appropriate context: either as a background process or in the same thread.
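A minimal sketch of this hand-off follows. It assumes IngestToolWorker implements Runnable. The `start` method and its `background` flag are hypothetical, and a background thread stands in for the background process.

```java
// Hypothetical sketch: the real IngestToolWorker may not implement Runnable,
// and start(worker, background) is an illustrative name, not the actual API.
class IngestToolWorker implements Runnable {
    volatile boolean finished = false;

    @Override
    public void run() {
        // The real worker performs the ingest flow described on this page.
        finished = true;
    }
}

class IngestTool {
    /**
     * Runs the worker on a new background thread (returned so the caller can
     * join it), or inline in the caller's thread (returns null once done).
     */
    static Thread start(IngestToolWorker worker, boolean background) {
        if (background) {
            Thread thread = new Thread(worker, "ingest-worker");
            thread.start();
            return thread;
        }
        worker.run(); // same thread: blocks until the worker finishes
        return null;
    }
}
```

Returning the thread lets a caller wait for a background run with `join()`. A same-thread run simply blocks until the work is done.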

The IngestToolWorker does the actual work. This work was formerly defined in the flow.vsd document, but it is being cleaned up and formalized here. In other words, this page is the definitive reference for the worker process.

This package contains the following classes:

  • IngestToolWorker – this is the main class
  • IngestToolLogger – a logger
  • IngestToolDataLoader – this loads data from file/url locations
  • IngestToolDataGenerator – this generates data with XSLT, JHOVE, etc.
  • IngestToolDataAnalyzer – this analyzes various datasets
  • IngestToolDataSearcher – this searches metadata for data items
  • IngestDB – this interfaces to the ingest database
  • Enums:
    • IngestEvent – enumerates the Ingest Events from the Ingest DB
    • IngestState – enumerates the Ingest States from the Ingest DB
    • IngestItemState – enumerates the Ingest Item States from the Ingest DB
  • Types:
    • ID2FilenamesHash – maps to Hashtable<String, ArrayList<String>>
  • Exceptions:
    • SkippedItemException – thrown to skip an item.
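The ID2FilenamesHash type can be sketched as a thin subclass of the Hashtable it maps to. Subclassing (rather than wrapping) and the `addFilename` helper are assumptions made for illustration. SkippedItemException is shown as a checked exception, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.Hashtable;

/** Maps an item ID to the filenames that belong to it. */
class ID2FilenamesHash extends Hashtable<String, ArrayList<String>> {
    // Hypothetical convenience method: append a filename under an ID,
    // creating that ID's list on first use.
    void addFilename(String id, String filename) {
        computeIfAbsent(id, k -> new ArrayList<>()).add(filename);
    }
}

/** Thrown to skip an item; the message records why it was skipped. */
class SkippedItemException extends Exception {
    SkippedItemException(String message) {
        super(message);
    }
}
```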

The main application flow is as follows:
1) Load Control Metadata
2) Load Completion Logs (DB)
3) Load Metadata
4) Generate FileLists
5) Generate Workflow
6) Analyze Metadata
7) Analyze Content
8) Process Content
9) Process Collection
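Read as code, the flow is a fixed sequence in which later steps rely on data loaded by earlier ones. For example, the workflow in step 5 is built from the metadata and file lists of steps 3 and 4. The outline below is a hypothetical skeleton; its method names do not come from the actual worker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outline of IngestToolWorker's top-level flow. Method names are
// illustrative; each placeholder body only records that its step ran.
class IngestToolWorker {
    final List<String> trace = new ArrayList<>();

    void run() {
        loadControlMetadata();   // 1
        loadCompletionLogs();    // 2 (DB)
        loadMetadata();          // 3
        generateFileLists();     // 4
        generateWorkflow();      // 5
        analyzeMetadata();       // 6
        analyzeContent();        // 7
        processContent();        // 8
        processCollection();     // 9
    }

    void loadControlMetadata() { trace.add("Load Control Metadata"); }
    void loadCompletionLogs()  { trace.add("Load Completion Logs"); }
    void loadMetadata()        { trace.add("Load Metadata"); }
    void generateFileLists()   { trace.add("Generate FileLists"); }
    void generateWorkflow()    { trace.add("Generate Workflow"); }
    void analyzeMetadata()     { trace.add("Analyze Metadata"); }
    void analyzeContent()      { trace.add("Analyze Content"); }
    void processContent()      { trace.add("Process Content"); }
    void processCollection()   { trace.add("Process Collection"); }
}
```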

Processing details:

1) Load Control Metadata

2) Load Completion Logs (DB)

3) Load Metadata

  • 3.1) Load Descriptive
  • 3.2) Load Technical
  • 3.3) Load Structural

4) Generate FileLists

  • 4.1) Load Master FileList
  • 4.2) Load Derivative FileList
  • 4.3) Load Overview FileList
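The file list format is not specified here. Assuming each list is plain text with one filename per line, loading one could look like the hypothetical helper below. It reads from a `Reader`, so the same code could serve a local file or a URL stream, in line with IngestToolDataLoader's file/url role.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assumes one filename per line.
class FileListLoader {
    /** Reads filenames, trimming surrounding whitespace and skipping blank lines. */
    static List<String> load(Reader source) throws IOException {
        List<String> files = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    files.add(line);
                }
            }
        }
        return files;
    }
}
```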

5) Generate Workflow

  • 5.1) Generate ID List from MD and FileLists
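One way to build the ID list is sketched below. It assumes that each filename's stem (the name without its extension) is the item ID and that the metadata supplies the authoritative set of IDs; both rules are assumptions. The result uses the `Hashtable<String, ArrayList<String>>` shape that ID2FilenamesHash maps to. The `WorkflowBuilder` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Hashtable;

class WorkflowBuilder {
    /** Hypothetical rule: an item ID is the filename without its last extension. */
    static String idOf(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot > 0 ? filename.substring(0, dot) : filename;
    }

    /**
     * Maps every ID found in the metadata to the filenames (from all file
     * lists) that share that ID. Files with no matching ID are ignored here.
     */
    static Hashtable<String, ArrayList<String>> buildIdList(
            Collection<String> metadataIds, Collection<String> filenames) {
        Hashtable<String, ArrayList<String>> ids = new Hashtable<>();
        for (String id : metadataIds) {
            ids.put(id, new ArrayList<>());
        }
        for (String name : filenames) {
            ArrayList<String> files = ids.get(idOf(name));
            if (files != null) {
                files.add(name);
            }
        }
        return ids;
    }
}
```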

6) Analyze Metadata

  • 6.1) Analyze Descriptive
  • 6.2) Analyze Technical
  • 6.3) Analyze Structural

7) Analyze Content

  • 7.1) Analyze Master FileList
  • 7.2) Analyze Derivative FileList
  • 7.3) Analyze Overview FileList
  • 7.4) Validate Proxy Objects

8) Process Content

  • 8.1) Process Image
    • 8.1.1) Iterate over items
    • 8.1.2) Check completion logs
    • 8.1.3) load descriptive metadata for item
    • 8.1.4) load technical metadata for item
    • 8.1.5) check if item exists
    • 8.1.6) build default DC
    • 8.1.7) generate derivatives
    • 8.1.8) build FOXML
    • 8.1.9) build METS
    • 8.1.10) upload derivatives
    • 8.1.11) create disseminators
    • 8.1.12) ingest item
    • 8.1.13) build relationships
    • 8.1.14) trigger search reindex for item
    • 8.1.15) move master to HPSS
    • 8.1.16) remove derivatives
    • 8.1.17) log item in completion logs
  • 8.2) Process Paged Content
    • 8.2.1) iterate over books
    • 8.2.2) check completion logs
    • 8.2.3) load structural metadata for book
    • 8.2.4) check if book exists
    • 8.2.5) build default DC
    • 8.2.6) generate overview
    • 8.2.7) build FOXML
    • 8.2.8) build METS
    • 8.2.9) upload overview
    • 8.2.10) create disseminators
    • 8.2.11) ingest book
    • 8.2.12) build relationships
    • 8.2.13) process child items
    • 8.2.15) trigger search reindex for book
    • 8.2.16) remove overview
    • 8.2.17) log book in completion logs
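Steps 8.1 and 8.2 share a per-item pattern: skip anything already in the completion logs, run the remaining steps, and record success only at the end. That pattern can be sketched as below. The `ItemSteps` interface and `ContentProcessor` class are hypothetical, and an in-memory `Set` stands in for the completion logs held in the ingest database. The sketch also assumes that SkippedItemException abandons the current item and moves on to the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SkippedItemException extends Exception {
    SkippedItemException(String reason) { super(reason); }
}

// Hypothetical hook for the per-item work (building FOXML/METS, ingest, etc.).
interface ItemSteps {
    void process(String id) throws SkippedItemException;
}

class ContentProcessor {
    final Set<String> completionLog; // stand-in for the DB completion logs
    final List<String> skipped = new ArrayList<>();

    ContentProcessor(Set<String> completionLog) { this.completionLog = completionLog; }

    void processAll(List<String> ids, ItemSteps steps) {
        for (String id : ids) {                       // iterate over items
            if (completionLog.contains(id)) continue; // check completion logs
            try {
                steps.process(id);                    // remaining per-item steps
                completionLog.add(id);                // log item in completion logs
            } catch (SkippedItemException e) {
                skipped.add(id + ": " + e.getMessage());
            }
        }
    }
}
```

In this sketch, the completion log is updated only after an item's final step. An item that is skipped is therefore attempted again on the next run.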

9) Process Collection

  • 9.1) load collection
  • 9.2) upload configuration
  • 9.3) upload collection level metadata
    • 9.3.1) descriptive
      • 9.3.1.1) EAD
      • 9.3.1.2) MODS
    • 9.3.2) technical
    • 9.3.3) structural
  • 9.4) ingest collection
  • 9.5) update completion logs