After an item has been successfully transferred to the Scandium server, additional micro-services must be run to document the authenticity and integrity of the content, identify potential risks, and extract technical metadata from files. The information collected during the Analysis process helps collecting units appraise content (i.e., decide whether to keep it) and also assists the IU Libraries in the long-term preservation of the content.
Key steps in the Analysis process include:
- Virus scan: clamscan.exe
- Sensitive data scan: bulk_extractor
- Format identification: Siegfried
- Documentation of file directory structure: tree
- Checksum creation: fiwalk or the Python hashlib module (depending on use case)
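The checksum step can be illustrated with the Python hashlib module named above. The following is a minimal sketch, not the Ingest Tool's actual code: the function names are hypothetical, and MD5 is used as the default only as an assumption, since this section does not say which algorithm the tool records. Files are read in chunks so that large files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of one file, reading it in chunks to limit memory use.

    The default algorithm (MD5) is an assumption for this sketch.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root, algorithm="md5"):
    """Map each file's path (relative to root, with forward slashes) to its checksum."""
    root = Path(root)
    results = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        results[path.relative_to(root).as_posix()] = file_checksum(path, algorithm)
    return results
```

For example, `checksum_tree("files", "sha256")` would return SHA-256 values for every file below a (hypothetical) `files` folder.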
Relevant log files, reports, and PREMIS preservation metadata for the above operations are stored in the 'metadata' folder within the main item barcode folder (see Scandium Overview for more information on the directory structure).
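As a rough illustration of what a PREMIS event for one of these operations might contain, the sketch below builds a PREMIS 3 `<event>` element with Python's standard `xml.etree.ElementTree`. The element names and their order follow the PREMIS 3 schema; the function names, the outcome value, and the "local" agent identifier type are assumptions for illustration, and the Ingest Tool's own PREMIS output may be structured differently.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def _sub(parent, tag, text=None):
    """Append a namespaced PREMIS child element, optionally setting its text."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        element.text = text
    return element


def premis_event(event_type, detail, outcome, agent):
    """Build a minimal PREMIS 3 <event> element for one Analysis operation (hypothetical helper)."""
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    identifier = _sub(event, "eventIdentifier")
    _sub(identifier, "eventIdentifierType", "UUID")
    _sub(identifier, "eventIdentifierValue", str(uuid.uuid4()))
    _sub(event, "eventType", event_type)
    _sub(event, "eventDateTime", datetime.now(timezone.utc).isoformat(timespec="seconds"))
    detail_info = _sub(event, "eventDetailInformation")
    _sub(detail_info, "eventDetail", detail)
    outcome_info = _sub(event, "eventOutcomeInformation")
    _sub(outcome_info, "eventOutcome", outcome)
    # Assumed local identifier scheme for the software agent that ran the step.
    agent_id = _sub(event, "linkingAgentIdentifier")
    _sub(agent_id, "linkingAgentIdentifierType", "local")
    _sub(agent_id, "linkingAgentIdentifierValue", agent)
    return event
```

The element could then be saved with `ET.ElementTree(event).write(path, encoding="utf-8", xml_declaration=True)`; the file name and its location inside the 'metadata' folder would need to match the layout described in the Scandium Overview.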
The procedure for starting Analysis is the same for all job types: once the 'Transfer' operation has completed, click the 'Analyze' button.
The CMD.EXE window displays progress updates during Analysis, indicating when each operation has completed and whether any errors have occurred. The precise steps depend on the job type (see overview), but they include a virus scan, checksum calculation, a scan for sensitive information, file format identification, and the generation of various reports.
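The console behavior described above, where each operation reports either completion or an error, can be sketched as a sequential runner built on `subprocess.run`. This is a hypothetical illustration, not the Ingest Tool's code. The command lines in `EXAMPLE_STEPS` are assumptions: the flags shown (`clamscan -r`, `bulk_extractor -o ... -R`, `sf -csv`) exist in those tools, but the paths are placeholders and should be checked against each tool's documentation.

```python
import subprocess

# Hypothetical example steps: paths are placeholders, and flags should be
# verified against each tool's documentation before use.
EXAMPLE_STEPS = [
    ("Virus scan", ["clamscan.exe", "-r", "files"]),
    ("Sensitive data scan", ["bulk_extractor", "-o", "metadata/bulk_extractor", "-R", "files"]),
    ("Format identification", ["sf", "-csv", "files"]),
]


def run_steps(steps):
    """Run (name, command) pairs in order, printing a console-style progress log.

    Stops at the first step that fails and returns (completed_names, failed_name),
    where failed_name is None if every step succeeded.
    """
    completed = []
    for name, command in steps:
        print(f"{name}: running...")
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError:
            # The executable itself could not be found on PATH.
            print(f"ERROR - {name}: could not find '{command[0]}'")
            return completed, name
        if result.returncode != 0:
            # Simplification: any non-zero exit code is treated as a failure.
            print(f"ERROR - {name}: exited with code {result.returncode}")
            if result.stderr:
                print(result.stderr.strip())
            return completed, name
        print(f"{name}: completed")
        completed.append(name)
    return completed, None
```

Treating every non-zero exit code as a failure is a simplification: clamscan, for example, exits with 1 when it finds an infected file and 2 when an error occurs, so a real runner would need to interpret exit codes tool by tool.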
At the end of content analysis, the Ingest Tool displays a summary report of the barcode item's contents, including the number of disk images, the number of replicated files and folders, and the total size of the replicated content.
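A summary of this kind can be approximated by walking the item's folders. The sketch below is a hypothetical illustration: the folder arguments and the set of file extensions treated as disk images are assumptions, not the Ingest Tool's actual configuration (see the Scandium Overview for the real directory structure). It counts disk images by extension, counts replicated files and subfolders, and totals the size of the replicated files in bytes.

```python
from pathlib import Path

# Hypothetical: file extensions treated as disk images for this sketch.
DISK_IMAGE_SUFFIXES = {".img", ".iso", ".raw", ".e01", ".aff"}


def summarize_item(disk_image_dir, files_dir):
    """Count disk images, replicated files and folders, and replicated bytes."""
    disk_image_dir, files_dir = Path(disk_image_dir), Path(files_dir)

    disk_images = 0
    if disk_image_dir.is_dir():
        disk_images = sum(
            1
            for p in disk_image_dir.iterdir()
            if p.is_file() and p.suffix.lower() in DISK_IMAGE_SUFFIXES
        )

    files = folders = total_bytes = 0
    if files_dir.is_dir():
        # rglob("*") yields every file and subfolder below files_dir (not files_dir itself).
        for path in files_dir.rglob("*"):
            if path.is_dir():
                folders += 1
            elif path.is_file():
                files += 1
                total_bytes += path.stat().st_size

    return {"disk_images": disk_images, "files": files, "folders": folders, "bytes": total_bytes}
```

Missing folders simply contribute zero to the counts, so an item with no disk images still produces a valid summary.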
At this point, you may work with a new barcode item or quit.
The Analysis process may crash unexpectedly because of a bug in the Ingest Tool code or an unexpected feature of the content. If this happens, take a screenshot of the error message and send it to the BDPL Manager via Slack. In many cases, the Analysis process can simply be rerun once the error has been addressed, though in some cases the BDPL Manager may need to intervene manually.
See section 8 (Troubleshooting) for more information on how to address errors that arise during Analysis.