
WGBH USE CASES:

1)
AS A: MLA Staff member
I CAN: Ingest technical metadata XML files.
SO THAT: I can store them in a trusted digital repository.

Technical metadata files are created as part of the digital preservation process when files are placed onto LTO tape. The most common schema for this XML is FITS (File Information Tool Set), but there are also ExifTool and PBCore files. A technical metadata file is created for every file preserved onto LTO tape.
Upon ingest, the specific elements and attributes within the XML should be parsed out, indexed in Solr, and made searchable.
Ingest could be done either through the web application or through console commands.
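As a rough illustration of "parsing out" elements and attributes, the sketch below flattens any XML record into field/value pairs that could then be sent to Solr as a document. It assumes a hypothetical field-naming scheme (element path joined with underscores, attributes suffixed with their name) and a simplified FITS-style sample; it is not HydraDAM's actual indexing code, and real Solr field names and types would follow the project's own schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    """Drop an XML namespace: '{http://...}fileinfo' -> 'fileinfo'."""
    return tag.rsplit("}", 1)[-1]


def flatten_xml(xml_text: str) -> dict[str, list[str]]:
    """Flatten element text and attributes into path-named fields.

    Field names join the element path with '_'; attributes get an extra
    '_<attribute>' suffix. Values are lists because elements can repeat.
    The naming scheme is hypothetical, not a HydraDAM or Solr convention.
    """
    fields: dict[str, list[str]] = defaultdict(list)

    def walk(elem: ET.Element, path: str) -> None:
        name = f"{path}_{local_name(elem.tag)}" if path else local_name(elem.tag)
        for attr, value in elem.attrib.items():
            fields[f"{name}_{local_name(attr)}"].append(value)
        text = (elem.text or "").strip()
        if text:
            fields[name].append(text)
        for child in elem:
            walk(child, name)

    walk(ET.fromstring(xml_text), "")
    return dict(fields)


# A simplified, illustrative FITS-style record (not a complete FITS file).
sample = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <fileinfo>
    <filename toolname="OIS File Information">example.mov</filename>
    <md5checksum toolname="OIS File Information">6e3ce8f168ea34a1b8e76cf70515773a</md5checksum>
  </fileinfo>
</fits>"""

for field, values in flatten_xml(sample).items():
    print(field, values)
```

Because the walk is schema-agnostic, the same function would handle ExifTool or PBCore XML; a production indexer would more likely map a curated set of elements to typed Solr fields.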

2)
AS A: MLA Staff member
I CAN: Access technical metadata records.
SO THAT: I can find and use the technical metadata records stored in the repository.

Staff should be able to search by any of the elements or attributes that have been extracted and stored in Solr.
Example: A staff member searching for a filename should get all the records that contain that specific filename, then be able to see which LTO tape the item of interest is located on and retrieve that tape in person.

Users should also be able to export the found set as the original binary XML files.
Example: A user searches for all the records that belong to a specific LTO barcode and then exports all the individual XML files. This will be valuable for integrity checking and future file migration workflows.
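A minimal sketch of the export step, assuming the search has already resolved the found set to file paths on disk (how HydraDAM actually stores and locates the original XML is not specified here). The files are written into a zip archive unchanged, so extracted copies are byte-identical to the originals and can be used for integrity checking. The function name `export_xml_files` is hypothetical.

```python
import zipfile
from pathlib import Path


def export_xml_files(paths: list[Path], archive: Path) -> list[str]:
    """Write the original XML files of a found set into one zip archive.

    Files are stored byte-for-byte (no re-serialization of the XML), so the
    extracted copies have the same checksums as the originals.
    Returns the member names written to the archive.
    """
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Each file is stored under its own name; two files with the same
            # name would collide, so a real export might prefix a record ID.
            zf.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

Deflate compression is lossless, so it does not affect fixity; storing files uncompressed (`zipfile.ZIP_STORED`) would work equally well for small XML files.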

3)

AS A: MLA Staff Member

I CAN: Add non-technical metadata to the records, i.e., metadata that was not present in the extracted technical metadata.

SO THAT: I can add useful metadata to the records such as LTO barcode or physical location.

4)

AS A: MLA Staff Member

I CAN: Link web-streaming a/v derivative files, which reside on a server outside of HydraDAM, to their corresponding asset records within HydraDAM.

SO THAT: Low-resolution derivatives created outside of HydraDAM can be used by the web application.

Example: In our workflow, we have found it much faster and more controllable to create web-streaming derivatives and thumbnails for a/v files outside of HydraDAM. We currently link to these files based on the main asset's MD5 value.

A video file with the MD5 value 6e3ce8f168ea34a1b8e76cf70515773a has a thumbnail and an .mp4 file created for access viewing. Those files exist on a streaming server in a nested folder structure, where each pair of hexadecimal characters in the MD5 value becomes one folder level, like:

wgbh_streaming_server.org/6e/3c/e8/f1/68/ea/34/a1/b8/e7/6c/f7/05/15/77/3a/web_movie.mp4
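The folder structure in the example splits the 32-character MD5 into 16 two-character folder names. A minimal sketch of building that path, assuming the host name and the `web_movie.mp4` filename from the example above (the thumbnail's filename is not given, so it is left to the caller); `derivative_url` is a hypothetical helper name.

```python
import re


def derivative_url(md5: str, filename: str,
                   host: str = "wgbh_streaming_server.org") -> str:
    """Build the nested streaming-server path for an asset's derivative.

    Each pair of hex characters in the asset's MD5 becomes one folder level
    (6e3c... -> 6e/3c/...), and the derivative's filename goes at the end.
    """
    md5 = md5.lower()
    if not re.fullmatch(r"[0-9a-f]{32}", md5):
        raise ValueError(f"not an MD5 hex digest: {md5!r}")
    folders = "/".join(md5[i:i + 2] for i in range(0, 32, 2))
    return f"{host}/{folders}/{filename}"


print(derivative_url("6e3ce8f168ea34a1b8e76cf70515773a", "web_movie.mp4"))
# wgbh_streaming_server.org/6e/3c/e8/f1/68/ea/34/a1/b8/e7/6c/f7/05/15/77/3a/web_movie.mp4
```

Since the path is derived entirely from the master file's checksum, HydraDAM would only need to store (or compute) the MD5 for each asset record to locate its derivatives; no separate link table is required.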

 

 

INDIANA UNIVERSITY USE CASES:

1)

AS A: LIT staff person

I CAN: ingest audiovisual content as SIPs

SO THAT: I can store objects in a preservation repository

Content to be ingested falls into two distinct cases: MDPI objects and born-digital objects. In both cases, content will be deposited as a SIP with a locally developed, standardized structure. Each SIP will contain the master and production/mezzanine files as well as technical metadata (FFPROBE XML) and descriptive metadata (MODS XML). MDPI content will also include POD XML and any extra metadata from Memnon or IU. Ingest should include parsing of at least the technical metadata to support the searchability and reporting functionality outlined below. Items deposited in the repository should be stored on IU’s Scholarly Data Archive (SDA).

TBD: Maintaining new technical metadata (e.g., fixity check run dates); future version: ingestion of other types of content (e.g., non-audiovisual)

 

2)

AS A: LIT staff person

I CAN: locate content in the repository individually or as a group

SO THAT: I can effectively manage and preserve content

While content will mainly be discovered through IU’s access repository, Avalon, HydraDAM 2 should also make content findable through a few different means. The main information for discoverability within the repository will be the object’s unique identifier. For all MDPI content, this will be the barcode generated for the original physical object. For born-digital content, this will be the VA number or other unique identifier. Discovery should also likely draw on a small amount of descriptive metadata (e.g., title, creator) to prevent loss due to error.

 

3)

AS A: Digital Preservation Librarian

I CAN: generate reports about the content

SO THAT: I can understand content characteristics and thereby steward content effectively

HydraDAM 2 should be able to generate reports on all or some content based on the technical metadata included in FFPROBE XML files. Example reports would include file format, rights information, and content owner information. To steward content effectively, I need to be able to check on that content in a variety of ways. When a fixity check fails, I need to see whether there were any Digitization Comments, Original Media Failures, and/or Original Media Preservation Problems. As an additional step, I may need to see QC Status and other QC information to determine whether the QC process has any effect on the stability of digital files. I would want to search content by Encoder information so that, if we find problems with a given encoder, I can locate all preserved digital files produced with that encoder within a given timeframe and run fixity checks against them to determine whether those files show a higher rate of corruption. The same would apply to other hardware devices. I might want to organize this information by Unit of Origin, by Digitized By (who performed the digitization), and by the digitizer's Staff ID.
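The encoder scenario above boils down to a filter plus a group-by over extracted metadata. A minimal sketch, assuming each file's record has already been reduced to a few hypothetical fields (`file_id`, `encoder`, `digitized_on`, `unit_of_origin`); the real field names would come from the FFPROBE and POD metadata, and in practice this query would more likely run against the Solr index than in Python.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    # Hypothetical subset of fields extracted at ingest.
    file_id: str
    encoder: str
    digitized_on: date
    unit_of_origin: str


def files_for_encoder(records: list[FileRecord], encoder: str,
                      start: date, end: date) -> list[str]:
    """IDs of files made with `encoder` between start and end (inclusive)."""
    return [r.file_id for r in records
            if r.encoder == encoder and start <= r.digitized_on <= end]


def count_by(records: list[FileRecord], field: str) -> Counter:
    """Simple report: how many files share each value of `field`."""
    return Counter(getattr(r, field) for r in records)


# Illustrative records; encoder and unit names are made up.
records = [
    FileRecord("f1", "EncoderA", date(2015, 3, 1), "Music"),
    FileRecord("f2", "EncoderA", date(2015, 9, 1), "Film"),
    FileRecord("f3", "EncoderB", date(2015, 3, 2), "Music"),
]
suspect = files_for_encoder(records, "EncoderA", date(2015, 1, 1), date(2015, 6, 30))
print(suspect)                             # ['f1']
print(count_by(records, "unit_of_origin"))
```

The list returned by `files_for_encoder` is the kind of ad hoc batch that could then be queued for fixity checking.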

 

4)

AS A: LIT staff person

I CAN: download the files needed to replace damaged content elsewhere

SO THAT: I can effectively provide continued access to content

If an object is determined to be corrupted anywhere within the repository infrastructure, the master copy within HydraDAM 2 will be used to generate a new version to replace the corrupted file. To support this, HydraDAM 2 should include the ability to download content either as a whole SIP or file by file. Transformations will be carried out elsewhere.

 

5)

AS A: Digital Preservation Librarian

I CAN: interact with objects based on fixity information, searching and sorting based on last fixity date and status

SO THAT: I can effectively monitor the fixity of content

Basic preservation activities that HydraDAM 2 should incorporate are currently limited to fixity checking and ingestion. The repository should enable a staff person to run fixity checks upon ingest and at regular intervals on all content. Additionally, HydraDAM 2 should enable fixity checking upon an event, such as the discovery of corrupted files in the access system. To support this, a repository manager should be able to create batches of files on which to run fixity checks, or to add files to the ingest/interval batches. The repository should report on all fixity checks performed and alert staff to any fixity check failures. The repository should log fixity checks as PREMIS events and should capture the unique ID of the staff person undertaking the fixity check.
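A minimal sketch of one fixity check and a batch run, assuming MD5 checksums (the algorithm is a parameter, since the one IU would use is not stated) and an expected checksum recorded at ingest. The returned dictionary only borrows PREMIS event semantic unit names (`eventType`, `eventDateTime`, `eventOutcome`, linking identifiers); it is a simplified stand-in, not a valid PREMIS XML serialization. Function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) file in chunks so it never sits fully in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fixity_check(path: Path, expected: str, staff_id: str,
                 algorithm: str = "md5") -> dict:
    """Run one fixity check and return a simplified, PREMIS-style event record."""
    actual = file_digest(path, algorithm)
    passed = actual == expected.lower()
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": "pass" if passed else "fail",
        "eventOutcomeDetail": f"{algorithm} expected {expected.lower()}, got {actual}",
        "linkingObjectIdentifier": str(path),
        "linkingAgentIdentifier": staff_id,  # staff person who ran the check
    }


def run_batch(batch: list[tuple[Path, str]], staff_id: str) -> tuple[list[dict], list[dict]]:
    """Check (path, expected digest) pairs; return all events and the failures."""
    events = [fixity_check(path, expected, staff_id) for path, expected in batch]
    failures = [ev for ev in events if ev["eventOutcome"] == "fail"]
    return events, failures
```

The same `run_batch` call could serve ingest-time, interval, and event-triggered checks; `events` would feed the log of all checks performed, while a non-empty `failures` list is what would trigger a staff alert.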

 
