The Audiovisual Metadata Platform (AMP) is a software platform that aims to generate metadata for digitized and born-digital audiovisual materials using a combination of machine learning models and human intervention. It was developed through a collaboration among Indiana University, AVP, the University of Texas at Austin, and the New York Public Library, and is funded by a grant from the Mellon Foundation.

The rise of mass digitization projects is one of the most important shifts in library and archival practice, from both a preservation and an access standpoint. As analog media continue to deteriorate, digitization is the best hope for preserving their content over the long term. However, digitization produces so much material that cataloging everything by hand is infeasible. Making matters worse, many of these materials come with only scant metadata, so a cataloger would need to watch or listen to each item to describe it. Using machine learning to automate metadata generation is therefore a promising solution. Although machine learning algorithms have improved significantly over the years, particularly in the last decade with the rapid rise of deep neural networks, they still have limitations and biases. These limit the usefulness, accuracy, and fairness of the metadata they produce. AMP introduces human intervention into its machine learning pipelines to work around some of these limitations, so that the metadata it generates is more accurate, or at least more useful, to collection managers and catalogers.

This guide explains AMP's front-end features and functionality and how to navigate its user interface. It is intended for AMP users: collection managers, catalogers, and anyone else using the platform. The guide reflects AMP in its current pilot stage, so some of this information is likely to change over time.

Definitions

These definitions are provided to make reading this guide easier.

  • Collection: A set of items that are subject to the same access control settings
  • Bundle: A set of items, drawn from one or more collections, that the user wants to submit through a workflow at the same time
  • Item: A bibliographic item. It contains metadata and A/V binary content. Items belong to a collection
  • File: A media file (sound recording or moving image) that is part of an item. An item can contain multiple related files
  • Primary File: A binary object provided as a primary resource by a collection-holding institution
  • Supplemental File: Any file that is provided to supplement the information about a collection, an item, or a primary file
  • Workflow: A representation of a graph that describes the routing rules for a set of MGMs. The input of a workflow may be an item or a group of items
  • Metadata Generation Mechanism (MGM): A machine learning tool or other tool (e.g., an automated non-machine-learning tool such as ffmpeg, or a manual tool such as a transcript editor) that AMP makes available to users
  • Job: One execution of a workflow for a particular Primary File
  • Unit: A tenant in a multi-tenant AMP; collections belong to a Unit
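The relationships among these entities can be summarized as a small data model. The Python sketch below is purely illustrative and assumes a simplified structure: the class names, fields, MGM names, and the `submit_bundle` helper are hypothetical and are not AMP's actual schema or API. It models a Unit containing Collections, Items holding Primary Files, a Bundle drawing Items from more than one Collection, and a Workflow as a graph of MGMs. Following the definition of a Job above, submitting a Bundle creates one Job per Primary File.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, simplified model of the AMP entities defined above;
# this is NOT AMP's actual schema or API.

@dataclass
class PrimaryFile:
    name: str  # e.g. a digitized sound recording or moving image

@dataclass
class Item:
    title: str
    metadata: Dict[str, str] = field(default_factory=dict)
    primary_files: List[PrimaryFile] = field(default_factory=list)

@dataclass
class Collection:
    name: str  # items in a collection share access control settings
    items: List[Item] = field(default_factory=list)

@dataclass
class Unit:
    name: str  # one tenant in a multi-tenant installation
    collections: List[Collection] = field(default_factory=list)

@dataclass
class Bundle:
    name: str
    items: List[Item]  # may be drawn from several collections

@dataclass
class Workflow:
    name: str
    # Graph of MGMs: each MGM name maps to the MGMs that receive its output.
    routes: Dict[str, List[str]]

@dataclass
class Job:
    workflow: Workflow
    primary_file: PrimaryFile

def submit_bundle(bundle: Bundle, workflow: Workflow) -> List[Job]:
    """Hypothetical helper: create one job per primary file in the bundle."""
    return [
        Job(workflow, pf)
        for item in bundle.items
        for pf in item.primary_files
    ]

# Usage with made-up example data.
interview = Item("Interview 1", primary_files=[
    PrimaryFile("interview1_part1.wav"),
    PrimaryFile("interview1_part2.wav"),
])
newsreel = Item("Newsreel", primary_files=[PrimaryFile("newsreel.mp4")])
unit = Unit("Library", [
    Collection("Oral histories", [interview]),
    Collection("Films", [newsreel]),
])
bundle = Bundle("Batch 1", [interview, newsreel])  # spans two collections
workflow = Workflow("Transcription", {  # hypothetical MGM names
    "speech_to_text": ["transcript_editor"],
    "transcript_editor": [],
})
jobs = submit_bundle(bundle, workflow)
print(len(jobs))  # prints 3
```

Because a Job is tied to a single Primary File, the interview item with two files yields two jobs even though both run the same workflow.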