Batch Ingest Package Format (DRAFT 1-10-2014)

This documentation is for Release 3.0.0. For the Release 1 version of this page, see v.43 found under Page History. For the Release 2 version of this page, see v.71 found under Page History.

Introduction

Avalon's batch ingest feature builds one or more media items at a time from uploaded content and metadata, outside the user interface. A batch ingest is started by uploading an ingest package, consisting of one manifest file and zero or more content files, to the Avalon dropbox for a specific collection. For your convenience, there is a demo ingest package available to download and import into test systems. Follow the instructions below to ensure a successful batch upload.

Ingest Packages

An ingest package is the combination of content and metadata that make up a single batch.

Package Layout

The package (manifest file and associated content files) must be uploaded within the appropriate collection subdirectory of the Avalon dropbox. The package can either be at the root of the collection directory, or in any subdirectory thereof. All items included in a single ingest package will be uploaded to the same collection. The following is a very simple package that has been uploaded:

[Screenshots: example of a simple uploaded package]

Manifest File Format

The manifest file is a spreadsheet (xls, xlsx, csv, or ods) containing the metadata for the items to be created, as well as the names of the content files that make up each item. In this case, the manifest file is named batch_manifest.xlsx. See batch_manifest_template_R3.xlsx for an Excel example file. Required fields are in bold.

 

|   | A                          | B                              | C           | D                  | E      | F                  | G      |
| 1 | Michael's First Test Batch | michael.klein@northwestern.edu |             |                    |        |                    |        |
| 2 | Main Title                 | Creator                        | Date Issued | File               | Label  | File               | Label  |
| 3 | Test item 1                | Klein, Michael B.              | 2012        | content/file_1.mp3 | Part 1 | content/file_2.mp4 | Part 2 |
| 4 | Test item 2                | Northwestern                   | 1951        | content/file_3.mp4 |        |                    |        |

Row 1, Column A contains a reference name for the batch. This name is mostly for your own reference, so we recommend naming the batch in a way that will help you remember its contents.

Row 1, Column B contains the submitter's email address (to be used for notifications and exceptions). The submitter's email must be listed as a manager, editor, or depositor for the collection in which this batch is deposited in the Avalon dropbox.

Row 2 specifies the names of the metadata fields supplied in the following rows. Main Title, Creator, Date Issued, and File are required. These fields are shown in bold in the Excel example file. Each subsequent row represents a single media item to be created. Metadata values are specified first, followed by a list of content files to be attached to each item.

Content files listed in the manifest file must have the correct path noted for where those files are located in the Avalon collection dropbox, relative to the manifest file. Additionally, all content files must include a file extension. If necessary, include any directories or subdirectories (note the paths listed in columns D and F in the above example).
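The row layout described above can be sketched in code for the csv variant of the manifest. This is a minimal illustration rather than part of Avalon: the helper names build_manifest_rows and manifest_to_csv are hypothetical, and the sketch assumes that a comma-separated file with the same row order is acceptable to your installation.

```python
import csv
import io


def build_manifest_rows(batch_name, submitter_email, headers, items):
    """Return manifest rows: row 1 holds the batch name and submitter email,
    row 2 holds the field names, and each later row describes one media item."""
    width = len(headers)
    first_row = [batch_name, submitter_email] + [""] * (width - 2)
    rows = [first_row, list(headers)]
    for item in items:
        if len(item) > width:
            raise ValueError("item has more values than there are header columns")
        # Pad short rows with empty cells so every row has the same width.
        rows.append(list(item) + [""] * (width - len(item)))
    return rows


def manifest_to_csv(rows):
    """Serialize the rows as CSV text (hypothetical helper, not an Avalon API)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Values taken from the first example table in this page.
example_headers = ["Main Title", "Creator", "Date Issued", "File", "Label", "File", "Label"]
example_items = [
    ["Test item 1", "Klein, Michael B.", "2012",
     "content/file_1.mp3", "Part 1", "content/file_2.mp4", "Part 2"],
    ["Test item 2", "Northwestern", "1951", "content/file_3.mp4"],
]
example_rows = build_manifest_rows(
    "Michael's First Test Batch",
    "michael.klein@northwestern.edu",
    example_headers,
    example_items,
)
print(manifest_to_csv(example_rows))
```

The file paths are written exactly as given, so they must already be relative to the location of the manifest file.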

Multivalued fields are specified by multiple columns with the same header, e.g. Topical Subject in the following example:

 

|   | A                           | B                              | C           | D               | E               | F                              |
| 1 | Michael's Second Test Batch | michael.klein@northwestern.edu |             |                 |                 |                                |
| 2 | Main Title                  | Creator                        | Date Issued | Topical Subject | Topical Subject | File                           |
| 3 | Nachos: A Memoir            | Klein, Michael B.              | 2012-12-22  | Meat            | Cheese          | content/tasty_tasty_nachos.mp4 |
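To illustrate how repeated column headers can be read as one multivalued field, here is a minimal sketch. The function name row_to_fields is hypothetical, and the sketch assumes every cell has already been read as a string (for example, from a csv manifest); it is not Avalon's own parser.

```python
def row_to_fields(headers, values):
    """Collect the values of one item row by header name.

    Columns that share a header (such as two "Topical Subject" columns)
    contribute multiple values to the same field. Empty cells are skipped.
    """
    fields = {}
    for header, value in zip(headers, values):
        header = header.strip()
        value = value.strip()
        if header and value:
            fields.setdefault(header, []).append(value)
    return fields


# Rows 2 and 3 of the second example table in this page.
headers = ["Main Title", "Creator", "Date Issued",
           "Topical Subject", "Topical Subject", "File"]
values = ["Nachos: A Memoir", "Klein, Michael B.", "2012-12-22",
          "Meat", "Cheese", "content/tasty_tasty_nachos.mp4"]
fields = row_to_fields(headers, values)
print(fields["Topical Subject"])
```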

 

Supported Field Names (required fields are marked "Required field")

  • Main Title
    • MODS mapping: titleInfo/title
    • Not repeatable
    • Required field. Title is used for display in search results and single item views. Only the first 32 characters of a title are included in search results listings. Recommended use is to reflect the content captured in digitized media files (such as the title of the piece performed or a short description of the content of a home movie).
    • Editable after ingest in "Title" field of Resource Description form.
  • Creator
    • MODS mapping: name@usage="primary"/namePart (role/roleTerm set to "Creator")
    • Repeatable
    • No ability to specify Corporate Body in batch at this time
    • Required field. Main contributors are the primary persons or bodies associated with the creation of the content. Main contributors will be included in search results display and aggregated for browsing access. At this time there is no ability to specify a main contributor as a corporate body. When possible, use the Library of Congress Name Authority File.
    • Editable after ingest in "Main contributor(s)" field of Resource Description form.
  • Contributor
    • MODS mapping: name/namePart (role/roleTerm set to "Contributor")
    • Repeatable
    • Contributors are persons or bodies associated with the item but not considered primary to the creation of its content. Examples include performers in a band or opera, conductors, arrangers, cinematographers, and choreographers. At this time there is no ability to specify a contributor as a corporate body. When possible, use the Library of Congress Name Authority File.
    • Editable after ingest in "Contributor(s)" field of Resource Description form.
  • Genre
    • MODS mapping: genre
    • Repeatable
    • Genre can be used to categorize an item by form, style, or subject matter. For consistency and to allow for sorting and aggregating, use terms from the Open Metadata Registry labels for PBCore: pbcoreGenre.
    • Editable after ingest in "Genre(s)" field of Resource Description form.
  • Publisher
    • MODS mapping: originInfo/publisher
    • Repeatable
    • Publisher of the content of the item.
    • Editable after ingest in "Publisher(s)" field of Resource Description form.
  • Date Created
    • MODS mapping: originInfo/dateCreated@encoding="edtf"
    • Not repeatable
    • Creation date should only be used if Date Issued is a re-issue date. In that case, Creation date contains the original publication date. Enter date information in a format consistent with the options shown in Extended Date/Time Format (EDTF) 1.0.
    • Editable after ingest in "Creation date" field of Resource Description form.
  • Date Issued
    • MODS mapping: originInfo/dateIssued@encoding="edtf"
    • Not repeatable
    • Required field. Date should be the main publication date associated with the item, to be used for sorting browse and search results. Enter date information in a format consistent with the options shown in Extended Date/Time Format (EDTF) 1.0.
    • Editable after ingest in "Publication date" field of Resource Description form.
  • Abstract
    • MODS mapping: abstract
    • Not repeatable
    • Abstract provides a space for describing the contents of the item. Examples include liner notes, contents list, or an opera scene abstract. This field is not meant for cataloger's descriptions but for descriptions that accompany the item. The first 15-20 words are included in search result listings.
    • Editable after ingest in "Summary" field of Resource Description form.
  • Topical Subject
    • MODS mapping: subject/topic
    • Repeatable
    • Subject should be used for the topical subject of the content. For consistency and to allow for sorting and aggregating, use terms from the Library of Congress Subject Headings. For temporal subjects (time periods), use Temporal Subject, and for geographic subjects (locations), use Geographic Subject. See below.
    • Editable after ingest in "Subject(s)" field of Resource Description form.
  • Geographic Subject
    • MODS mapping: subject/geographic
    • Repeatable
    • Geographic Subject should be used for the location associated with the content. For consistency and to allow for sorting and aggregating, use terms from the Getty Thesaurus of Geographic Names.
    • Editable after ingest in "Location(s)" field of Resource Description form.
  • Temporal Subject
    • MODS mapping: subject/temporal
    • Repeatable
    • Temporal Subject should be used for the time period of the content (for example, years or year ranges). Enter date information in a format consistent with the options shown in Extended Date/Time Format (EDTF) 1.0.
    • Editable after ingest in "Time period(s)" field of Resource Description form.
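The required and non-repeatable rules listed above lend themselves to a simple pre-upload check. The following is a minimal sketch under stated assumptions: check_item is a hypothetical helper, cells are assumed to be strings, and a field counts as "repeated" only when more than one of its columns is filled in for the same item. Avalon's own validation may behave differently.

```python
from collections import Counter

# Field names taken from this page.
REQUIRED = ("Main Title", "Creator", "Date Issued", "File")
NOT_REPEATABLE = ("Main Title", "Date Created", "Date Issued", "Abstract")


def check_item(headers, values):
    """Return a list of problems found in one item row (empty if none)."""
    # Count how many non-empty cells each field name has in this row.
    filled = Counter(
        header.strip()
        for header, value in zip(headers, values)
        if value.strip()
    )
    problems = []
    for name in REQUIRED:
        if filled[name] == 0:
            problems.append(f"missing required field: {name}")
    for name in NOT_REPEATABLE:
        if filled[name] > 1:
            problems.append(f"field is not repeatable: {name}")
    return problems


item_headers = ["Main Title", "Creator", "Date Issued", "File"]
print(check_item(item_headers, ["Test item 2", "Northwestern", "", "content/file_3.mp4"]))
```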

In addition to the descriptive fields, there are seven supported operational fields:

  • Publish
    • Whether the item should be automatically published after ingest.
    • Default is "No".
    • To trigger auto-publishing, enter value of "Yes".
  • Hidden
    • Whether the item will appear in search/browse results for end users. Use this field to prevent users from discovering extra-restricted items.
    • Default is "No".
    • To trigger hiding, enter value of "Yes".
    • Hidden items will still appear in search/browse results for those with ingest privileges.
  • File
    • Required field. Content files listed in the manifest file must have the correct path noted for where those files are located in the Avalon dropbox, relative to the manifest file. Additionally, all content files must include a file extension. If necessary, include any directories or subdirectories (note the paths listed in columns D and F in the first example above).
    • Repeatable
    • Label and Offset can be listed in any order following the file they are describing.
  • Label
    • Label is used for display in single item views. Recommended use is to reflect the content captured in digitized media files (such as the Part 1 and Part 2 of the piece performed or titles of songs).
    • Only repeatable following a file entry.
    • Editable after ingest in "Label" field of Manage Files page
  • Offset
    • Offset is used to set the thumbnail and poster image for the display in search/browse results and single item views. Must be entered between 00:00:00.000 and length of file. Enter the offset time in the format hh:mm:ss.sss.
    • Only repeatable following an additional file.
    • Default is 2 seconds into playback.  
    • Only applicable to video files. Audio files have a default thumbnail, so any offset will be ignored.
    • If a record contains multiple files, the first offset listed will set the thumbnail and poster image for the Avalon record.
    • Editable after ingest in "Poster Offset" field of Manage Files page or on the item preview page.  
  • Skip Transcoding
    • Skip Transcoding is used if a pre-encoded derivative of the file is what is being uploaded to Avalon instead of the master version of the file. This presumes that the derivative(s) match the requirements explained in Avalon Derivatives. Master file location information should be included for complete object ingest. See Absolute Location for further information.
    • Only repeatable following a file entry.
    • Valid values: “yes” or “no”
  • Absolute Location
    • Absolute Location is used with Skip Transcoding to indicate the location of the master version of a video or audio file when the file uploaded to Avalon is a pre-encoded derivative.
    • Only repeatable following Skip Transcoding when set to “yes”.
    • If Skip Transcoding is set to “no” or not included, Absolute Location will be ignored.
    • Absolute Location should be the full URI path of the server housing the master version of the file.
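Because Label, Offset, Skip Transcoding, and Absolute Location apply to the File column they follow, a manifest reader has to group columns by position. The sketch below shows one way to do that, together with a converter for offsets written like the 00:00:00.000 example. Both function names are hypothetical, cells are assumed to be strings, and the colon-separated offset layout is an assumption based on that example.

```python
FILE_ATTRIBUTES = ("Label", "Offset", "Skip Transcoding", "Absolute Location")


def group_files(headers, values):
    """Group each File cell with the attribute cells that follow it.

    Attributes that follow an empty File cell, or that appear before any
    File column, are ignored.
    """
    files = []
    current = None
    for header, value in zip(headers, values):
        header, value = header.strip(), value.strip()
        if header == "File":
            # Start a new file entry, or none at all if the cell is empty.
            current = {"File": value} if value else None
            if current is not None:
                files.append(current)
        elif header in FILE_ATTRIBUTES and current is not None and value:
            current[header] = value
    return files


def offset_to_seconds(offset):
    """Convert an offset such as "00:01:30.500" to seconds as a float."""
    hours, minutes, seconds = offset.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Rows 2 and 3 of the first example table in this page.
file_headers = ["Main Title", "Creator", "Date Issued", "File", "Label", "File", "Label"]
file_values = ["Test item 1", "Klein, Michael B.", "2012",
               "content/file_1.mp3", "Part 1", "content/file_2.mp4", "Part 2"]
print(group_files(file_headers, file_values))
```

Keeping each file's attributes in its own dictionary makes it straightforward to check, for instance, that Absolute Location is only present when Skip Transcoding is set to yes.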

Notes

The batch ingest process will verify that the package is complete (i.e., all content files specified in the manifest are present and not open by any other processes) before attempting to ingest. If the package is incomplete, it will be skipped and retried on a subsequent pass.
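A similar presence check can be run locally before uploading. The sketch below is an illustration only: missing_content_files is a hypothetical helper, it checks only that each listed file exists relative to the manifest, and it does not attempt to detect whether a file is still open in another process.

```python
from pathlib import Path


def missing_content_files(manifest_path, file_entries):
    """Return the manifest File entries that do not exist on disk.

    Each entry is resolved relative to the directory containing the manifest.
    """
    base = Path(manifest_path).parent
    return [entry for entry in file_entries if not (base / entry).is_file()]
```

An empty list from missing_content_files suggests every listed content file is in place; any returned entries are the paths to fix or upload before the package can be considered complete.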

Each batch will generate 2 emails to the email address listed at the top of the manifest.