

This documentation is for Release 7.x. For the Release 1 version, see v.43. For the Release 2 version, see v.71. For Release 3.0.0, see v.86. For Release 3.1, see v.88. For Release 3.2, see v.108. For Release 3.3, see v.129. For Release 4.0, see v.159. For Release 5.x, see v.168. For Release 6.x, see v.176.

Introduction

Avalon's batch ingest feature provides a method of building one or more media items at a time from uploaded content and metadata outside the user interface. A batch ingest is started by uploading an ingest package consisting of one manifest file and zero or more content files to the Avalon dropbox. For your convenience, there is a demo ingest package available to download and import into test systems. Follow the instructions below to ensure a successful batch upload.

Ingest Packages

An ingest package is the combination of content and metadata that makes up a single batch. Structural metadata documents in the form of XML may also be uploaded, one per audio/video content file.

Package Layout

When a new collection is created, Avalon creates a subdirectory with the name of that collection (substituting underscores for any blanks) beneath the Avalon dropbox directory. The package (manifest file and associated content files) must be uploaded to that collection-named subdirectory or to a subdirectory beneath it. All items included in a single ingest package will be uploaded to the same collection. The following is a very simple package that has been uploaded:

[Image: a simple ingest package uploaded to a collection's dropbox subdirectory]
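To make the layout rule concrete, here is a minimal sketch that derives a collection's dropbox subdirectory from its name and checks whether a manifest sits in that subdirectory or beneath it. It assumes "blanks" means whitespace characters replaced one-for-one with underscores; the dropbox root `/srv/avalon/dropbox`, the collection name, and both function names are hypothetical and are not part of Avalon.

```python
from pathlib import PurePosixPath


def collection_dropbox_dir(dropbox_root: str, collection_name: str) -> PurePosixPath:
    """Collection subdirectory: the collection name with blanks replaced by underscores."""
    # Assumption: each blank becomes one underscore; Avalon may normalize names further.
    return PurePosixPath(dropbox_root) / collection_name.replace(" ", "_")


def package_in_collection(manifest_path: str, collection_dir: PurePosixPath) -> bool:
    """True if the manifest is in the collection directory or any subdirectory beneath it."""
    return collection_dir in PurePosixPath(manifest_path).parents


# Hypothetical dropbox root and collection name, for illustration only.
coll = collection_dropbox_dir("/srv/avalon/dropbox", "Oral History Project")
print(coll)
print(package_in_collection(f"{coll}/batch1/batch_manifest.xlsx", coll))
print(package_in_collection("/srv/avalon/dropbox/Other/batch_manifest.xlsx", coll))
```

The check is purely lexical (`PurePosixPath`), so it does not touch the file system and does not resolve `..` segments.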

Manifest File Format

The manifest file is a spreadsheet (xls, xlsx, csv, or ods) containing the metadata for the items to be created, as well as the names of the content files that make up each item. In this case, the manifest file is named batch_manifest.xlsx.

See batch_manifest_template.xlsx for an Excel example file. Required fields are in bold. Note: Neither the spreadsheet filename nor the name of any folder/directory above it may contain blanks; substitute underscores.
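Before uploading, the naming rules above can be checked with a short pre-flight script. This is a minimal sketch, assuming that only the path components below the dropbox root matter and that any whitespace counts as a blank; the function name and the example root are hypothetical, and Avalon performs its own checks.

```python
from pathlib import PurePosixPath

# Spreadsheet formats listed above for the manifest file.
MANIFEST_EXTENSIONS = {".xls", ".xlsx", ".csv", ".ods"}


def manifest_path_problems(manifest_path: str, dropbox_root: str) -> list[str]:
    """List naming problems for a manifest path; an empty list means none were found."""
    path = PurePosixPath(manifest_path)
    # Only the part below the dropbox root is checked (raises ValueError if outside it).
    relative = path.relative_to(dropbox_root)
    problems = []
    if path.suffix.lower() not in MANIFEST_EXTENSIONS:
        problems.append(f"unsupported manifest type: {path.suffix or '(none)'}")
    for part in relative.parts:
        if any(ch.isspace() for ch in part):
            problems.append(f"blank in name: {part!r}")
    return problems


root = "/srv/avalon/dropbox"  # hypothetical dropbox root
print(manifest_path_problems(f"{root}/My_Collection/batch_manifest.xlsx", root))
print(manifest_path_problems(f"{root}/My_Collection/spring batch/batch manifest.txt", root))
```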


|   | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| 1 | Michael's First Test Batch | michael.klein@northwestern.edu | | | | | | |
| 2 | Bibliographic ID | Title | Creator | Date Issued | File | Label | File | Label |
| 3 | 123456 | Test item 1 | Klein, Michael B. | 2012 | content/file_1.mp3 | Part 1 | content/file_2.mp4 | Part 2 |
| 4 | 789012 | Test item 2 | Northwestern | 1951 | content/file_3.mp4 | | | |
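If you generate manifests programmatically, the example above can be written as a CSV manifest (one of the accepted formats) with Python's standard `csv` module. This is a minimal sketch: `build_manifest_rows` is a hypothetical helper, and it assumes Avalon accepts rows of differing lengths, as in row 4 above. To write a real file, pass `open("batch_manifest.csv", "w", newline="")` to `csv.writer` instead of the in-memory buffer.

```python
import csv
import io


def build_manifest_rows(batch_name, submitter, header, items):
    """Row 1: batch name and submitter; row 2: field names; rows 3+: one media item each."""
    return [[batch_name, submitter], header, *items]


rows = build_manifest_rows(
    "Michael's First Test Batch",
    "michael.klein@northwestern.edu",
    ["Bibliographic ID", "Title", "Creator", "Date Issued", "File", "Label", "File", "Label"],
    [
        ["123456", "Test item 1", "Klein, Michael B.", "2012",
         "content/file_1.mp3", "Part 1", "content/file_2.mp4", "Part 2"],
        ["789012", "Test item 2", "Northwestern", "1951", "content/file_3.mp4"],
    ],
)

buf = io.StringIO()
csv.writer(buf).writerows(rows)  # values containing commas, e.g. "Klein, Michael B.", are quoted
print(buf.getvalue())
```

Using the `csv` module rather than joining strings by hand matters here because creator names such as "Klein, Michael B." contain commas and must be quoted to stay in one cell.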





Row 1, Column A contains a reference name for the batch. This name is mainly for your own reference, so choose one that will help you remember the batch's contents.

Row 1, Column B contains the submitter's email address (or username, depending on how your system is set up) to be used for notifications and exceptions. The submitter's email address or username must be listed as a manager, editor, or depositor for the collection whose dropbox subdirectory the batch is uploaded to.

Row 2 specifies the names of the metadata fields supplied in the following rows. Title, Date Issued, and File are required; these fields are shown in bold in the Excel example file. Each subsequent row represents a single media item to be created. Metadata values are specified first, followed by a list of content files to be attached to the item. Note: Make sure none of the field names in row 2 have leading or trailing blanks; otherwise Avalon will not recognize those field names and will report an error.
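The row 2 rules can be checked before upload. This minimal sketch flags field names with stray blanks and reports required fields missing from the header; `check_header_row` is a hypothetical name, and the required set mirrors the fields named above, so adjust it if your installation requires others.

```python
# Required fields as named in the text above.
REQUIRED_FIELDS = {"Title", "Date Issued", "File"}


def check_header_row(header: list[str]) -> list[str]:
    """Report field names with stray blanks and required fields missing from row 2."""
    problems = []
    for col, name in enumerate(header, start=1):
        if name != name.strip():
            problems.append(f"column {col}: {name!r} has leading or trailing blanks")
    # Names with stray blanks are not recognized, so compare the raw names.
    for field in sorted(REQUIRED_FIELDS - set(header)):
        problems.append(f"missing required field: {field}")
    return problems


print(check_header_row(["Title", "Creator", "Date Issued", "File"]))
print(check_header_row(["Title ", "Creator", "Date Issued", "File"]))
```

A header such as "Title " produces two findings, the stray blank and the missing Title field, which mirrors the documented behavior that such a name is not recognized at all.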

Each content file listed in the manifest must be given with its correct path in the Avalon dropbox, relative to the manifest file, including any directories or subdirectories (note the paths listed in columns E and G in the example above). Every content file name must also include a file extension.
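A quick local check of the File entries can catch missing extensions and wrong relative paths before Avalon does. This is a minimal sketch with a hypothetical function name; it only checks existence and extensions, not whether a file is still open in another process.

```python
import tempfile
from pathlib import Path


def check_content_files(manifest_path: str, file_entries: list[str]) -> list[str]:
    """Check each File entry: it needs an extension and must exist relative to the manifest."""
    base = Path(manifest_path).parent
    problems = []
    for entry in file_entries:
        if not Path(entry).suffix:
            problems.append(f"{entry}: no file extension")
        elif not (base / entry).is_file():
            problems.append(f"{entry}: not found relative to the manifest")
    return problems


# Build a throwaway package to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "content").mkdir()
    (Path(tmp) / "content" / "file_1.mp3").write_bytes(b"")
    manifest = str(Path(tmp) / "batch_manifest.xlsx")
    problems = check_content_files(
        manifest, ["content/file_1.mp3", "content/file_2", "content/file_3.mp4"]
    )
print(problems)
```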

Multivalued fields are specified by multiple columns with the same header, e.g. Topical Subject in the following example:



|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1 | Michael's Second Test Batch | michael.klein@northwestern.edu | | | | |
| 2 | Title | Creator | Date Issued | Topical Subject | Topical Subject | File |
| 3 | Nachos: A Memoir | Klein, Michael B. | 2012-12-22 | Meat | Cheese | content/tasty_tasty_nachos.mp4 |
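One way to read such a row is to collect values by header, so repeated headers produce a list of values. This is a minimal sketch with a hypothetical function name. It does not handle the positional pairing of File and Label columns shown in the first example, which would need a separate pass.

```python
from collections import defaultdict


def row_to_fields(header: list[str], row: list[str]) -> dict[str, list[str]]:
    """Collect an item row's values by field name; repeated headers yield multiple values."""
    fields = defaultdict(list)
    for name, value in zip(header, row):
        if value.strip():  # skip empty cells
            fields[name].append(value.strip())
    return dict(fields)


header = ["Title", "Creator", "Date Issued", "Topical Subject", "Topical Subject", "File"]
row = ["Nachos: A Memoir", "Klein, Michael B.", "2012-12-22", "Meat", "Cheese",
       "content/tasty_tasty_nachos.mp4"]
print(row_to_fields(header, row)["Topical Subject"])
```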

Supported Field Names


  • Main Title
    • MODS mapping: titleInfo/title
    • Not repeatable
    • Required field: this should be the title used for display in browsing and search results
  • Alternative Title
    • MODS mapping: titleInfo@type="alternative"
    • Repeatable
  • Translated Title
    • MODS mapping: titleInfo@type="translated"
    • Repeatable
  • Uniform Title
    • MODS mapping: titleInfo@type="uniform"
    • Repeatable
  • Creator
    • MODS mapping: name/namePart
      • ATTN: Can we assign “creator” for this role to distinguish it from other names included with an item?  That would mean auto assignment of “creator” or some other role for name/role/roleTerm within this name element.
    • Not repeatable
    • No ability to specify Corporate Body in batch at this time
      • ATTN: Is this editable in the form after ingest?
    • Required field: this should be the main person or body associated with the item
  • Contributor
    • MODS mapping: name/namePart
      • ATTN: I don’t think there’s any role we can automatically assign for name/role/roleTerm with this name element?
    • Repeatable
    • No ability to specify Corporate Body in batch at this time
      • ATTN: Is this editable in the form after ingest?
  • Statement of Responsibility
    • MODS mapping: note@type="statement of responsibility"
    • Not repeatable
  • Resource Type
    • MODS mapping: typeOfResource
    • Not repeatable
    • This helps sort search results and browsable content. Use one of the following values:
      • sound recording-musical
      • sound recording-non-musical
      • sound recording
      • still image
      • moving image
  • Genre
  • Publisher
    • MODS mapping: originInfo/publisher
    • Not repeatable
  • Place of Origin
    • MODS mapping: originInfo/place/placeTerm
    • Not repeatable
  • Date Created
    • MODS mapping: originInfo/dateCreated@encoding="edtf"
    • Not repeatable
    • Use Date Created only if Date Issued is a reissue date; Date Created then contains the original publication date.
    • Enter date information in a format consistent with the options shown in Extended Date/Time Format (EDTF) 1.0
  • Date Issued
    • MODS mapping: originInfo/dateIssued@encoding="edtf"
    • Not repeatable
    • Required field: this should be the main date associated with the item, used for sorting browse and search results.
    • Enter date information in a format consistent with the options shown in Extended Date/Time Format (EDTF) 1.0 
  • Copyright Date
    • MODS mapping: originInfo/copyrightDate
    • Repeatable
    • This field does not need to follow any particular encoding standard.
  • Language Code
  • Language Text
  • Abstract
    • MODS mapping: abstract
    • Not repeatable
  • Note
    • MODS mapping: note
    • Repeatable
    • No ability to distinguish type of note for batch upload at this time
      • ATTN: Are we using an automatic type for note at this point or not specifying a type?
  • Topical Subject
    • MODS mapping: subject/topic
    • Repeatable
  • Geographic Subject
    • MODS mapping: subject/geographic
    • Repeatable
  • Temporal Subject
    • MODS mapping: subject/temporal
    • Repeatable
  • Occupation Subject
    • MODS mapping: subject/occupation
    • Repeatable
  • Person Subject
    • MODS mapping: subject/name@type="personal"/namePart
    • Repeatable
  • Corporate Subject
    • MODS mapping: subject/name@type="corporate"/namePart
    • Repeatable
  • Family Subject
    • MODS mapping: subject/name@type="family"/namePart
    • Repeatable
  • Title Subject
    • MODS mapping: subject/titleInfo/title
    • Repeatable
  • Related Item ID
    • MODS mapping: relatedItem/identifier
    • Repeatable
    • No ability to specify type of relation in batch at this time
      • ATTN: Are we using an automatic type at this point or not specifying a type?
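To illustrate how a few of the simpler mappings above translate into MODS XML, here is a minimal sketch using Python's `xml.etree.ElementTree`. It is not Avalon's own MODS generation: the mapping table covers only four fields, keys on "Title" as used in the manifest examples, ignores attributes such as `@encoding="edtf"`, and wraps each repeated value in its own parent element (one `<subject>` per topic), which is one of several valid MODS layouts. `SIMPLE_MAPPINGS`, `q`, and `to_mods` are hypothetical names.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# A few of the simple mappings listed above: manifest field -> element path under <mods>.
SIMPLE_MAPPINGS = {
    "Title": ["titleInfo", "title"],
    "Abstract": ["abstract"],
    "Topical Subject": ["subject", "topic"],
    "Geographic Subject": ["subject", "geographic"],
}


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def to_mods(fields: dict[str, list[str]]) -> ET.Element:
    """Build a partial MODS record; fields without a mapping here are ignored."""
    root = ET.Element(q("mods"))
    for field, values in fields.items():
        path = SIMPLE_MAPPINGS.get(field)
        if path is None:
            continue
        for value in values:  # each value gets its own wrapper elements
            parent = root
            for tag in path[:-1]:
                parent = ET.SubElement(parent, q(tag))
            ET.SubElement(parent, q(path[-1])).text = value
    return root


record = to_mods({"Title": ["Nachos: A Memoir"], "Topical Subject": ["Meat", "Cheese"]})
print(ET.tostring(record, encoding="unicode"))
```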

In addition to the descriptive fields, the manifest supports the operational field Publish (default: false); a value of "True" causes the newly ingested media object to be published immediately after ingest.

Notes

The batch ingest process verifies that the package is complete (i.e., all content files specified in the manifest are present and not open in any other process) before attempting to ingest it. If the package is incomplete, it is skipped and retried on a subsequent pass.

 

Please see Supported Field Names for information about fields available and supported in the batch spreadsheet. 

Multiple File Ingest of Different Quality Files For a Single Avalon Item

Avalon supports ingest of multiple derivatives of an item, which can be selected during playback with the player's gear button (High/Medium/Low for video, High/Medium for audio). For the batch ingest to succeed, both the “File” field in the manifest and the names of the files in the Avalon dropbox directory must be formatted correctly. Avalon takes the filename from the manifest file, finds the quality levels present in the dropbox directory, and ingests those files accordingly. Not all three quality tiers are required for multiple file ingest.

For a single Avalon item, enter one filename in the “File” field and “Yes” in the “Skip Transcoding” field of the manifest file, then add the files of each quality for that item to the dropbox directory. The “File” value and the names of the different quality files in the Avalon dropbox directory must follow this convention:

| Location | File name(s) |
|---|---|
| “File” field in manifest file | filename.mp4 |
| Files in dropbox directory | filename.high.mp4; filename.medium.mp4; filename.low.mp4 |


Note: Files must match this convention strictly; extra periods are not allowed. For example, filename.high.mp4 is valid, but filename.test.high.mp4 is invalid.
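The naming convention can be expressed as a small helper that derives the expected dropbox names from the “File” value, validates a candidate name, and lists which tiers are present. This is a minimal sketch; the function names are hypothetical, and the tier names come from the convention above.

```python
from pathlib import PurePosixPath

QUALITY_TIERS = ("high", "medium", "low")


def expected_quality_files(manifest_file: str) -> list[str]:
    """Dropbox names per tier, derived from the File value (filename.mp4 -> filename.high.mp4)."""
    p = PurePosixPath(manifest_file)
    return [str(p.with_name(f"{p.stem}.{tier}{p.suffix}")) for tier in QUALITY_TIERS]


def is_valid_quality_name(name: str) -> bool:
    """True for base.tier.ext with no extra periods, e.g. filename.high.mp4."""
    parts = PurePosixPath(name).name.split(".")
    return len(parts) == 3 and all(parts) and parts[1] in QUALITY_TIERS


def available_tiers(manifest_file: str, existing: set[str]) -> list[str]:
    """Tiers whose files exist; not all three are required."""
    return [tier for tier, name in zip(QUALITY_TIERS, expected_quality_files(manifest_file))
            if name in existing]


print(expected_quality_files("content/filename.mp4"))
print(is_valid_quality_name("filename.high.mp4"), is_valid_quality_name("filename.test.high.mp4"))
print(available_tiers("content/filename.mp4",
                      {"content/filename.high.mp4", "content/filename.low.mp4"}))
```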


Example manifest file for multiple file ingest of different quality files for a single Avalon item:


|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | Michael's Third Test Batch | michael.klein@northwestern.edu | | | |
| 2 | Title | Creator | Date Issued | File | Skip Transcoding |
| 3 | Multiple Quality Ingest | Klein, Michael B. | 2015 | content/filename.mp4 | Yes |



Adding structure files via batch

The Batch Ingest Package can include XML structure files. One structure XML file can be attached per media file. See the demo ingest package at the top of this page for an example structural XML file included in a batch.

If the manifest lists a file named test.mp4, Avalon looks for a structure file named test.mp4.structure.xml. You can edit the XML later on the "Structure" tab of the Avalon user interface.

For more information about structure files (schema expectations and examples), see Adding Structure to Files Using the Graphical XML Editor.
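A pre-upload check can confirm that each media file's structure sidecar follows the naming rule and is at least well-formed XML. This is a minimal sketch with hypothetical function names; it does not validate against Avalon's structure schema, and the XML written in the demo is placeholder content, not a valid structure document.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def structure_file_for(media_path: str) -> Path:
    """Sidecar name per the convention above: test.mp4 -> test.mp4.structure.xml."""
    return Path(media_path + ".structure.xml")


def check_structure_file(media_path: str) -> str:
    """Return 'absent', 'ok', or a parse error; checks well-formedness only, not the schema."""
    sidecar = structure_file_for(media_path)
    if not sidecar.is_file():
        return "absent"
    try:
        ET.parse(sidecar)
    except ET.ParseError as err:
        return f"malformed XML: {err}"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    media = str(Path(tmp) / "test.mp4")
    before = check_structure_file(media)
    # Placeholder XML for illustration only.
    structure_file_for(media).write_text("<placeholder/>", encoding="utf-8")
    after = check_structure_file(media)
print(before, after)
```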

Adding caption files via batch

The Batch Ingest Package can include WebVTT or WebSRT captions files, one captions file per media file. If the manifest lists a file named test.mp4, Avalon looks for a captions file named test.mp4.vtt. If one is found, it is attached to the media file as captions. This captions file can be updated or removed later on the "Structure" tab of the Avalon user interface.
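Similarly, a captions sidecar can be located by name and given a basic sanity check. This minimal sketch uses hypothetical function names and only tests the WebVTT signature line (the file must begin with "WEBVTT", optionally after a byte order mark, followed by end of line, a space, or a tab); it does not parse cues and does not cover SRT-formatted files.

```python
import tempfile
from pathlib import Path


def caption_file_for(media_path: str) -> Path:
    """Sidecar name per the convention above: test.mp4 -> test.mp4.vtt."""
    return Path(media_path + ".vtt")


def looks_like_webvtt(path: Path) -> bool:
    """True if the first line is the WebVTT signature ('utf-8-sig' drops any BOM)."""
    lines = path.read_text(encoding="utf-8-sig").splitlines()
    first = lines[0] if lines else ""
    return first == "WEBVTT" or first.startswith(("WEBVTT ", "WEBVTT\t"))


with tempfile.TemporaryDirectory() as tmp:
    media = str(Path(tmp) / "test.mp4")
    vtt = caption_file_for(media)
    vtt.write_text("WEBVTT\n\n00:00.000 --> 00:02.000\nHello\n", encoding="utf-8")
    ok = looks_like_webvtt(vtt)
print(vtt.name, ok)
```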

Batch Processing Notes

Each batch generates two emails to the user listed at the top of the manifest.

When Avalon detects an unprocessed manifest file, it first verifies that the necessary metadata columns are present in the manifest and that the file is not broken. At this stage, only the manifest file itself has been validated, not the metadata listed in it.

If the manifest is incomplete or includes errors, such as invalid metadata values, only valid items (those with valid metadata and valid media file paths) are created. An email is sent to the address specified in the manifest detailing the outcome, successful or not, for the items listed in the manifest.

If a Bibliographic ID is provided for a resource but fails to process, the error email will only indicate that required fields are missing and will not indicate that the Bibliographic ID failed or was invalid.

To re-run a completed batch, follow the instructions in the email sent by the system after the batch is fully processed. It will contain a special filename that can be used to run the batch job again.

MARC record ingest

For information about MARC record ingest, see Supported Field Names.