This documentation is for Release 4.0. For the Release 1 version, see v.43. For the Release 2 version, see v.71. For Release 3.0.0, see v.86. For Release 3.1, see v.88. For Release 3.2, see v.108. For Release 3.3, see v.129.
Avalon's batch ingest feature provides a method of building one or more media items at a time from uploaded content and metadata outside the user interface. A batch ingest is started by uploading an ingest package consisting of one manifest file and zero or more content files to the Avalon dropbox. For your convenience there is a demo ingest package available to download and import into test systems. Follow the instructions below to ensure a successful batch upload.
An ingest package is the combination of content and metadata that make up a single batch. Structural metadata documents in the form of XML may also be uploaded - one per a/v content file.
When a new collection is created, Avalon creates a subdirectory with the name of that collection (substituting underscores for any blanks) beneath the Avalon dropbox directory. The package (manifest file and associated content files) must be uploaded to that collection-named subdirectory or to a subdirectory beneath it. All items included in a single ingest package will be uploaded to the same collection. The following is a very simple package that has been uploaded:
The manifest file is a spreadsheet (xls, xlsx, csv, or ods) containing the metadata for the items to be created, as well as the names of the content files that make up each item. In this case, the manifest file is named batch_manifest.xlsx. See batch_manifest_template_R4.xlsx for an Excel example file. Required fields are in bold. Note: Neither the spreadsheet filename nor any folder/directory names above it can have blanks in them; substitute underscores.
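The two naming rules described so far (blanks in a collection name become underscores, and no blanks are allowed in the manifest filename or the directories above it) can be sketched in Python. This is only an illustration: the function names, the example collection name, and the dropbox root `/srv/avalon/dropbox` are assumptions, not part of Avalon.

```python
from pathlib import PurePosixPath


def collection_dir_name(collection_name: str) -> str:
    """Replace blanks with underscores, as described for collection subdirectories."""
    return collection_name.replace(" ", "_")


def path_has_blanks(manifest_path: str) -> bool:
    """Return True if the manifest filename or any directory above it contains a blank."""
    return any(" " in part for part in PurePosixPath(manifest_path).parts)


# Hypothetical dropbox location and collection name, for illustration only.
dropbox_root = PurePosixPath("/srv/avalon/dropbox")
manifest = dropbox_root / collection_dir_name("Oral History Project") / "batch_manifest.xlsx"

print(manifest)                        # /srv/avalon/dropbox/Oral_History_Project/batch_manifest.xlsx
print(path_has_blanks(str(manifest)))  # False
```

Running a check like `path_has_blanks` before uploading can catch a badly named folder early, before Avalon ever sees the package.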
| | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| 1 | Michael's First Test Batch | michael.klein@northwestern.edu | | | | | | |
| 2 | Bibliographic ID | Title | Creator | Date Issued | File | Label | File | Label |
| 3 | 123456 | Test item 1 | Klein, Michael B. | 2012 | content/file_1.mp3 | Part 1 | content/file_2.mp4 | Part 2 |
| 4 | 789012 | Test item 2 | Northwestern | 1951 | content/file_3.mp4 | | | |
Row 1, Column A contains a reference name for the batch. This is mostly for your reference so we recommend naming the batch file according to what will help you remember the contents.
Row 1, Column B contains the submitter's email address (or username, depending on how your system is set up) to be used for notifications and exceptions. The submitter's email address or username must be listed as a manager, editor, or depositor for the collection in which this batch is deposited in the Avalon dropbox.
Row 2 specifies the names of the metadata fields supplied in the following rows. Title, Date Issued, and File are required. These fields are shown in bold in the Excel example file. Each subsequent row represents a single media item to be created. Metadata values are specified first, followed by a list of content files to be attached to each item. Note: Make sure none of the field names in row 2 have leading or trailing blanks; otherwise Avalon will not recognize the field names and will report an error.
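As a rough pre-upload check of the row 2 rules above, the following Python sketch flags field names with leading or trailing blanks and reports missing required fields. It is not Avalon's own validation: `check_header_row` is a hypothetical helper, and the sketch ignores the Bibliographic ID case, in which fewer fields are required.

```python
REQUIRED_FIELDS = ("Title", "Date Issued", "File")


def check_header_row(headers):
    """Return a list of problems found in the row-2 field names."""
    problems = []
    for name in headers:
        # A padded name such as "Title " would not match the expected field name.
        if name != name.strip():
            problems.append(f"field name {name!r} has leading or trailing blanks")
    present = set(headers)
    for required in REQUIRED_FIELDS:
        if required not in present:
            problems.append(f"required field {required!r} is missing")
    return problems


print(check_header_row(["Title", "Creator", "Date Issued", "File", "Label"]))  # []
for problem in check_header_row(["Title ", "Date Issued", "File"]):
    print(problem)
```

A padded name is reported twice on purpose: once for the blanks themselves and once because the exact required name is then absent.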
Content files listed in the manifest file must be given with the correct path to their location in the Avalon dropbox, relative to the manifest file, including any directories or subdirectories (note the paths listed in columns E and G in the above example). Additionally, all content file names must include a file extension.
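The path rules above can be checked before upload with a short Python sketch. It resolves each File entry relative to the manifest's own directory and confirms that the name has an extension and the file exists. `check_content_file` and the example paths are assumptions made for illustration.

```python
from pathlib import Path


def check_content_file(manifest_path, listed_name):
    """Resolve a manifest 'File' entry relative to the manifest and report problems."""
    problems = []
    # Paths in the manifest are relative to the directory holding the manifest.
    target = Path(manifest_path).parent / listed_name
    if target.suffix == "":
        problems.append(f"{listed_name}: no file extension")
    if not target.is_file():
        problems.append(f"{listed_name}: not found at {target}")
    return target, problems


# Hypothetical manifest location; the File value comes from the example above.
target, problems = check_content_file(
    "/srv/avalon/dropbox/Test_Collection/batch_manifest.xlsx", "content/file_1.mp3"
)
print(target)
print(problems)
```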
Multivalued fields are specified by multiple columns with the same header, e.g. Topical Subject in the following example:
| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1 | Michael's Second Test Batch | michael.klein@northwestern.edu | | | | |
| 2 | Title | Creator | Date Issued | Topical Subject | Topical Subject | File |
| 3 | Nachos: A Memoir | Klein, Michael B. | 2012-12-22 | Meat | Cheese | content/tasty_tasty_nachos.mp4 |
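One way to picture how repeated column headers carry multiple values is to collect each row into a mapping from field name to a list of values. The Python sketch below does this for the example row; `row_to_fields` is a hypothetical helper and does not reflect how Avalon parses manifests internally.

```python
from collections import defaultdict


def row_to_fields(headers, row):
    """Group a row's cells by header name; repeated headers yield multiple values."""
    fields = defaultdict(list)
    for name, value in zip(headers, row):
        if name and value:  # skip empty header cells and empty values
            fields[name].append(value)
    return dict(fields)


headers = ["Title", "Creator", "Date Issued", "Topical Subject", "Topical Subject", "File"]
row = ["Nachos: A Memoir", "Klein, Michael B.", "2012-12-22", "Meat", "Cheese",
       "content/tasty_tasty_nachos.mp4"]

item = row_to_fields(headers, row)
print(item["Topical Subject"])  # ['Meat', 'Cheese']
print(item["Title"])            # ['Nachos: A Memoir']
```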
Required fields are "Title", "Date Issued", and "File" (bolded below and in the .xls template). Exception: If "Bibliographic ID" ingest is used, only "File" is required.
MODS mapping: relatedItem@type="original"/identifier
Not repeatable
Used to identify the original MARC catalog record from which metadata was generated and import data from the catalog record into Avalon.
Identifies the type of bibliographic ID supplied in the Bibliographic ID column. Valid types depend on system configuration and by default include "local", "oclc", "lccn", "issue number", "matrix number", "music publisher", "video recording identifier", and "other".
The value of "local" maps to "Catalog Key" in the Resource Description Form.
Other Identifier
MODS mapping: relatedItem@type="original"/identifier
Repeatable
Editable after ingest in "Other Identifier" field of Resource Description form.
Used to identify an external record that can connect the Avalon item to a catalog record or other record for the original item. This identifier differs from Bibliographic Identifier in that it is not used to retrieve a record from another system.
Other Identifier Type
MODS mapping: relatedItem@type="original"/identifier@type
Not Repeatable within Other Identifier
Editable after ingest in "Other Identifier Label" field of Resource Description form.
Identifies the type of external record identifier supplied in the Other Identifier column.
Valid types depend on system configuration and by default include "local", "oclc", "lccn", "issue number", "matrix number", "music publisher", "video recording identifier", and "other".
MODS mapping: relatedItem@displayLabel/location/url
Statement of Responsibility
MODS mapping: note@type="statement of responsibility"
Repeatable
Editable after ingest in "Statement of Responsibility" field of Resource Description form.
Used to provide information about primary persons or bodies associated with the creation of the content, along with details about their roles. This information can be transcribed from the credits listed in the resource itself or on its packaging.
Recommended use is to provide a separate Contributor field for each person or body listed in the Statement of Responsibility. Statement of Responsibility may be left empty if the use of Contributor fields alone is preferred.
Statement of Responsibility is displayed in the user interface appended to the Title field, following a " / ".
Note
MODS mapping: note
Repeatable
Editable after ingest in "Note" field of Resource Description form.
Used to describe aspects of the resource not accounted for in any of the other fields, such as creation or production credits, performers, venue/event date, historical or biographical information, language details, awards given to the performance or the work performed.
Recommended use is to provide a separate Contributor field for each person or body associated with the creation of the content and to use a Note to provide more information about such contributions or to provide information about secondary persons or bodies associated with the creation of the content.
Note Type
MODS mapping: note@type
Not repeatable
Editable after ingest in "Note Label" field of Resource Description form.
Identifies the type of note and is used as a label in the user interface.
In addition to the descriptive fields, there are operational fields for the file(s) being ingested:
Avalon supports ingest of multiple derivatives that may be selected with the High/Medium/Low gear-buttons of the video player during playback (or High/Medium for audio). The “File” field in the manifest and the naming convention of the files in the Avalon dropbox directory must be formatted correctly for the batch ingest to be successful. Avalon will know what filename to look for from the manifest file, find the quality levels specified in the dropbox directory, and ingest the formatted files accordingly. It is not required to have all three quality tiers for multiple file ingest.
For a single Avalon item, input a filename in the “File” field and input “Yes” in the “Skip Transcoding” field of the manifest file. Add multiple files for this Avalon item to the dropbox directory. The “File” field as well as the file names of your different quality files in the Avalon dropbox directory must be formatted with the following convention:
File Name in Manifest File | filename.mp4 |
Files in Dropbox Directory | filename.high.mp4; filename.medium.mp4; filename.low.mp4 |
Example manifest file for a multiple-file ingest of different quality files for a single Avalon item:
| | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | Michael's Third Test Batch | | | | |
| 2 | Title | Creator | Date Issued | File | Skip Transcoding |
| 3 | Multiple Quality Ingest | Klein, Michael B. | 2015 | content/filename.mp4 | Yes |
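The naming convention for quality derivatives can be expressed as a small Python sketch: given the File value from the manifest, it lists the file names expected in the dropbox directory. `derivative_names` is a hypothetical helper written for this page, not an Avalon function.

```python
from pathlib import PurePosixPath

QUALITIES = ("high", "medium", "low")


def derivative_names(manifest_file):
    """List the dropbox file names expected for each quality tier of a manifest entry."""
    path = PurePosixPath(manifest_file)
    # Insert the quality tier between the base name and the extension.
    return [str(path.with_name(f"{path.stem}.{quality}{path.suffix}"))
            for quality in QUALITIES]


print(derivative_names("content/filename.mp4"))
# ['content/filename.high.mp4', 'content/filename.medium.mp4', 'content/filename.low.mp4']
```

Since not all three tiers are required, a pre-upload check would only need to confirm that at least one of these names is present.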
The Batch Ingest Package can include XML structure files, one structure XML file per media file. See the demo ingest package at the top of this page for an example structural XML file included in a batch.
If the manifest lists a file named test.mp4, Avalon will look for a structure file named test.mp4.structure.xml. You can edit the XML later via the "Structure" tab in the Avalon user interface.
For more information about structure files (schema expectations and examples), see Using the graphical xml editor for adding structure to files.
The Batch Ingest Package can include WebVTT or WebSRT captions files, one captions file per media file. If the manifest lists a file named test.mp4, Avalon will look for a captions file named test.mp4.vtt. If one is found, it will be attached to the media file as captions. This captions file can be updated or removed later via the user interface "Structure" tab in Avalon.
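Both sidecar conventions (structure and captions) append a suffix to the full media file name, extension included. A minimal Python sketch, using a hypothetical helper named `sidecar_names`:

```python
def sidecar_names(media_file):
    """Return the structure and captions file names looked up for a media file."""
    return {
        "structure": f"{media_file}.structure.xml",
        "captions": f"{media_file}.vtt",
    }


print(sidecar_names("test.mp4"))
# {'structure': 'test.mp4.structure.xml', 'captions': 'test.mp4.vtt'}
```

Note that the media file's own extension is kept, so the captions file is test.mp4.vtt rather than test.vtt.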
Each batch will generate 2 emails to the user listed at the top of the manifest.
Once Avalon detects the presence of an unprocessed manifest file, it will first verify that the metadata columns are recognizable, that the required columns are present and have values in them, and that the package is complete (i.e., all content files specified in the manifest are present and not open by any other processes) before attempting to ingest.
If the package is incomplete or in error, it will not be processed and an error file will be generated in the same directory as the manifest file (e.g., batch_manifest.xlsx.error). The error file will contain details of what was missing, and Avalon will email the same information to the user specified in the manifest.
If a Bibliographic ID is provided for a resource but fails to process, the error file will only indicate that required fields are missing and will not indicate that the Bibliographic ID failed or was invalid.
To re-run a successfully completed batch, remove the *.processed file from the batch directory (e.g., batch_manifest.xlsx.processed).
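The marker files described above (the manifest name plus `.error` or `.processed`) make it possible to inspect a batch from a script. The Python sketch below is an illustration only: the helper names and the `"pending"` label are assumptions, and the source does not say what to expect if both markers exist, so the sketch simply reports the error first.

```python
from pathlib import Path


def batch_status(manifest_path):
    """Report a batch's state from the marker files next to its manifest."""
    manifest = Path(manifest_path)
    if manifest.with_name(manifest.name + ".error").exists():
        return "error"
    if manifest.with_name(manifest.name + ".processed").exists():
        return "processed"
    return "pending"  # label chosen for this sketch; no marker file found


def allow_rerun(manifest_path):
    """Remove the .processed marker so the batch can be picked up again."""
    manifest = Path(manifest_path)
    marker = manifest.with_name(manifest.name + ".processed")
    marker.unlink(missing_ok=True)  # requires Python 3.8+


print(batch_status("batch_manifest.xlsx"))
```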
When a Bibliographic ID is provided for a resource the corresponding MARC record is mapped to a MODS record for use in Avalon. The MARC to MODS mapping is based on the Library of Congress mapping to MODS 3.5: http://www.loc.gov/standards/mods/mods-mapping.html
The Avalon mapping differs mainly:
Detailed mappings of MARC fields and subfields to MODS records for the Resource Description Form and the Batch Ingest Form can be found at the Metadata Crosswalks page.