The BDPL Ingest Workflow requires a standardized process for receiving shipments of content from university units and individual creators (hereafter referred to as 'the source'). At the most basic level, each shipment should be accompanied by a manifest (using the BDPL Excel workbook template), and each item in the shipment must have a unique identifier recorded in the manifest, along with any other available information about the content.
A 'shipment' may consist of:
- One or more pieces of physical media (e.g., floppy disks, optical media, or hard drives)
- One or more files or archive files, in the case of content received via network locations, email, FTP transfer, or downloads.
Each 'item' in a shipment should be uniquely identified and listed in an inventory so that BDPL staff can unambiguously identify the item. If the shipment includes physical media, these should be placed in a box clearly marked with the shipment date and the name of the shipment source. Ideally, the source and BDPL will have a signed agreement that documents how many items were sent, the date, and the responsible parties.
The BDPL manager will assign a unique identifier to each shipment, using a basic convention that includes a three-letter acronym or abbreviation for the shipment source and the date the shipment was received in YYYYMMDD format. For example:
- MPP_20190612 identifies a shipment from the Modern Political Papers (MPP) received on June 12, 2019
- UAC_20190513 identifies a shipment from the University Archives Collections (UAC) received on May 13, 2019
This identifier will be used as the filename of the inventory workbook and the date information will be used to distinguish between multiple shipments from the same unit in the Scandium workspace.
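As an illustration, the shipment identifier convention can be sketched in Python. This is a minimal sketch, not part of the BDPL tooling: the function names `make_shipment_id` and `parse_shipment_id` are hypothetical, and the code assumes the abbreviation is always exactly three uppercase letters joined to the date by an underscore, as in the examples above.

```python
import re
from datetime import date, datetime


def make_shipment_id(source_abbrev: str, received: date) -> str:
    """Build a shipment identifier such as 'MPP_20190612'."""
    abbrev = source_abbrev.strip().upper()
    # Assumption: the abbreviation is exactly three letters.
    if not re.fullmatch(r"[A-Z]{3}", abbrev):
        raise ValueError(f"expected a three-letter abbreviation, got {source_abbrev!r}")
    return f"{abbrev}_{received:%Y%m%d}"


def parse_shipment_id(shipment_id: str):
    """Split an identifier into (abbreviation, date received)."""
    match = re.fullmatch(r"([A-Z]{3})_(\d{8})", shipment_id)
    if match is None:
        raise ValueError(f"not a shipment identifier: {shipment_id!r}")
    # strptime raises ValueError for impossible dates such as 20191345.
    received = datetime.strptime(match.group(2), "%Y%m%d").date()
    return match.group(1), received


print(make_shipment_id("MPP", date(2019, 6, 12)))  # MPP_20190612
```

Parsing the identifier back into its parts is one way to sort or group multiple shipments from the same unit by date.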
Defining an 'Item'
In handling born-digital content, the IU Libraries have typically defined an 'item' as a single piece of physical media (with all of its files). In this manner, a floppy disk, CD or DVD, USB drive, or other piece of media may be considered as an intellectual entity and all files on the media will be included in a single Submission Information Package (SIP).
In some cases, it may be useful or even necessary to divide the contents of large internal/external hard drives and USB drives into multiple 'items', particularly if:
- The media contain a very large number of files
- The volume of the media is exceptionally large (e.g., 100s of GB or one or more TB)
- The directory structure of the media is well-defined and understood by the unit and different sections should be handled as unique Submission Information Packages.
In such cases, the BDPL manager will work with the collecting unit to define and document the multiple items on the single piece of media. Conventions for assigning identifiers in such cases will be defined below.
In cases where born-digital materials are acquired by the library via network or FTP transfers, email, download, or other means (where no physical media is provided), the collecting unit has two options:
- Include all transferred content in a single 'item' / SIP
- Define multiple items based upon discrete acquisitions of content (i.e., multiple emails or downloads over a period of time), directory structure, or knowledge of the file contents.
NOTE: in cases where multiple items will be defined from a single piece of media (or transfer with no physical media) the collecting unit must not move or otherwise reorganize content, as doing so will result in the loss of important contextual information regarding the provenance and original order of content. In such cases, the unit should work with the BDPL manager to document the items in the BDPL manifest workbook or create a plain text list of folders/files for each item that will be used in the BDPL transfer process.
Assigning Item Identifiers
The BDPL recommends the use of barcodes to uniquely identify items in a shipment. By employing a barcode scanner, both the source and BDPL staff will be able to enter and search for information efficiently and with less risk of user error. While the BDPL procedures will check for duplicate identifiers (within a given shipment and against content already deposited to the SDA), it is the collecting unit's responsibility to ensure that the same identifier is not used more than once.
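A collecting unit could run a similar duplicate check before sending a shipment. The sketch below is only an illustration of the idea and does not reproduce the BDPL's own procedure; the function name `find_duplicate_identifiers` and the sample barcode values are hypothetical, and the set of previously deposited identifiers is assumed to be available as a simple collection of strings.

```python
from collections import Counter


def find_duplicate_identifiers(shipment_ids, deposited_ids=()):
    """Return two sorted lists: identifiers repeated within the shipment,
    and identifiers that already appear among previously deposited items."""
    counts = Counter(identifier.strip() for identifier in shipment_ids)
    repeated = sorted(i for i, n in counts.items() if n > 1)
    already_deposited = sorted(set(counts) & set(deposited_ids))
    return repeated, already_deposited


# Hypothetical barcode values, for illustration only.
shipment = ["30000123456789", "30000123456790", "30000123456789"]
deposited = {"30000123456790"}
repeated, already_deposited = find_duplicate_identifiers(shipment, deposited)
print(repeated)           # ['30000123456789']
print(already_deposited)  # ['30000123456790']
```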
If barcodes are not available to the shipment source, the BDPL manager will assist with the identification of a suitable convention. In the past, the BDPL has employed a convention that includes:
- A three-letter acronym or abbreviation for the unit (e.g., 'UAC' for University Archives)
- The year and month in YYYYMM format (e.g., '201907')
- A zero-padded, four-digit value that increases incrementally for each item (e.g., '0001', '0002', '0003')
This legacy convention would result in identifiers such as 'UAC2019070001', 'UAC2019070002', etc.
NOTE: if barcode labels are not employed, the collecting unit will still need to print labels for the alternate identifiers and affix them to physical media so that BDPL staff are able to correctly identify items.
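The legacy convention (unit abbreviation, YYYYMM, and a zero-padded four-digit counter) can be sketched as a small generator of identifiers. This is an illustrative sketch only; the function name `legacy_item_ids` and its parameters are hypothetical.

```python
def legacy_item_ids(unit_abbrev: str, year: int, month: int, count: int, start: int = 1):
    """Generate identifiers such as 'UAC2019070001' for `count` items."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if start < 1 or start + count - 1 > 9999:
        # A four-digit counter can only represent 0001 through 9999.
        raise ValueError("sequence does not fit in four digits")
    prefix = f"{unit_abbrev.strip().upper()}{year:04d}{month:02d}"
    return [f"{prefix}{n:04d}" for n in range(start, start + count)]


print(legacy_item_ids("UAC", 2019, 7, 2))
# ['UAC2019070001', 'UAC2019070002']
```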
Affixing Identifier Labels
As noted above, each item in a shipment must include a unique identifier. For content received on physical media (floppy disks, optical media, hard drives, USB drives, etc.), each physical object should have its own identifier. For example, if two CDs are in a single case, each CD should have its own identifier and listing in the shipment inventory.
- For optical media and internal hard drives, barcode labels should be applied to a case or enclosure; attaching them to the disc or drive itself may interfere with or damage the optical disk reader.
- For other media (such as floppy disks, zip disks, and external hard drives), the barcode label may be affixed to the object itself in an unobtrusive place that will not cover any labels or moving parts of the object.
- For optical media with no jewel case or enclosure, as well as smaller items such as USB thumb drives, it may be necessary to affix the barcode label to an envelope, piece of paper, or custom enclosure that can be securely associated with the media.
NOTE: for content with no physical media, the collecting unit should assign a barcode (or other identifier value), record it in the BDPL manifest, and dispose of any physical label to make sure that it is not reused with another item.
Creating an Item Manifest Workbook
Collecting units should be sure to use the most recent workbook template when creating a new item manifest for a BDPL shipment.
A BDPL workbook has three worksheets:
- 'Basic Transfer Information'
- 'Inventory'
- 'Appraisal'
The first two should be filled out by the collecting unit, while the third will be populated by information generated during the BDPL ingest process.
The 'Basic Transfer Information' worksheet is used to identify and provide contact information for the collecting unit's primary liaison so that the BDPL manager may ask questions and provide updates. The five fields are:
- Collecting unit:
- Primary contact:
- Date of transfer to BDPL
The Inventory worksheet is intended to capture information about each item in the shipment so that the BDPL manager and collecting unit staff can identify, search for, and manage content during the BDPL ingest procedure and after SIPs have been deposited to the SDA. Units only need to complete those fields that are relevant and applicable; all others may be left blank. The fields include:
- Identifier (REQUIRED): must contain a unique identifier; if the item is a piece of physical media, a label with the identifier must be affixed to the object.
- Accession ID (optional): to be completed if the collecting unit generates accession numbers and needs to track the specific accession in which the item arrived.
- Collection title (optional): to be completed if the unit knows the item belongs to a larger collection.
- Collection ID (optional): to be completed if the item is part of a larger collection for which a unique identifier has been created.
- Creator (optional): to be completed if the collection creator associated with the item is known (unit may also identify specific content creator(s) for the item, if this information is important).
- Physical location of media (optional): this field is intended to help units track where physical media was originally stored, in case this information is meaningful or the object needs to be returned to its original location. Units may enter data using their own meaningful conventions; entries may refer to a specific folder, box, shelf, or work area.
- Source type (REQUIRED): used to help identify the item; units must select a value from the drop-down menu in this field so that standardized values are entered (and can be used in future reporting on BDPL activities). If the media type is not included in the drop-down menu, it is likely that the BDPL will not be able to accommodate the item. However, the item may still be included in the Inventory, and the unit should enter the product name or description in this field. The values in the drop-down menu include:
- 3.5" floppy disk
- 5.25" floppy disk
- Zip disk
- Optical disk
- Hard drive (internal or external)
- USB drive (for other USB devices)
- Email/Network/Download (for any transfer without physical media)
- Title (optional): to be used for a derived or assigned title to be associated with the digital content.
- Label transcription (optional): to be completed if the physical media has a label or insert with information about the contents of the file. NOTE: this field would also be suitable for documenting the title of an item.
- Initial description appraisal notes (optional): to be completed if the unit wishes to provide a description of content or, if unknown, to make notes about the nature or potential value of the item's content. Should also be used to document any concerns or issues about sensitive data.
- Content date range (optional): to be used if the collecting unit wishes to record when materials were originally created or used, especially if file last-modified times have been altered or in the case of Compact Disc Digital Audio (CDDA), for which timestamps cannot be extracted from content. If not used, the BDPL Ingest Tool will record dates extracted from content. Entering standardized dates is very important here:
- YYYY if only the year is known (e.g., 2009)
- YYYYMM if month and year are known (e.g., 201001)
- YYYYMMDD if day, month, and year are known (e.g., 20051120)
For date ranges, separate the two values by a hyphen (e.g., 2005-2007 or 20090314-20090316); otherwise just include a single date value, if known.
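The date conventions above lend themselves to a simple validity check before a workbook is submitted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the check is not part of the BDPL Ingest Tool, and it does not verify that the first value of a range precedes the second.

```python
import re
from datetime import date


def _is_single_date(value: str) -> bool:
    """Accept YYYY, YYYYMM, or YYYYMMDD with a real calendar date."""
    if not re.fullmatch(r"\d{4}(\d{2}(\d{2})?)?", value):
        return False
    year = int(value[:4])
    month = int(value[4:6] or 1)  # default to January when no month is given
    day = int(value[6:8] or 1)    # default to the 1st when no day is given
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def is_valid_content_date(value: str) -> bool:
    """Accept one date value, or two date values separated by a hyphen."""
    parts = value.strip().split("-")
    return len(parts) in (1, 2) and all(_is_single_date(p) for p in parts)


for sample in ["2009", "201001", "20051120", "2005-2007", "20090314-20090316"]:
    assert is_valid_content_date(sample)
assert not is_valid_content_date("2019-02-20")  # three hyphen-separated parts
assert not is_valid_content_date("200913")      # month 13 does not exist
```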
- Instructions for BDPL staff (optional): to be completed if the unit has any specific requests or questions for BDPL staff to consider (for instance, instructions to not transfer commercial software or to only copy certain directories from a hard drive).
- Restriction statement (optional): if relevant, the collecting unit may wish to include a statement on access or use restrictions for the item (example: Modern Political Papers may wish to note that some items will have restricted public access for 50 years after creation).
- Restriction end date (optional): if known or relevant, the collecting unit may use this field to record the date on which restrictions on content would be removed, using format YYYY-MM-DD (e.g., 2019-02-20).
- Move directly to SDA without appraisal? (optional): to be completed if the unit wishes to indicate that an item should be moved directly to the SDA without any additional appraisal. If used, the unit must select a value from the drop-down menu in the cell.
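The field rules above can be summarized in a small row-level check. This is a hedged sketch, not the BDPL's validation logic: it assumes each Inventory row has been read into a dictionary keyed by the field names used in this document, and the exact strings in `SOURCE_TYPES` are assumptions about how the drop-down values are spelled in the workbook template. The function name `check_inventory_row` is hypothetical.

```python
import re
from datetime import date

REQUIRED_FIELDS = ("Identifier", "Source type")

# Assumed spellings of the drop-down values; check against the template.
SOURCE_TYPES = {
    '3.5" floppy disk', '5.25" floppy disk', "Zip disk", "Optical disk",
    "Hard drive", "USB drive", "Email/Network/Download",
}


def _is_iso_date(value: str) -> bool:
    """True for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    year, month, day = map(int, value.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def check_inventory_row(row: dict):
    """Return (errors, warnings) for one Inventory row."""
    errors, warnings = [], []
    for field in REQUIRED_FIELDS:
        if not str(row.get(field) or "").strip():
            errors.append(f"missing required field: {field}")
    source_type = str(row.get("Source type") or "").strip()
    if source_type and source_type not in SOURCE_TYPES:
        # Free-text values are allowed, but the BDPL may be unable to
        # accommodate the item, so flag it instead of rejecting it.
        warnings.append(f"source type not in drop-down list: {source_type}")
    end_date = str(row.get("Restriction end date") or "").strip()
    if end_date and not _is_iso_date(end_date):
        errors.append(f"restriction end date is not YYYY-MM-DD: {end_date}")
    return errors, warnings


print(check_inventory_row({"Identifier": "", "Source type": "Zip disk"}))
# (['missing required field: Identifier'], [])
```

Treating an unlisted source type as a warning mirrors the text above: such items may still be listed in the Inventory with a product name or description.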
The third worksheet (the appraisal spreadsheet) will be completed during the BDPL ingest process, which will combine information from the Inventory worksheet with metadata generated by different microservices.
The fields imported from the Inventory worksheet include:
- Accession ID
- Collection Title
- Collection ID
- Physical location of media
- Source type
- Label transcription
- Initial appraisal notes
- Restriction statement
- Restriction end date
The BDPL ingest process will provide additional information about the transfer process and the content:
- Transfer method: Identifies the tool used to transfer content from the physical media or network location.
- Migration date: The date content was migrated.
- Migration outcome: Indicates success or failure of the transfer process.
- Migration notes: Used by BDPL staff to record any issues or observations about content and/or the transfer process.
- Extent (normalized): Human-readable information of the size of files extracted from the disk image or copied from a network location.
- Extent (raw): Size (in bytes) of files extracted from the disk image or copied from a network location.
- No. of files: Number of files extracted from the disk image or copied from a network location. For DVD-Video and audio CDs, this will reflect the titles ripped from the original media.
- No. of Duplicate Files: Indication of duplicate content among extracted/copied files (actual list of duplicates provided in accompanying HTML report).
- No. of Unidentified Files: Indication of files with formats not included in the PRONOM format registry.
- File Formats: List of the most prominent file formats found on the item (includes a maximum of 10; full list of formats available in technical metadata).
- Begin Date: Earliest date extracted from last-modified timestamps on extracted/transferred files. May not accurately reflect when the files were actually created or used. For content from audio CDs, this will reflect the date content was ripped.
- End Date: Latest date extracted from last-modified timestamps on extracted/transferred files. May not accurately reflect when the files were actually created or used. For content from audio CDs, this will reflect the date content was ripped.
- Virus Status: Indicates if any viruses or malware were found on extracted/transferred files.
- PII Status: Indicates if bulk_extractor found sensitive information in extracted/transferred files; will indicate account numbers (SSNs, credit cards, or bank account numbers), email addresses, and telephone numbers.
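To make several of these fields concrete, the sketch below derives the extent, file count, and begin/end dates from a folder of extracted files. It is an illustration only and does not reproduce the BDPL microservices: the function names are hypothetical, the use of 1024-based units for the normalized extent is an assumption, and so is the YYYY-MM-DD output format for dates.

```python
from datetime import datetime
from pathlib import Path


def normalize_extent(size_bytes: int) -> str:
    """Convert a byte count to a human-readable string (1024-based units)."""
    size = float(size_bytes)
    units = ["bytes", "KB", "MB", "GB", "TB"]
    for unit in units[:-1]:
        if size < 1024:
            break
        size /= 1024
    else:
        unit = units[-1]  # anything at or above 1024 GB is reported in TB
    return f"{size:.0f} {unit}" if unit == "bytes" else f"{size:.2f} {unit}"


def summarize_files(root) -> dict:
    """Collect size, count, and last-modified date range for files under root."""
    stats = [p.stat() for p in Path(root).rglob("*") if p.is_file()]
    raw = sum(s.st_size for s in stats)
    mtimes = [s.st_mtime for s in stats]

    def as_date(timestamp):
        # Local time zone is used; the real tool may behave differently.
        return datetime.fromtimestamp(timestamp).strftime("%Y-%m-%d")

    return {
        "Extent (normalized)": normalize_extent(raw),
        "Extent (raw)": raw,
        "No. of files": len(stats),
        "Begin Date": as_date(min(mtimes)) if mtimes else None,
        "End Date": as_date(max(mtimes)) if mtimes else None,
    }


print(normalize_extent(1536))  # 1.50 KB
```

Because the dates come from last-modified timestamps, they carry the same caveat noted for Begin Date and End Date: they may not reflect when files were actually created or used.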
The information in the appraisal spreadsheet is intended to assist with:
- The appraisal of content by the collecting unit to review any sensitive information and determine if items should be moved to the SDA.
- Troubleshooting and reporting of ingest procedures by the BDPL manager.