
This document includes information on setting up a Windows workstation to perform the Born Digital Preservation Lab Ingest workflow.  Users will need to have access to the Scandium virtual server (contact Brian Wheeler).

Directory Structure

The user needs to create a 'BDPL' folder on the main C:\ drive.  This folder should include two additional folders: \resources\ and \scripts\
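
The folder layout above can also be created from a script. The following is a minimal sketch, not part of the BDPL tooling; the function name `create_bdpl_folders` is hypothetical and the drive root is passed in as a parameter.

```python
from pathlib import Path


def create_bdpl_folders(root):
    """Create <root>/BDPL with its 'resources' and 'scripts' subfolders."""
    bdpl = Path(root) / "BDPL"
    created = []
    for name in ("resources", "scripts"):
        folder = bdpl / name
        # parents=True also creates BDPL itself; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

On the workstation this would be called as `create_bdpl_folders("C:\\")`, which may require a prompt with sufficient privileges to write to the drive root.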

Dependencies

The BDPL Ingest Tool requires both:

  • Python 3.7; default installation plus:
    • wget
    • lxml
    • openpyxl
    • psutil
    • chardet
  • Cygwin; default installation plus:
    • curl
    • ddrescue
    • gcc-core
    • gcc-g++
    • make
    • openssl-devel
    • patchutils
    • perl
    • rsync
    • tree
    • wget
    • cdparanoia
    • pkg-config
    • git
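
Once the Python packages are installed, a quick check can confirm that each one is visible to the interpreter. This is a sketch under assumptions: the import names in `REQUIRED_MODULES` are assumed to match the package names listed above (with `psutil` assumed for the process-utilities package), and `missing_modules` is a hypothetical helper, not part of the BDPL scripts.

```python
import importlib.util

# Assumed import names for the Python packages listed above
REQUIRED_MODULES = ["wget", "lxml", "openpyxl", "psutil", "chardet"]


def missing_modules(names):
    """Return the top-level module names that this Python cannot locate."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules(REQUIRED_MODULES))` with the Python 3.7 interpreter should print an empty list when everything is in place; any names printed still need to be installed.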

In addition, the following tools will need to be compiled using Cygwin sh.exe (or similar application; run as Administrator) and saved to C:\cygwin64\usr\local\bin (using "./configure", "make", and "make install" commands):

  • lsdvd
    • The user may need to use the following command to ensure libdvdread is identified when compiling lsdvd:  ./configure PKG_CONFIG_PATH=/cygdrive/c/cygwin64/usr/local/lib/pkgconfig/
  • bchunk 

The following applications must also be installed in C:\Program Files\ (unless otherwise noted):

The BDPL is currently using an Epson Perfection 4990 Photo scanner to produce images of media.  If still in use, install the driver (requires system restart).

C:\BDPL\scripts Folder

The user should clone the bdpl_ingest GitHub repository to C:\BDPL\scripts.  Open a CMD.EXE window and type the following commands:

cd C:\BDPL\scripts
git clone https://github.com/IUBLibTech/bdpl_ingest.git .

This repository includes:

  • Main tools:
    • bdpl_ingest.py: main BDPL ingest tool; used to transfer and analyze content

    • bdpl_bag-prep.py: used to bag, tar, and move completed SIPs to Archiver Spool dropbox.  Also records information on bdpl_master_spreadsheet.

    • dfxml.py and Objects.py: modules from DFXML python project; used for ingest procedures

  • Additional tools and supporting files:
    • bdpl.txt: BDPL ASCII art

    • BDPL_launch.bat: used to launch bdpl_ingest.py.
      • Send to Desktop as a shortcut.
      • Right-click the shortcut and open properties; in the 'Target' text box, add the UNC path to Scandium workspace (\\XXX.XX.XXX.XXX\bdpl\workspace) as an argument after C:\BDPL\scripts\BDPL_launch.bat (will be mounted as Z: drive)

    • BDPL_bag_launch.bat: used to launch bdpl_bag-prep.py
      • Send to Desktop as a shortcut.
      • Right-click the shortcut and open properties; in the 'Target' text box, add two UNC paths as arguments after C:\BDPL\scripts\BDPL_bag_launch.bat:
        • 1st argument: Archiver Spool location (mounted as Y: drive): \\XXX.XX.XXX.XXX\bdpl
        • 2nd argument: main BDPL workspace (mounted as Z: drive): \\XXX.XX.XXX.XXX\bdpl\workspace 

    • bdpl_fix_failures.py: used to fix common errors that occur during BDPL Bag Prep.

    • bdpl_manual_premis.py: used to add PREMIS preservation metadata for manual workarounds (such as using FTK Imager to replicate files from disk images).

    • bdpl_validate_spreadsheet.py: checks the manifest submitted by collecting units to ensure:
      • Appropriate column headings are employed
      • All items include a barcode
      • No duplicate barcode values are included on the spreadsheet (also checks against content stored in the SDA)

    • replace_copy.bat: used to copy file or folder paths with '/' as the separator so that Python can interpret them.  Include as a shortcut in the user's SendTo folder so that it can be accessed by right-clicking on an object.
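
The three manifest checks listed for bdpl_validate_spreadsheet.py (column headings, a barcode on every item, no duplicate barcodes) can be illustrated with a short sketch. This is not the script's actual code: the function name, the `barcode` key, and the plain-dictionary row format are assumptions made for illustration, and the real column headings are not specified here.

```python
from collections import Counter


def validate_manifest(headings, rows, expected_headings, known_barcodes=()):
    """Return a list of problems found; an empty list means the manifest passed."""
    problems = []

    # Check 1: every expected column heading is present
    missing = [h for h in expected_headings if h not in headings]
    if missing:
        problems.append("missing headings: " + ", ".join(missing))

    # Check 2: every item (row) has a barcode
    barcodes = [str(row.get("barcode") or "").strip() for row in rows]
    for number, barcode in enumerate(barcodes, start=1):
        if not barcode:
            problems.append(f"row {number}: no barcode")

    # Check 3: no duplicates in the sheet or against already-stored barcodes
    counts = Counter(b for b in barcodes if b)
    for barcode, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"barcode {barcode} appears {count} times")
        if barcode in known_barcodes:
            problems.append(f"barcode {barcode} already stored")
    return problems
```

Here `known_barcodes` stands in for whatever lookup is made against content already stored in the SDA; how that lookup is actually performed is outside the scope of this sketch.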

Whenever scripts are updated in the main GitHub repository, BDPL staff will need to update the local repository.  To do so, open a Windows CMD.EXE terminal and execute the following commands:

cd C:\BDPL\scripts
git reset --hard
git pull
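
Note that `git reset --hard` discards any local changes to tracked files before the pull. For staff who prefer a single step, the same two commands could be wrapped in Python. The sketch below is illustrative only; `update_commands` and `update_repo` are hypothetical names, and it assumes `git` is available on the PATH.

```python
import subprocess


def update_commands():
    """Return the git commands from the steps above, in order."""
    return [["git", "reset", "--hard"], ["git", "pull"]]


def update_repo(repo_dir, run=subprocess.run):
    """Run each command inside repo_dir, stopping at the first failure."""
    for command in update_commands():
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit
        run(command, cwd=repo_dir, check=True)
```

A call such as `update_repo(r"C:\BDPL\scripts")` would perform the update; the `run` parameter exists only so the function can be exercised without touching a real repository.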

The BDPL user will then need to reset permissions on the scripts folder.  Navigate to C:\BDPL\scripts via the Windows File Manager, right-click on the folder, and select 'Properties' from the context menu.  Once the Properties window opens, select the 'Security' tab and click the 'Advanced' button.

A new window will open.  Check the box in the bottom left corner and then click 'Apply'.  The updated scripts are now ready to run.

C:\BDPL\resources Folder

Download the contents of the 'BDPL Resources' folder (in iu.box.com) to C:\BDPL\resources.  There will be two folders:

  • assets: includes .css and .js files used to style an HTML report on SIP contents
  • toc2cue: an application to convert the audio CD table of contents files produced by cdrdao (Windows binary was compiled with Cygwin; requires associated DLLs to operate)

In addition to the above, download the following Windows binaries (no installation required) to individual folders within  C:\BDPL\resources:

Edit PATH System Environment Variable

To ensure that the BDPL Ingest Tool can call the above utilities, the following need to be added to the PATH system environment variable (change accordingly if the locations of applications differ):

  • C:\cygwin64\bin
  • C:\cygwin64\sbin
  • C:\cygwin64\usr\local\bin
  • C:\Python37
  • C:\Python37\Scripts
  • C:\Program Files\ffmpeg\bin
  • C:\Program Files\TeraCopy
  • C:\Program Files\AccessData\FTK Imager
  • C:\Program Files (x86)\Bulk Extractor 1.6.0-dev
  • C:\Program Files (x86)\Bulk Extractor 1.6.0-dev\python
  • C:\BDPL\resources\hfsexplorer-0.23.1\bin
  • C:\BDPL\resources\hfsexplorer-0.23.1\lib
  • C:\BDPL\resources\siegfried
  • C:\BDPL\resources\sleuthkit-4.6.4\bin
  • C:\sqlite
  • C:\BDPL\resources\disktype
  • C:\BDPL\resources\du
  • C:\Program Files\Notepad++
  • C:\Program Files (x86)\Exact Audio Copy\CDRDAO
  • C:\BDPL\resources\toc2cue 
  • C:\BDPL\resources\clamav
  • C:\Program Files (x86)\FC5025
  • C:\BDPL\resources\droid-binary-6.4-bin (adjust if different version is available)
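
After editing the variable, it can help to confirm that nothing in the list above was missed. The following sketch compares a list of required folders against a PATH string; `missing_from_path` is a hypothetical helper, and the comparison rules (case-insensitive, ignoring trailing slashes and surrounding quotes) are assumptions about what should count as a match on Windows.

```python
import os


def missing_from_path(required_dirs, path_value=None, sep=";"):
    """Return the required folders that do not appear in a Windows PATH string."""
    if path_value is None:
        # Default to the PATH of the current process
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        # Ignore surrounding quotes, trailing separators, and letter case
        return entry.strip().strip('"').rstrip("\\/").lower()

    present = {normalize(entry) for entry in path_value.split(sep) if entry.strip()}
    return [folder for folder in required_dirs if normalize(folder) not in present]
```

For example, `missing_from_path([r"C:\cygwin64\bin", r"C:\Python37"])` run from a newly opened CMD.EXE session would list whichever of the two folders is not yet on the PATH. A new session is needed because running processes do not pick up changes to system environment variables.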

Additional Windows Configuration

Before running the BDPL Ingest Tool, the following changes should be made to the Windows system (current as of Windows 10):

  • Turn off AutoPlay for USB devices and drives (set the default for all optical discs to 'Do Nothing' at Control Panel\Hardware and Sound\AutoPlay)
  • Enable long file paths on Windows 10
  • Adjust folder views: show extensions and hidden files; do not show thumbnails
  • Make TeraCopy the default file handler
  • Run Disk Usage (du64.exe) and agree to EULA.