This is the accession procedure for items that cannot (or should not) be represented as a disk image. Objects best suited to this procedure include hard drives with a lot of empty space, content that does not arrive on a physical disk (e.g., items transferred via a remote host such as Box), and media used only for transfer (e.g., external hard drives used to transfer large photo collections).

Receiving

Step 1: Work with Donor to Ensure That the Item is Ready.

  1. For items on Box:

    1. Ensure that there is no sensitive or secure data (see the Box Transfer page for more information on what the BDPL can receive).
    2. Have the donor either share the folder to be transferred with BDPL staff, or share the BDPL Transfer folder with the donor so they can move their items into it.
      1. A note from Luke: **Do not change the name of the folder(s)! This is very tempting and very easy to do (especially if folder names don't make sense to BDPL, or if they contain special characters).**
    3. If the donor is moving things into Box from their original hard drive, ask them to use a tool to verify that everything they intend to transfer actually arrived in the Box folder (Box does not reliably notify users when a transfer fails).
      1. The donor may use BDPLinventory for this.
    4. Confirm with the donor that the items to be transferred are in their final form.
  2. For hard drives and other media:
    1. Follow safe handling instructions, and mount using write blockers whenever possible.
    2. Run any initial software, such as ExifTool, as needed for the individual donor.
  3. For items on an IU Server (Karst, CITO, BigRedII, etc.):
    1. If possible, donors should move their items to Box. BDPL staff can offer some help with this, but it is ultimately the donor's responsibility.
    2. If it is not feasible to move items to Box or place them on a physical device, BDPL staff will need to be able to log in to the server in order to transfer the files directly.
      1. Transferring files directly from these servers is actually very easy, but only if BDPL staff can log in and access the items in question.

Step 2: Create, Open, and/or Edit Folders and Logs.

  1. Create an accession number following BDPL guidelines.
  2. Fill in the relevant media log with "Receiving" information.
  3. Create a folder in 1received named with the BDPL accession number, and create the proper info-template in it using the no-disk template in the ~/Documents/0templates/ folder.
  • Note the overall size of the media and the amount of space used.

 *Note: For more information on this step, refer to the Accession Procedure for Disk Images, steps 2, 3, 5-7.
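Parts of this step can be scripted. The Python sketch below is only an illustration, assuming the no-disk template is a single file; the function names are hypothetical and the template's actual file name is not given here. It creates the accession folder, copies the template into it, and reports the total and used size of a mounted drive (via the standard `shutil.disk_usage`) for the size note.

```python
# Illustrative sketch; function names are hypothetical, not part of any BDPL tool.
import shutil
from pathlib import Path


def create_accession_folder(accession: str, received_root: Path, template: Path) -> Path:
    """Create <received_root>/<accession>/ and copy the info template into it."""
    folder = received_root / accession
    # exist_ok=False: stop rather than silently reuse an existing accession folder
    folder.mkdir(parents=True, exist_ok=False)
    shutil.copy2(template, folder / template.name)
    return folder


def media_size_gb(mount_point: str) -> tuple[float, float]:
    """Return (total, used) for the filesystem holding mount_point, in decimal GB (10**9 bytes)."""
    usage = shutil.disk_usage(mount_point)
    return usage.total / 10**9, usage.used / 10**9
```

For example, call `create_accession_folder("UAC0920160003", Path.home() / "Documents" / "1received", template_path)` with `template_path` pointing at the no-disk template, then `media_size_gb()` on the drive's mount point. `disk_usage` reports on a whole filesystem, so it reflects the media's size only when given the drive's own mount point.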

Pre-Ingest

Step 3: Run BDPLinventory script before transfer

*Note: There is no need to compare comparison files at this point, since BDPLinventory has only been run once.

See => BDPLinventory Script for Inventory and Checksums of Large Media
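BDPLinventory itself is documented on its own page. Purely to illustrate the kind of output an inventory with checksums contains, here is a hypothetical Python stand-in (it is not the BDPLinventory script): it walks a folder and writes one CSV row per file with a relative "file_name", a "last_modified" timestamp, and an MD5 checksum. The "md5" column name and the ISO 8601 timestamp format are assumptions.

```python
# Hypothetical stand-in for illustration only; NOT the BDPLinventory script.
import csv
import hashlib
import os
from datetime import datetime
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_inventory(root: Path, out_csv: Path) -> int:
    """Write file_name, last_modified, md5 for every file under root; return the file count."""
    skip = out_csv.resolve()  # never inventory the CSV we are writing
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file_name", "last_modified", "md5"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = Path(dirpath) / name
                if path.resolve() == skip or not path.is_file():
                    continue
                # ISO 8601 timestamps sort correctly as plain text
                mtime = datetime.fromtimestamp(path.stat().st_mtime).isoformat(timespec="seconds")
                writer.writerow([path.relative_to(root).as_posix(), mtime, md5sum(path)])
                count += 1
    return count
```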


Step 4: Run exiftool

???


Step 5: Transfer

1. For items on Box, use FileZilla to transfer via FTPS (FTP over TLS):

*For more information about this see => https://kb.iu.edu/d/bcim

a. Open FileZilla on the BDPL computer and enter the following connection settings:

host: ftp.box.com

username: bdpl

password: <bdpl external password> (If you do not know this password, consult other BDPL staff)

port: 990 (implicit FTPS)

b. Drag the Box items into the destination folder in 1received (ex. ~/Documents/1received/UAC0920160003/)

c. Exit FileZilla

2. For hard drives and other physical media:

a. Obtain the physical media and connect it, following all safe handling procedures and using write-blocking hardware.

b. If the total data is 100 GB or less, simply drag the items into the correct folder in 1received.

c. If the total data is more than 100 GB:

i. To image the entire disk: use the CreateRemoteImage script to create a disk image on Karst.

ii. To image only the portion of the disk that contains data:

A) Overwrite all the free space with zeros, then delete the zero-filled file ***???(I forget this procedure)

B) Create the disk image ***???(I forget how)


3. For items on an IU server (Karst, CITO, BigRedII, etc.):

a. Work with the donor and the server administrators to gain login access to the server.

i. This access is necessary in order for BDPL staff to do the transfer.

ii. Server administrators can give BDPL a login and read-only privileges to specific folders on the server.

b. If the total data is 100 GB or less, transfer it to the proper folder in ~/Documents/1received/ using sftp or scp with the recursive option ("-r"). You may also use sshfs to mount the server and simply drag the items into the proper folder in ~/Documents/1received.

c. If the total data is more than 100 GB, the transfer will probably need to be automated with a transfer script.
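For the 100 GB-or-less case, the drag-and-drop copy (from a write-blocked drive or an sshfs mount) can also be scripted. This Python sketch assumes the 100 GB cutoff means decimal gigabytes; the function names are hypothetical. It uses `shutil.copytree`, whose default copy function (`shutil.copy2`) preserves file modification times, which matters because Step 7 relies on the "last_modified" dates.

```python
# Illustrative sketch; function names are hypothetical, not part of any BDPL tool.
import os
import shutil
from pathlib import Path

THRESHOLD_BYTES = 100 * 10**9  # assumption: the 100 GB cutoff in decimal gigabytes


def total_size(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def copy_if_small(source: Path, accession_folder: Path, threshold: int = THRESHOLD_BYTES) -> bool:
    """Copy source into accession_folder if it is under the threshold; return True if copied."""
    if total_size(source) > threshold:
        return False  # too large: image remotely or write a transfer script instead
    # copytree copies files with shutil.copy2 by default, which keeps modification
    # times, so "last_modified" reflects the originals rather than the copy date.
    shutil.copytree(source, accession_folder / source.name, dirs_exist_ok=True)
    return True
```

When using scp instead, its `-p` option similarly preserves modification times (e.g., `scp -rp`); without it, copied files carry the transfer date.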


Step 6: Run BDPLinventory on the transferred items

See => BDPLinventory Script for Inventory and Checksums of Large Media


Step 7: Compare output and comparison files

1. Use the BDPLinventory output and comparison files to verify the transfer and to edit the information (info-template) file.

a. Make sure the comparison files are identical.

i. In the Terminal, navigate to the folder which contains the output and comparison files.

cd ~/Documents/1received/<BDPL Accession Number>/  (ex. cd ~/Documents/1received/UAC0920160003/)

ii. Use the diff command to check whether the two comparison files are identical.

diff CompareMMDD_HHMMSS.txt CompareMMDD_HHMMSS.txt (ex. diff Compare1010_101955.txt Compare1010_104712.txt)

- If the files are identical, diff prints nothing.

- If the files are different, diff lists the lines that differ (the output can be long if there are many differences).

b. What to do if the comparison files are not identical

- If the comparison files are not identical, the most likely cause is that files were added or removed between the two BDPLinventory runs.

i. Open both versions of the output .csv file (ex. UAC0920160003-inv1.csv and UAC0920160003-inv2.csv). Note the number of files listed in each. Are they different?

ii. Create two new dummy .csv files (Ctrl-N), and copy and paste the "last_modified" column, then the "file_name" column, from each output file into its own dummy .csv. Save these dummy files in ~/Desktop/Eraseme under distinct names (e.g., "TempCSV1.csv" and "TempCSV2.csv").

- DO NOT save any changes to the original output .csv files. Only save the new, dummy files (one dummy file for each output .csv).

iii. Sort each dummy file by its first column, which should be "last_modified":

A) (In LibreOffice) Select All (Ctrl-A)

B) Go to the menu item: Data => Sort...

C) In the first pane, "Sort Key 1" should be "last_modified" and "Ascending" should be selected. In the "Options" pane, select ONLY "Range contains column labels" and "Top to bottom (sort rows)", nothing else. Click "OK".

iv. Now scroll to the bottom of each dummy file; you will most likely see which files have been added, and when. This assumes the additional files were added recently.

v. If the change was not caused by files being added or removed, you will need to determine whether individual files have been modified, using the "last_modified" or "last_accessed" dates and times.

c. Open the most recent output .csv file and find the "Date of Earliest Creation".  

- When opening the output .csv, the following options should be indicated: Character Set = Unicode UTF-8, Separator = comma, Text Delimiter = double quotes.

1. Create a new dummy .csv (Ctrl-N), and copy and paste the contents of the "last_modified" column, then the "file_name" column, into the dummy .csv.

*Note: If you run BDPLinventory on a Windows machine, you can use the "c_datetime" column instead of "last_modified". This is because on Windows, "c_time" is the time the file was created, whereas on Linux "c_time" is the last time the file's metadata (inode) changed, not its creation time.

2. Sort the two columns in the dummy .csv so that the earliest date is at the top of the column. If needed, use the instructions in "b", just above.

=> This is your "Date of Earliest Creation" for the information template file.
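The checks in a, b, and c above can also be done in Python instead of with diff and LibreOffice. This is a sketch only: it assumes the output .csv has "file_name" and "last_modified" header columns, as described above, and that the "last_modified" values sort correctly as text (true for ISO 8601 timestamps such as 2016-10-10T10:19:55, but not for every date format). The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, not part of any BDPL tool.
import csv
import difflib
import filecmp
from pathlib import Path


def comparison_files_identical(a: Path, b: Path) -> bool:
    """Byte-for-byte comparison; True corresponds to diff printing nothing."""
    return filecmp.cmp(a, b, shallow=False)


def show_differences(a: Path, b: Path) -> list[str]:
    """Unified-diff lines describing how two text files differ (empty if identical)."""
    with open(a, encoding="utf-8") as fa, open(b, encoding="utf-8") as fb:
        return list(difflib.unified_diff(fa.readlines(), fb.readlines(), str(a), str(b)))


def read_inventory(path: Path) -> dict[str, str]:
    """Map file_name -> last_modified from an inventory .csv."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["file_name"]: row["last_modified"] for row in csv.DictReader(f)}


def compare_inventories(inv1: Path, inv2: Path) -> dict[str, set[str]]:
    """Files added, removed, or with a changed last_modified between two inventory runs."""
    first, second = read_inventory(inv1), read_inventory(inv2)
    common = first.keys() & second.keys()
    return {
        "added": set(second) - set(first),
        "removed": set(first) - set(second),
        "modified": {name for name in common if first[name] != second[name]},
    }


def earliest_modified(inv: Path) -> tuple[str, str]:
    """(last_modified, file_name) of the oldest entry: the "Date of Earliest Creation" candidate."""
    rows = read_inventory(inv)
    name = min(rows, key=lambda n: rows[n])
    return rows[name], name
```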

Step 8: Package all files transferred from original media into a tar.gz file

???

Step 9: Finalize info template information, including generating MD5 checksum of tar file

???
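Steps 8 and 9 are not yet documented. As a rough illustration only, and not a settled BDPL procedure, this Python sketch packages a folder into a .tar.gz with the standard `tarfile` module and computes the archive's MD5 checksum with `hashlib`. The function names, and the choice to store paths relative to the folder name, are assumptions.

```python
# Illustrative sketch; function names are hypothetical, not part of any BDPL tool.
import hashlib
import tarfile
from pathlib import Path


def package_tar_gz(source_dir: Path, out_path: Path) -> Path:
    """Write source_dir, recursively, into a gzip-compressed tar archive at out_path."""
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname stores paths as <folder name>/... rather than full absolute paths
        tar.add(source_dir, arcname=source_dir.name)
    return out_path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a (possibly very large) file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Write the archive outside the folder being packaged. The hex digest has the same form that the `md5sum` command prints, so either can be used to fill in the info template.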

Step 10: Tar up all files into final package

???

Step 11: Move to Documents/2readyforingest

???

Ingest

