
Date

Attendees

Location

Wells Library

1320 E. 10th St.
Bloomington IN 47405
(see map)

All meetings except the MDPI tour are in Wells Library.

E171 (IDAH Conference Room): This room is in the East Tower, past the elevators and down the hall past the stairwell

E252: This room is in the East Tower on the second floor – if you take the stairs up, follow the hallway and turn right in the main room

W531: This room is in the West Tower on the fifth floor – take the elevators up and find the door marked W501, go straight to the back wall and turn left

W517 (only needed if W531 is busy for some reason): This room is in the West Tower on the fifth floor – take the elevators up and find the door marked W501, go straight to the back wall and turn right

Notes

Collaborative Notes Google Doc: https://docs.google.com/document/d/1cIkj-e7LZvHXQl9nYMJbCYkayUydHVDpGs5ftt0JT7E/edit?usp=sharing

http://helium.dlib.indiana.edu:8200/

http://hydradam.dlib.indiana.edu/

Agenda

September 14, 2016


2pm

Roadmap Overview : Where we are, what we need to get to a minimum viable product

Facilitator: Mike

Location: E171 (IDAH Conference Room)


Data model: spreadsheet work is done, now it’s just an issue of seeing it in action

 What does "done" mean?: the spreadsheet was mapping data to properties; to what degree one entity fits into another is not fully handled yet, but we do have something to work with

 The data model discussion: enough is figured out now without speculating about future developments of Hydra

 Marking as finished, need to move on from here to figure out what’s next


Installed HydraDAM2 rails engine: Halfway through conversion stuff


Complex and Simplified Ingest: for complex ingest, we haven’t decided on unification of the two sides of a record


Display of configuration: getting the gem structure in place has taken a lot of time in the past sprint; translating from app to gem will be the only complication

 Marking as done for September meeting, adding “Translate app functions to gem” to October


October 2016:

 Preservation metadata: adding new stories to this on the roadmap


Interact with found set based on preservation action for SDA stored file


November 2016:

User admin -- Daniel’s work with Hybox might help to conceptualize this - he hasn’t been working directly with admin sets but this is definitely an area we want to watch

 Avalon’s access control management might also be really helpful to solve all of these unknowns

 Jon: It’s a requirement for IU to be able to do this with LDAP groups, because we want to use this information for a wide variety of apps; Drew: is there a resource we can use to test this out? Jon: there are standalone LDAP servers; Will: Good thing to ask the Avalon people tomorrow as well

 A lot of unknowns for Drew for November

 CanCan is known (IU uses this and Drew knows it well); the WebAC part is uncertain (a rough CanCan ability sketch keyed on LDAP groups follows below)

Drew: Better if we can push for it to be pushed down into a Curation Concerns level
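
A minimal sketch of how LDAP group membership could drive CanCan abilities. The `ldap_groups` lookup and the group names are hypothetical stand-ins; how group information actually reaches the app is still one of the open questions above.

```ruby
# Sketch only: group names and the ldap_groups lookup are assumptions.
class Ability
  include CanCan::Ability

  def initialize(user)
    groups = user ? user.ldap_groups : []   # hypothetical LDAP group lookup

    can :read, :all   if groups.include?('hydradam-readers')   # made-up group name
    can :manage, :all if groups.include?('hydradam-admins')    # made-up group name
  end
end
```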


December 2016:

 Ability to reorganize and interact with information based on queries

Drew: saved queries might be a good approach

What are those statistics?


Can we get here by December?

 Drew: aggressive at our current pace; biggest problems are getting answers to unknowns, getting answers to standing questions from Hydra community members

 Will: probably will be similar to how things have gone so far, a few things will get pushed into the next month or so


Do we need all of these things to have a production system in place?

 Jon: for us, having user and access control stuff in place is a minimum to being able to use it for production

 Will: if we are comfortable with the data model, we can start moving things into a production instance, while in the meantime refining user access (Karen agrees that GBH would like this)

 Jon: as long as we don’t have to move the file location, we can do this

Drew: DevOps is absent from this; where this thing lives hasn’t really been discussed yet

Jon: We have to decide whether this is ESS or not; Will: this might be a test case for ESS

Karen: how does this relate to HathiTrust? Not related, but Jon mentioned that there has been discussion about a HathiTrust for audiovisual


2017:

Search for and update metadata: cross off 363 and 370

Karen: How would you deaccession? -- not sure how this would look; Will suggests this should be a one-off; Jon: you don’t want to make it too easy

Jon: We should focus on one-way for the Avalon--HD2 connection

Exporting out to RDF and XML -- use cases: sending this to a vendor? Mike: HD1 did do this, but it had descriptive metadata; this doesn’t seem like it’s in the grant

3pm

Preservation Events: Implementing PREMIS within HD2

See comments for work to date: https://bugs.dlib.indiana.edu/browse/HDM-584

Facilitator: Julie

Location: E171 (IDAH Conference Room)

Goals:

  • Julie: Define preservation events to model in PREMIS

  • Julie: Decide how preservation events are tracked/stored, how preservation events are searched/browsed

  • Julie: Determine differences in tracking and searching for preservation events about files stored in Fedora vs asynchronous files external to Fedora

See https://bugs.dlib.indiana.edu/browse/HDM-584 

Notes:

http://id.loc.gov/vocabulary/preservation/eventType.html

http://id.loc.gov/ontologies/premis.html


Preservation events to model within HydraDAM2 functionality:

  • Fixity check

  • Ingestion


Events that happen outside of HD2 before item is ingested and should be recorded:

  • Message digest calculation

  • Validation

  • Virus check

  • Capture (maybe)

  • Creation [added by Julie after meeting]


How are events tracked/stored?

  • Fixity

    • Request goes out to storage proxy for checksum

    • Newly calculated checksum returned, is compared to “last good” checksum

    • Event is created, with date of fixity check, and true/false value of whether the newly calculated checksum matches the “last good” checksum.

    • Event is associated with FileSet (see the sketch below).
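
A rough sketch of the flow in the list above. The storage proxy client, event model, and property names are hypothetical; none of these come from the actual HydraDAM2 code.

```ruby
class FixityCheckJob
  # file_set: the FileSet whose main file should be checked
  def perform(file_set)
    # 1. Request a freshly calculated checksum from the storage proxy.
    new_checksum = StorageProxyClient.new.checksum_for(file_set.id)  # hypothetical client

    # 2. Compare it to the "last good" checksum stored on the FileSet.
    success = (new_checksum == file_set.last_good_checksum)          # hypothetical property

    # 3. Record an event with the date of the check and a true/false outcome,
    #    associated with the FileSet.
    file_set.preservation_events << FixityEvent.new(                 # hypothetical model
      event_type: 'fixity check',  # see the LOC eventType vocabulary linked above
      date: Time.now.utc,
      outcome: success
    )
    file_set.save
  end
end
```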


How are events stored and browsed?

Also look at:

https://wiki.duraspace.org/display/FF/Design+-+Audit+Service 

https://wiki.duraspace.org/display/FF/Design+-+PREMIS+Event+Service

Drew’s drawing proposing one way to model preservation events on each file within a fileset. This follows the way we’re currently using FileSet objects, which is to say, we use a single FileSet for each derivative of a digital file (i.e. “preservation”, “production”, “access”), with designated properties within the file set that have special meaning, i.e. “original_file” corresponds to the main digital file, and “fits” would correspond to the FITS XML that describes “original_file”. In this model, we introduce 2 new models: “EventLog” and “Log”, and for every File, there is also an EventLog.

[Image: Photo on 9-14-16 at 4.01 PM.jpg]

Idea from Brian W (described by Julie):

Regardless of how each preservation event is stored (as an object or within a log-type file), consider including the last outcome/details/date of the common event types (so the date of the last fixity check, for example) as a property [Julie thinks this could be on the File] for ease of searching and using. That way, the entire set of events for a file doesn’t have to be reviewed to find the things that commonly need to be asked for (what was the result of the last fixity check, what was the date of ingest, what was the date of creation, etc.). These properties would be for internal use only, so they don’t necessarily need to come from an existing ontology - we could treat them as custom internal properties. The entire history of events could then be managed without needing to be touched as often. Brian W isn’t so sure about a single log file that is appended to when a preservation event occurs, because if something goes wrong in that appending action, the whole file might be trashed.


Julie thinks the list of events above could have the latest event tracked using internal RDF properties on the File. Fixity check is the only event that will be recurring from that list. Ingestion and the pre-ingest events are all one-time events.
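
A minimal sketch of that idea, assuming an ActiveFedora-style model. The class name, predicate namespace, and property names are purely illustrative custom internal properties, not an existing ontology or an agreed design.

```ruby
class PreservationFile < ActiveFedora::Base
  # Hypothetical internal vocabulary for these "latest event" convenience properties.
  LOCAL_NS = 'http://hydradam.example/ns#'

  property :last_fixity_check_date,    predicate: ::RDF::URI.new("#{LOCAL_NS}lastFixityCheckDate"),    multiple: false
  property :last_fixity_check_outcome, predicate: ::RDF::URI.new("#{LOCAL_NS}lastFixityCheckOutcome"), multiple: false
  property :ingestion_date,            predicate: ::RDF::URI.new("#{LOCAL_NS}ingestionDate"),          multiple: false
  property :creation_date,             predicate: ::RDF::URI.new("#{LOCAL_NS}creationDate"),           multiple: false
end
```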


4pm

Code Architecture (Developers only)

Location: E171 (IDAH Conference Room)

Goals:

  • Templates and generated code -- keep it to a minimum. We all agree

  • What to keep in HydraDAM core gem? (This is just a suggestion)

    • Discoverable files? - Yes.

    • Interfacing with storage proxy? - No.

    • Preservation Event logs? - Not necessarily.

  • Testing

PO wireframe drawing

Location: E252


PO wireframe drawing

The Manage Collections/Units/Groups button (whatever we end up deciding to call it) is what would take you to the next screen below, where you can interact with collection information in terms of who has access and what their permissions are.

The Administrator or Collection Manager are the only ones who can add permissions and delete users. Their interfaces will look exactly the same except that the administrator can see all of the collections.

Clicking on a username on the Manage Collections Page will take you to a page that displays a log of all of that user’s actions



5pm

Break 


6pm
Dinner at the Irish Lion, 212 W. Kirkwood Ave. (see map)
reservation for 12 people under Dunn

September 15, 2016

9am

Avalon + HD2 Demos and Developer Sharing Time

2016-09-15 Meeting notes - Avalon & HydraDAM2 Developers' Meeting

Facilitator: Will

Location: W531

  • Can access MDPI items in Dark Avalon using barcode number: https://content.mdpi.iu.edu/object/40000000363772 (requires login, but this is an Iowa item in HD2)

  • Chris Colvard recommends a REST API be defined for HD2 so that a collection manager can search in Avalon using descriptive metadata, come up with a set of items, then take a bulk action or select an item for a preservation action by sending a list of barcodes to HD2; results of the action in HD2 could be returned via email or a link to the HD2 system to view them (a rough endpoint sketch follows below)
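
Nothing like this endpoint exists yet; the sketch below just illustrates the shape of the suggestion, with a hypothetical route, parameter names, and background job.

```ruby
class PreservationActionsController < ApplicationController
  # POST /api/preservation_actions
  # params: { action_type: "fixity_check", barcodes: ["40000000363772", ...] }
  def create
    barcodes = params.fetch(:barcodes, [])
    action   = params.fetch(:action_type)

    # Queue the work; the caller gets a job id it could poll, or HD2 could
    # email results when the job finishes.
    job = PreservationActionJob.perform_later(action, barcodes)  # hypothetical job

    render json: { status: 'queued', job_id: job.job_id, count: barcodes.size },
           status: :accepted
  end
end
```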

PO powwow and general JIRA browsing/backlog grooming (POs + PIs)

Location: E252

Goals:

  • Backlog grooming

Notes:


358:

I need to organize and interact with objects based on preservation actions

 Need to break this down into two stories: design and implementation


502:

What’s happening with Daniel’s subtasks for testing, how will this story move forward?


458:

Do we need to add a “done looks like”? Is this just UI-specific?


Need to write a new story about LDAP groups and how to provide access


281:

Need to write another story about inactivating

 Jon: maybe we need to manage this by groups

 Avalon uses roles - use this method instead

  With any role you can associate either an individual user or a group

   Make cohesive across HD2 and internal Avalon -- ADS groups (a tiny role sketch follows below)
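
A plain-Ruby sketch of the role idea above: a role can be associated with either individual users or groups. The role and group names are illustrative only.

```ruby
# Illustrative only; not a real HD2 or Avalon data structure.
Role = Struct.new(:name, :users, :groups)

manager = Role.new('collection_manager', ['jdoe'], ['ADS-media-preservation'])
manager.groups.include?('ADS-media-preservation')  # => true
```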


327:

Have we defined how ingest errors are handled?


392:

LTO tape number, how does the system store LTO location information (is it metadata on the object, or is there proxy information to make that relationship?)

 Mike: Link to access system? Because that’s where the LTO tape is requested


361:

If we want to track login information, we would need to add that into HD2 (not something we could do with Fedora)

 Login isn’t a priority; could be good for aggregate statistics (so many uses, etc.)

 

322:

We need to figure out whether to delete orphan filesets or have a way for the system to flag them so that someone can go in and figure out what to do with them


10am

Code Architecture (dev) and Wireframe (POs) Reportbacks 

Facilitator: Drew

Location: E252

Avalon + HD2: going to use Avalon’s approach to LDAP

 Of interest for GBH too, for different departments -- Frontline, etc


Grant specifies -- things go into preservation system, and can push into access system

Will mentioned that one potential future development could be undertaking preservation actions from Avalon


Jon: the first thing we really need is the ability to link the systems through the barcode; Will: they basically already have that, so we could create those links for HD2 and connect easily?


We definitely need to do something in terms of the connection with Avalon (NEH has asked specifically about this)


We should set up a regular meeting with Avalon team to work this out, might be able to incorporate with the monthly devcon meeting; Jon: wants direct insight into what’s happening as PI; Will and Maria will talk about what might work


 First priority is LDAP stuff

 

Backlog Grooming:

Testing strategy (HDM-502)

  • Testing in an engine is a learning experience for Drew, so it’s not certain yet

  • Sticking to test-driven development when it doesn’t get in the way

  • Factories (test data is created and configurable, so there isn’t a lot of test data in the application) vs fixtures (data copied from the system) - worked on factories, but bumping into issues with our XML and SIPs (see the factory sketch after this list)

    • Question to devs: keep going with less clean testing (test data part of the code base?)? How to balance getting testing in good order without over-focusing on it?

    • Coveralls -- issue of false negatives, what to do with the small amount the tool identifies as not being tested; Will suggested it’s useful
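
For reference, a minimal FactoryGirl-style factory of the kind discussed above; the attribute names and fixture path are placeholders, not the project’s actual test setup.

```ruby
FactoryGirl.define do
  factory :file_set, class: FileSet do
    title { ['Test preservation file'] }

    # Attaching real XML/SIP content is where the factories have been running
    # into trouble; keeping it in an optional trait limits the fixture data.
    trait :with_fits do
      after(:create) do |fs|
        fs.fits_xml = File.read('spec/fixtures/sample_fits.xml')  # hypothetical attribute and fixture
        fs.save
      end
    end
  end
end
```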

Ingest Error (HDM-257)

  • Duplicates error vs everything else on ingest:

    • Basic error (code failure): report error and move on, or code stops immediately

      • Randall: want to abort but also make sure there are no strange objects left over from that failure

      • Drew: as first iteration, just report any of these errors, log, move on

    • Policy error (duplicates): have to make configurable, not able to hard code existence in repository

      • Barcode comparison on ingest

        • Avalon’s REST API has very direct return codes for duplicates etc

        • Issue with basic log and continue: Inability to track state of process running is a challenge (for Brian Wheeler)?

Ingest issues: we need to have this conversation with Brian -- should Avalon people also be included to understand what they’re doing? Will also suggests POD work might factor in

11am
Break/Travel to CIB

11:30-1pm
Tour of MDPI Facilities (Mike Murazsko, Karen Cariani, Drew Myers, Randall Floyd, Daniel Pierce, Jon Dunn)

1pm
Lunch
Location: E252

2pm

Collection managers meeting: IU MDPI collection owners - workflows, digitized/born digital issues

Facilitator: Jon

Location: E252


Goals:

  • Technical metadata that collection managers want to search and facet on; tech md that might be useful to end users/researchers

  • Born digital audiovisual content at WGBH

  • User admin features - how would people need to be given access?


Notes:

  • Collection manager attendees: Alan Burdette (ATM), Mike Casey (MDPI), Carla Arton (IULMIA), Jon Cameron (MCO/Avalon)

  • showing/demoing HD2 is a possibility for today’s meeting

  • Old Sound Directions content is part of backlog along with MDPI content that needs to go into this system

  • Does features roadmap need to take into account IU/WGBH backlogs? Backlog contains different formats and files in combination (particularly at IU outside of MDPI)

  • Metadata about original physical item will be important, particularly as time goes on, to end users (collection managers and researchers)

  • Scope of HydraDAM - right now units have a tool to download files - is that supposed to be part of HD2? Yes, that will be incorporated

  • Images of original physical formats would be even better

  • Demo of HD2

    • Where would original physical format info show? At work level

    • Does information in HD2 talk to original source for information? Open question, different for every unit where metadata originates and flows to, tends to be difficult to manage; technical md doesn’t tend to have this type of problem since it doesn’t change but descriptive metadata does have this problem

  • Info about original physical format - Alan: not sure if that would be good to always have; some researchers would be interested but many would not. Mike thinks more people are going to be interested in this, especially as time goes on; maybe this info is on a secondary screen in Avalon, but he sees it as important - select information to show instead of showing everything, but if it’s possible to see everything or get a report, then show it

    • Like going to IUCAT and seeing Librarian View; that’s not necessarily a view that includes more metadata but it’s a different view, there are some fields that don’t show in any other view

  • Carla: if our users knew we had that data about the technology used to digitize, they would want to see it

  • Mike: Descriptive metadata in pres repo - 50 years from now what is vision? Pres repo is similar to catalog and access repo, they are not permanent and will have to be carried forward to future tech; will need external storage (beyond IU) and sufficient info for how that works

  • Alan: Role of Avalon for long-term maintenance of descriptive metadata - is that system, if it’s the only place where desc md exists, a kind of preserved place for that data? IUCAT, Archives Online, and Avalon all have primary repo duties for desc and structural md

  • ATM structural metadata is much richer than what Avalon can currently support - when that structure is described electronically, how is that preserved? Does Avalon need to change to support that richness and do changes made in Avalon (and iucat, archives online) get reflected in pres repo

  • Need to figure out when and how desc md get updated in pres repo

  • Use case

    • A researcher comes to coll manager and they need to find it

    • CM thinks they have better copy of something that was already digitized - need to find it and see how it was digitized

    • Film - pres is original, mezz would have color correction; would like to be able to find that information about mezz files to see how it was color corrected

    • Mike - need to know digitization techniques used for set of items

    • Playback speed could be important for faceting

  • Access controls to master files and tech/prov md

    • Roles at unit level - full rights (including delete/deaccession?), view md but no download, view/download but no edit

    • Is unit level sufficient or does it need to be subset by collections? Carla thinks unit level is good but role differentiation is definitely needed; Alan thinks there might be weird cases but probably nothing that can be programmed for (collection that donor says should only be accessible by men, for example)

    • Traditional knowledge labelling exists (Naz knows about it) but those aren’t access controls

Developers powwow

Location: W531

Goals: Where do we start ingestion with SDA?

Notes: 

  • The state machine decides to send an MDPI object to Avalon and could also send it to HD2

  • The metadata is a BagIt bag of XML files (ffprobe, MODS, FITS, etc.), which is sent to HD2

  • Ingestion is similar to what we do today.

    • With an optional FID.

    • Without a FID, do a create.

    • We will use the Fedora PID afterwards.

    • If FID is present then do an update not an insert.

  • After ingestion by HD2, we need to send a message back that ingestion was successful

    • send back the PID in case the calling routine wants to know.

  • Partial ingestion - how to handle failure?

    • If the PID exists, then undo all the parts of the object: Fedora, Solr, etc.

    • A flag that indicates that the object is solid and was created correctly.

    • If it has this flag do not enable batch delete.

    • Define state of failure - what step did I die on? (See the rollback sketch below.)
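
A sketch of the create-vs-update decision and the full rollback on failure, as discussed above. The service class, the `Work` model, and the `apply_sip` mapping step are hypothetical names, not the HydraDAM2 engine’s real ingest code.

```ruby
class IngestService
  def ingest(sip, fid: nil)
    work =
      if fid && ActiveFedora::Base.exists?(fid)
        ActiveFedora::Base.find(fid)   # FID supplied and found: update the existing object
      else
        Work.new                       # no FID (or not found): create a new object
      end

    apply_sip(work, sip)               # hypothetical mapping from SIP to object
    work.save!
    work.id                            # return the PID so the caller knows what was created
  rescue => e
    # Partial ingestion: undo anything already written to Fedora/Solr.
    work.destroy if work && work.persisted?
    Rails.logger.error("Ingest failed (#{sip.inspect}): #{e.message}")
    raise
  end
end
```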


(NOTES FROM DREW BELOW)

Notes from HydraDAM developer powwow

2pm 9/15/2016


* Ingestion

 * SIP structure

   - concluded that HydraDAM should expect a "fairly specific" BagIt structure

   - force implementers to structure incoming SIPs into the acceptable "fairly specific" BagIt structure

   - Bag must contain a manifest indicating where technical metadata is, what types (ffprobe, FITS), or if it exists at all (an illustrative layout and check appear at the end of these notes).


 * Identification of existing records - avoiding dupes.

   - instead of writing custom logic to combine one or more fields to identify uniqueness, have HydraDAM look for a specific Fedora ID.

   - If FID found, update. If not, insert.

   - this should be a good way to identify dupes in a performant way


 * Allowing a "REST-ish" API endpoint for ingestions better defines the separation of where the custom logic for mapping objects to be "ingest-ready" should live... which is "not in HydraDAM".


 * Failures

   - Go ahead and do a full rollback at the time of ingestion

   - Identify failed items in the log by filename (default).

     - possibly allow config to override using filename as the identification string in the log
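
As an illustration of the "fairly specific" BagIt expectation above, here is one possible check on an incoming SIP. The manifest filename (`hydradam.yml`), its keys, and the layout are assumptions, not an agreed convention.

```ruby
require 'yaml'

# Assumed bag layout (illustrative only):
#   mybag/
#     bagit.txt
#     manifest-md5.txt
#     data/
#       hydradam.yml              <- hypothetical manifest of technical metadata
#       metadata/ffprobe.xml
#       metadata/fits.xml
#       content/<barcode>_pres.wav
def technical_metadata_paths(bag_dir)
  manifest = YAML.load_file(File.join(bag_dir, 'data', 'hydradam.yml'))
  # e.g. { "ffprobe" => "metadata/ffprobe.xml", "fits" => "metadata/fits.xml" }
  manifest.each_with_object({}) do |(type, rel_path), found|
    path = File.join(bag_dir, 'data', rel_path)
    found[type] = path if File.exist?(path)   # skip types the SIP doesn't include
  end
end
```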
 

3pm

Data Model Discussion and PCDM

Facilitator: Heidi

Location: E252

Goals:

  • Do we all agree on storing MODS and POD xml within their own filesets?

  • Where should PREMIS events go in the data model?

  • What is our strategy for getting our use case out there for other Hydra IGs and WGs to make sure that PCDM continues to support it?

  • Preservation issue if there’s time: generating a checksum on xml files, without actually regularly running fixity checks on those files or storing preservation actions on those files - whether to store regular fixity checks


Notes:

  • MODS and POD XML

    • Storing them as floating objects doesn’t conform to HydraWorks

    • They should be in their own Fileset

  • PREMIS

    • Into a log file?

    • Timestamps needed on the work/FileSet Solr record. Ex: need to be able to query all objects that have not had a fixity check in 6 months (see the query sketch after this list).

    • Status of Fedora’s work in this area? https://wiki.duraspace.org/display/FF/Design+-+PREMIS+Event+Service 

      • Uses hasEvent relation between objects

      • Fedora actions as premis events. Ex: nodeAdded. Could be noisy from HDM viewpoint.

    • Hydra Premis Interest/Working Group has low activity.

      • UI design for PREMIS events needed.

      • Discovery interface for events.

  • Use Case Strategy

  • Preservation of xml files
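
A sketch of the "no fixity check in 6 months" query, assuming the last-check date is indexed in Solr under a hypothetical field name.

```ruby
require 'date'
require 'rsolr'

solr   = RSolr.connect(url: 'http://localhost:8983/solr/hydradam')  # hypothetical core
cutoff = (Date.today << 6).strftime('%Y-%m-%dT00:00:00Z')           # six months ago

response = solr.get('select', params: {
  q:    "last_fixity_check_dtsi:[* TO #{cutoff}]",  # hypothetical Solr field
  fl:   'id,last_fixity_check_dtsi',
  rows: 100
})
# Note: objects that have never been checked have no value in this field and
# would need a separate "field missing" query.
```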


Next steps: Reach out to Andrew Woods, David Wilcox, Eric James, or Ben Armintor to discuss how to implement PREMIS

 Set up an unconference session around PREMIS at HydraConnect -- potentially Julie

4pm

Recap of Decisions and Action Items

Facilitator: Karen

Location: E252

Goals:

  • Summarize last two days

  • List action items

  • Anything that hasn’t been discussed (hackathon)

Notes:


Summary:

Refined Features Roadmap -- no more exporting metadata to xml

PREMIS discussions -- two areas: before ingest, within HD2; not focusing on anything that happens after ingest outside of HD2; adhering to community-developed ideas where possible


Avalon/HD2 meeting -- integration at the barcode level? (also supports WGBH’s need) Selection of items in Avalon, then perform preservation actions within HD2 (passing a list of IDs?)


Action Items:

  • Start thinking about deployment (if prototype, creating a production instance of Fedora and Solr) -- hard December deadline

  • Julie will model PREMIS events -- fixity and ingestion

  • HydraConnect unconference session for storing PREMIS/preservation events

  • Have ongoing discussion around testing

    • Code reviews and team checks on testing to make sure that Drew doesn’t wander too far down a path

    • Adam Wead workshop at HydraConnect -- Randall is signed up, Drew might

  • Define use case around Avalon and HydraDAM2 integration (how could this potentially be used?); stories to support this use case

  • Develop an approach to gather use cases from outside collection managers -- propose things for people to react to as opposed to open questions

  • Reach out to someone to discuss PREMIS (Andrew Woods, Ben Armintor, Eric James, or David Wilcox)

  • Jon will sign up to a lightning talk on HydraDAM2 at HydraConnect

  • Aim for Hydra Partners call presentation in December meeting

  • HydraDAM2 poster for HydraConnect: Heidi will edit iPRES poster (remove logo), will send to Karen

  • Come up with theme for hackathon

  • Determine hackathon budget and look into possible events to connect a hackathon

  • Regularly hold developer forum -- choose topic


Anything not discussed:

  • Organizing hackathon

    • We have money to help fund people’s travel -- $7k

      • How many?

      • Where?

      • How long?

    • Do we want to combine this with anything else? If we attach it to a meeting, we can say that we’ll fund staying an extra day or something

    • Preservation actions, creating engine for preservation concerns, asynchronous storage

    • Sometime in 2017 -- if we’re already done with the basics, what can we look at for an extension of features or whatever?

    • If standalone meeting: $1,500 apiece for a couple of days; 5-10 people?


5pm
Dinner on our own

Information

Wifi Access: The IMU hotel should provide IU guest logins on checkin; if there are problems with this, guest users should be able to access wifi using the open AT&T Network

Notes: There is a shared Google Doc for collaborative notetaking (linked above); after the meetings the notes will be added to the wiki for extended access

Contact Information: Will Cowan 812-856-7815; Heidi Dowding 812-856-5295

Goals

  • Heidi: Complete wireframes of how the final release will look
  • Heidi: Establish how to implement PREMIS for preservation metadata within RDF; ensure that the final data model is what we want it to be, especially in regards to PREMIS events
  • Will: Understand current code architecture
  • Will: Determine if there are any areas of intersection for Avalon and HydraDAM2 for video playback, descriptive metadata and Access Control.

Action Items

  • Start thinking about deployment (if prototype, creating a production instance of Fedora and Solr) -- hard December deadline

  • Julie will model PREMIS events -- fixity and ingestion
  • HydraConnect unconference session for storing PREMIS/preservation events

  • Have ongoing discussion around testing and regular code checks

  • Adam Wead workshop at HydraConnect -- Randall is signed up, Drew might

  • Define use case around Avalon and HydraDAM2 integration (how could this potentially be used?); stories to support this use case

  • Develop an approach to gather use cases from outside collection managers -- propose things for people to react to as opposed to open questions

  • Reach out to someone to discuss PREMIS (Andrew Woods, Ben Armintor, Eric James, or David Wilcox)

  • Jon will sign up to a lightning talk on HydraDAM2 at HydraConnect

  • Aim for Hydra Partners call presentation in December meeting

  • HydraDAM2 poster for HydraConnect: Heidi will edit iPRES poster, will send to Karen
  • Come up with theme for hackathon

  • Determine hackathon budget and look into possible events to connect a hackathon
  • Regularly hold developer forum -- choose topic