Date
Attendees
- William G. Cowan
- Michael Muraszko
- Amol Khedkar
- Andrew Myers
- Karen Cariani
- Randall Floyd
- Nianli Ma
- Daniel Pierce
- Brian Wheeler (for select sessions)
- Julie Hardesty (for select sessions?)
Location
Wells Library
1320 E. 10th St.
Bloomington IN 47405
(see map)
All meetings except the MDPI tour are in Wells Library.
Notes
Collaborative Notes Google Doc: https://docs.google.com/document/d/1cIkj-e7LZvHXQl9nYMJbCYkayUydHVDpGs5ftt0JT7E/edit?usp=sharing
HydraDAM2 Links
http://helium.dlib.indiana.edu:8200/
http://hydradam.dlib.indiana.edu/
Agenda
Data model: spreadsheet work is done, now it’s just an issue of seeing it in action
What does done mean? : spreadsheet was mapping data to properties; to what degree one entity fits into another is not fully handled yet, we do have something to work with
The data model discussion; enough figured out now without speculating about future developments of Hydra
Marking as finished, need to move on from here to figure out what’s next
Installed HydraDAM2 rails engine: Halfway through conversion stuff
Complex and Simplified Ingest: Complex: haven’t decided unification of two sides of a record
Display of configuration: Getting gem structure in place has taken a lot of time in past sprint, translation to gem from app will be the complication only
Marking as done for September meeting, adding “Translate app functions to gem” to October
October 2016:
Preservation metadata: adding new stories to this on the roadmap
Interact with found set based on preservation action for SDA stored file
November 2016:
User admin -- Daniel’s work with Hybox might help to conceptualize this - he hasn’t been working directly with admin sets but this is def an area we want to watch
Avalon’s access control management might also be really helpful to solve all of these unknowns
Jon: It’s a requirement for IU to be able to do this with LDAP groups, bc we want to use this information for a wide variety of apps; Drew: is there a resource we can use to test this out? Jon: there are standalone LDAP servers; Will: Good thing to ask to Avalon people tomorrow as well
A lot of unknowns for Drew for November
Cancan is known (IU uses this, Drew knows well), WebAC part is uncertain?
Drew: Better if we can push for it to be pushed down into a Curation Concerns level
December 2016:
Ability to reorganize and interact with information based on queries
Drew: saved queries might be a good
What are those statistics?
Can we get here by December?
Drew: aggressive at our current pace; biggest problems are getting answers to unknowns, getting answers to standing questions from Hydra community members
Will: probably will be similar to how things have gone so far, a few things will get pushed into the next month or so
Do we need all of these things to have a production system in place?
Jon: for us, having user and access control stuff in place is a minimum to being able to use it for production
Will: if we are comfortable with the data model, we can start moving things into a production instance, while in the meantime refining user access (Karen agrees that GBH would like this)
Jon: as long as we don’t have to move the file location, we can do this
Drew: Devops are absent from this; where does this thing live hasn’t been really discussed yet
Jon: We have to decide whether this is ESS or not; Will: this might be a test case for ESS
Karen: how does this relate to Hathitrust? Not related, but Jon mentioned that there has been discussion about a Hathitrust for audiovisual
2017:
Search for and update metadata: cross off 363 and 370
Karen: How would you deaccession? -- not sure how this would look, Will: suggests this should be a one-off; Jon: you don’t want to make it too easy
Jon: We should focus on one-way for the Avalon--HD2 connection
Goals:
Julie: Define preservation events to model in PREMIS
Julie: Decide how preservation events are tracked/stored, how preservation events are searched/browsed
Julie: Determine differences in tracking and searching for preservation events about files stored in Fedora vs asynchronous files external to Fedora
See https://bugs.dlib.indiana.edu/browse/HDM-584
Notes:
http://id.loc.gov/vocabulary/preservation/eventType.html
http://id.loc.gov/ontologies/premis.html
Preservation events to model within HydraDAM2 functionality:
Fixity check
Ingestion
Events that happen outside of HD2 before item is ingested and should be recorded:
Message digest calculation
Validation
Virus check
Capture (maybe)
Creation [added by Julie after meeting]
How are events tracked/stored?
Fixity
Request goes out to storage proxy for checksum
Newly calculated checksum returned, is compared to “last good” checksum
Event is created, with date of fixity check, and true/false value of whether the newly calculated checksum matches the “last good” checksum.
Event is associated with FileSet.
How are events stored and browsed?
Also look at:
https://wiki.duraspace.org/display/FF/Design+-+Audit+Service
https://wiki.duraspace.org/display/FF/Design+-+PREMIS+Event+Service
Drew’s drawing proposing one way to model preservation events on each file within a fileset. This follows the way we’re currently using FileSet objects, which is to say, we use a single FileSet for each derivative of a digital file (i.e. “preservation”, “production”, “access”), with designated properties within the file set that have special meaning, i.e. “original_file” corresponds to the main digital file, and “fits” would correspond to the FITS XML that describes “original_file”. In this model, we introduce 2 new models: “EventLog” and “Log”, and for every File, there is also an EventLog.
Idea from Brian W (described by Julie):
Regardless of how each preservation event is stored (as an object or within a log-type file), consider including the last outcome/details/date of the common event types (so the date of the last fixity check, for example) as a property [Julie thinks this could be on the File] for ease of searching and using. That way, the entirety of events for a file don’t have to be reviewed to find the things that commonly need to be asked for (what was the result of the last fixity check, what was the date of ingest, what was the date of creation, etc). These properties would be for internal use only so they don’t necessarily need to be from an existing ontology - we could deal with these as custom internal properties. The entire history of events could then be managed without needing to be touched as often. Brian W isn’t so sure about a single log file that is appended to when a preservation event occurs because if something goes wrong in that appending action, the whole file might be trashed.
Julie thinks the list of events above could have the latest event tracked using internal RDF properties on the File. Fixity check is the only event that will be recurring from that list. Ingestion and the pre-ingest events are all one-time events.
Goals:
Templates and generated code -- keep it to a minimum. We all agree
What to keep in Hydradam core gem? (This is just suggestion)
Discoverable files? - Yes.
Interfacing with storage proxy? - No.
Preservation Event logs? - Not necessarily.
Testing
PO wireframe drawing
The Manage Collections/Units/Groups button (whatever we end up deciding to call it) is what would take you to the next screen below, where you can interact with collection information in terms of who has access and what their permissions are.
The Administrator or Collection Manager are the only ones who can add permissions and delete users. Their interfaces will look exactly the same except that the administrator can see all of the collections.
Clicking on a username on the Manage Collections Page will take you to a page that displays a log of all of that user’s actions
Can access MDPI items in Dark Avalon using barcode number: https://content.mdpi.iu.edu/object/40000000363772 (requires login but this is Iowa item in HD2)
- Chris Colvard recommending REST API be defined for HD2 so that collection manager can search in Avalon using desc metadata, come up with a set of items, then take a bulk action or select an item to take a preservation action - send list of barcodes to HD2 - results of action in HD2 could send email with results or link to HD2 system to view results
PO powwow and general JIRA browsing/backlog grooming (POs + PIs)
Location: E252
Goals:
Backlog grooming
Notes:
358:
I need to organize and interact with objects based on preservation actions
Need to break this down into two stories: design and implementation
502:
What’s happening with Daniel’s subtasks for testing, how will this story move forward?
458:
Do we need to add a done looks like? Is this just UI specific?
Need to write a new story about LDAP groups and how to provide access
281:
Need to write another story about inactivating
Jon: maybe we need to manage this by groups
Avalon uses roles - use this method instead
With any role you can associate either an individual user or a group
Make cohesive across HD2 and internal Avalon --ADS groups
327:
Have we defined how ingest errors are handled?
392:
LTO tape number, how does the system store LTO location information (is it metadata on the object, or is there proxy information to make that relationship?)
Mike: Link to access system? Bc that’s where the LTO tape is requested
361:
If we want to track login information, we would need to add that into HD2 (not something we could do with Fedora)
Login isn’t a priority ; could be good for aggregate statistics (so many uses, etc)
322:
We need to figure out whether to delete orphan filesets or have a way for the system to flag them so that someone can go in and figure out what to do with them
Avalon+HD2= going to use Avalon’s use of LDAP
Of interest for GBH too, for different departments -- Frontline, etc
Grant specifies -- things go into preservation system, and can push into access system
Will mentioned that one potential future development could be from Avalon, undertake Preservation Actions
Jon: first thing we need really is the ability to link the systems through the barcode; Will: they basically already have that, so we could create those links for HD2 and connect easily?
We definitely need to do something in terms of the connection with Avalon (NEH has asked specifically about this)
We should set up a regular meeting with Avalon team to work this out, might be able to incorporate with the monthly devcon meeting; Jon: wants direct insight into what’s happening as PI; Will and Maria will talk about what might work
First priority is LDAP stuff
Backlog Grooming:
Testing strategy (HDM-502)
Testing in an engine is a learning experience for drew, so it’s not certain yet
Sticking to test-driven development when it doesn’t get in the way
Factories (test data is created, configurable, don’t have a lot of test data in the application) vs fixtures (data from system copied) - worked on factories, but bumping into issues with our xml and SIPs
Question to devs: keep going with less clean testing (test data part of the code base?)? How to balance getting testing in good order without over-focusing on it?
Coveralls -- issue of false negatives, what to do with the small amount the tool identifies as not being tested; Will suggested it’s useful
Ingest Error (HDM 257)
Duplicates error vs everything else on ingest:
Basic error (code failure): report error and move on, or code stops immediately
Randall: want to abort but also make sure there are no strange objects left over from that failure
Drew: as first iteration, just report any of these errors, log, move on
Policy error (duplicates): have to make configurable, not able to hard code existence in repository
Barcode comparison on ingest
Avalon’s REST API has very direct return codes for duplicates etc
Issue with basic log and continue: Inability to track state of process running is a challenge (for Brian Wheeler)?
Goals:
Technical metadata that cm’s want to search and facet on; tech md that might be useful to end users/researchers
Born digital audiovisual content at WGBH
User admin features - how would people need to be given access?
Notes:
Collection manager attendees: Alan Burdette (ATM), Mike Casey (MDPI), Carla Arton (IULMIA), Jon Cameron (MCO/Avalon)
showing/demoing HD2 is a possibility for today’s meeting
Old Sound Directions content is part of backlog along with MDPI content that needs to go into this system
Does features roadmap need to take into account IU/WGBH backlogs? Backlog contains different formats and files in combination (particularly at IU outside of MDPI)
Metadata about original physical item will be important, particularly as time goes on, to end users (collection managers and researchers)
Scope of HydraDAM - right now units have a tool to download files - is that supposed to be part of HD2? Yes, that will be incorporated
Images of original physical formats would be even better
Demo of HD2
Where would original physical format info show? At work level
Does information in HD2 talk to original source for information? Open question, different for every unit where metadata originates and flows to, tends to be difficult to manage; technical md doesn’t tend to have this type of problem since it doesn’t change but descriptive metadata does have this problem
Info about original physical format - Alan, not sure if that would be good to always have or not, some researchers would be interested but many would not; Mike thinks more people are going to be interested in this, especially as time goes on, maybe this info is in a secondary screen in Avalon, but sees it as important, would select information to show instead of showing everything but if it’s possible to see everything or get report, then show it
Like going to IUCAT and seeing Librarian View; that’s not necessarily a view that includes more metadata but it’s a different view, there are some fields that don’t show in any other view
Carla, if our users new we had that data about the technology used to digitize, they would want to see it
Mike: Descriptive metadata in pres repo - 50 years from now what is vision? Pres repo is similar to catalog and access repo, they are not permanent and will have to be carried forward to future tech; will need external storage (beyond IU) and sufficient info for how that works
Alan: Role of Avalon for long-term maintenance of descriptive metadata - is that system if it’s the only place where desc md exists, a kind of preserved place for that data? Iucat, archives online, avalon, all have primary repo duties for desc and structural md
ATM structural metadata is much richer than what Avalon can currently support - when that structure is described electronically, how is that preserved? Does Avalon need to change to support that richness and do changes made in Avalon (and iucat, archives online) get reflected in pres repo
Need to figure out when and how desc md get updated in pres repo
Use case
A researcher comes to coll manager and they need to find it
CM thinks they have better copy of something that was already digitized - need to find it and see how it was digitized
Film - pres is original, mezz would have color correction; would like to be able to find that information about mezz files to see how it was color corrected
Mike - need to know digitization techniques used for set of items
Playback speed could be important for faceting
Access controls to master files and tech/prov md
Roles at unit level - full rights (including delete/deaccession?), view md but no download, view/download but no edit
Is unit level sufficient or does it need to be subset by collections? Carla thinks unit level is good but role differentiation is definitely needed; Alan thinks there might be weird cases but probably nothing that can be programmed for (collection that donor says should only be accessible by men, for example)
- Traditional knowledge labelling exists (Naz knows about it) but they aren’t access controls
Goals: Where do we start ingestion with SDA?
Notes:
The State machine decides to send a MDPI object to Avalon and could also send to HD2
The metadata is a bagit of xml files (ffprobe, mods, fits, etc) which is sent to HD2
Ingestion is similar to what we do today.
With an optional FID.
Without a FID, do an create.
We will use the Fedora PID afterwards.
If FID is present then do an update not an insert.
Ingestion by HD2, we need to send a message back that ingestion was successful
send back the PID in case the calling routine wants to know.
Partial ingestion - how to handle failure?
If PID exists then undo all the parts of object, fedora, solr, etc.
A flag that indicates that the object is solid and was created correctly.
If it has this flag do not enable batch delete.
Define state of failure - what step did I die on?
(NOTES FROM DREW BELOW)
Notes from HydraDAM developer powow
2pm 9/15/2016
* Ingestion
* SIP structure
- concluded that HydraDAM should be expectant of a "fairly specific" Bagit structure
- force implementers to structure incomding SIPs into the acceptable "fairly specific" Bagit structure
- Bag must contain a manifest indicating where technical is, what types (ffprobe, fits), or if it exists at all.
* Identification of existing records - avoiding dupes.
- instead of writing custom logic to combine one or more fields to identify uniqueness, have HydraDAM look for a specific Fedora ID.
- If FID found, update. If not, insert.
- this should be a good way to identify dupes in a performant way
* Allowing a "REST-ish" api endpoint for ingestions better defines the separation of where the custom logic for mapping objects to be "ingest-ready" should live... which is "not in HydraDAM".
* Failures
- Go ahead and do a full rollback at the time of ingestion
- Identify failed items in the log by filename (default).
Goals:
Do we all agree on storing MODS and POD xml within their own filesets?
Where should PREMIS events go in the data model?
What is our strategy for getting our use case out there for other Hydra IGs and WGs to make sure that PCDM continues to support it?
Preservation issue if there’s time: generating a checksum on xml files, without actually regularly running fixity checks on those files or storing preservation actions on those files - whether to store regular fixity checks
Notes:
MODS and POD XML
Storing them as floating objects doesn’t conform to HydraWorks
They should be in their own Fileset
PREMIS
Into a log file?
Timestamps needed on work/fileset solr record. Ex: Need to be able to query all objects that have not had fixity check in 6 months.
Status of Fedora’s work in this area? https://wiki.duraspace.org/display/FF/Design+-+PREMIS+Event+Service
Uses hasEvent relation between objects
Fedora actions as premis events. Ex: nodeAdded. Could be noisy from HDM viewpoint.
Hydra Premis Interest/Working Group has low activity.
UI design from premis events needed.
Discovery interface for events.
Use Case Strategy
PCDM Working Group, Julie and Andrew are on it.
PCDM Tech calls - https://github.com/duraspace/pcdm/wiki/September-1,-2016
Preservation of xml files
Next steps: Reach out to Andrew Woods or David Wilcox, or Eric James, Ben Armintor to discuss how to implement PREMIS
Goals:
Summarize last two days
List action items
Anything that hasn’t been discussed (hackathon)
Notes:
Summary:
Refined Features Roadmap -- no more exporting metadata to xml
PREMIS discussions -- two areas: before ingest, within HD2; not focusing on anything that happens after ingest outside of HD2; adhering to community-developed ideas where possible
Avalon/HD2 meeting -- integration at the barcode level? (also supports WGBH’s need) Selection of items in Avalon, then perform preservation actions within HD2 (passing a list of IDs?)
Action Items:
Start thinking about deployment (if prototype, creating a production instance Fedora and solr) -- hard December
Julie will model PREMIS events -- fixity and ingestion
HydraConnect unconference session for storing PREMIS/preservation events
Have ongoing discussion around testing
Code reviews and team checks on testing to make sure that Drew doesn’t wander too far down a path
Adam Wead workshop at HydraConnect -- Randall is signed up, Drew might
Define use case around Avalon and HydraDAM2 integration (how could this potentially be used?); stories to support this use case
Develop an approach to gather use cases from outside collection managers -- propose things for people to react to as opposed to open questions
Reach out to someone to discuss PREMIS (Andrew Woods, Ben Armintor, Eric James, or David Wilcox)
Jon will sign up to a lightning talk on HydraDAM2 at HydraConnect
Aim for Hydra Partners call presentation in December meeting
HydraDAM2 poster for HydraConnect: Heidi will edit iPRES poster (remove logo), will send to Karen
Come up with theme for hackathon
Determine hackathon budget and look into possible events to connect a hackathon
Regularly hold developer forum -- choose topic
Anything not discussed:
Organizing hackathon
We have money to help fund people’s travel -- $7k
How many?
Where?
How long?
Do we want to combine this with anything else? If we attach it to a meeting, we can say that we’ll fund staying an extra day or something
Preservation actions, creating engine for preservation concerns, asynchronous storage
Sometime in 2017 -- if we’re already done with the basics, what can we look at for an extension of features or whatever?
- If standalone meeting: 1500 apiece if a couple days; 5-10 people?
Information
Wifi Access: The IMU hotel should provide IU guest logins on checkin; if there are problems with this, guest users should be able to access wifi using the open AT&T Network
Notes: There is a shared google doc for collaborative notetaking (linked above), after the meetings the notes will be added to the Wiki for extended access
Contact Information: Will Cowan 812-856-7815; Heidi Dowding 812-856-5295
Goals
- Heidi: Complete wireframes of how the final release will look
- Heidi: Establish how to implement PREMIS for preservation metadata within RDF; ensure that the final data model is what we want it to be, especially in regards to PREMIS events
- Will: Understand current code architecture
- Will: Determine if there are any areas of intersection for Avalon and HydraDAM2 for video playback, descriptive metadata and Access Control.
Action Items
Start thinking about deployment (if prototype, creating a production instance Fedora and solr) -- hard December
- Julie will model PREMIS events -- fixity and ingestion
HydraConnect unconference session for storing PREMIS/preservation events
Have ongoing discussion around testing and regular code checks
Adam Wead workshop at HydraConnect -- Randall is signed up, Drew might
Define use case around Avalon and HydraDAM2 integration (how could this potentially be used?); stories to support this use case
Develop an approach to gather use cases from outside collection managers -- propose things for people to react to as opposed to open questions
Reach out to someone to discuss PREMIS (Andrew Woods, Ben Armintor, Eric James, or David Wilcox)
Jon will sign up to a lightning talk on HydraDAM2 at HydraConnect
Aim for Hydra Partners call presentation in December meeting
- HydraDAM2 poster for HydraConnect: Heidi will edit iPRES poster, will send to Karen
Come up with theme for hackathon
- Determine hackathon budget and look into possible events to connect a hackathon
- Regularly hold developer forum -- choose topic