This is an attempt to decouple the ingest process into something generic that can be abstracted into a gem and easily overridden by institutions that do not want to use the default behaviors. A first pass includes adding hooks for different events, extending the initializer, and writing an abstract ingest handler that can proxy requests from both HTTP and batch workflows.
An initializer under /config sets up the list of steps and defines any event hooks the application needs. A default implementation might look something like the following code snippet.
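As a rough illustration only, such an initializer could be sketched as below. The `IngestWorkflow` module, its `configure` block, the `step` and `on` methods, and every step name other than `file_upload` are hypothetical names chosen for this sketch, not an existing API.

```ruby
# Hypothetical sketch of a file such as config/initializers/ingest_workflow.rb.
# IngestWorkflow, Configuration, #step and #on are illustrative names only.
module IngestWorkflow
  class Configuration
    attr_reader :steps, :hooks

    def initialize
      @steps = []
      # Each event name maps to an ordered list of handlers.
      @hooks = Hash.new { |hash, event| hash[event] = [] }
    end

    # Append a named step to the ordered list of steps.
    def step(name)
      @steps << name
    end

    # Register a handler for an event such as :before_ingest or :after_step.
    def on(event, &handler)
      @hooks[event] << handler
    end
  end

  def self.configuration
    @configuration ||= Configuration.new
  end

  def self.configure
    yield configuration
  end
end

IngestWorkflow.configure do |config|
  # Only file_upload is named in this page; the other steps are assumptions.
  config.step :file_upload
  config.step :resource_description
  config.step :access_control

  config.on(:before_ingest) { |context| context[:started_at] = Time.now }

  # Request derivative generation once the upload step has finished.
  config.on(:after_step) do |step, context|
    context[:conversion_requested] = true if step == :file_upload
  end
end
```

Keeping steps and hooks in one configuration object means an institution can replace the whole initializer without touching the gem itself.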
At various points in the workflow, custom handlers can be injected to deal with events. One key event that can be trapped is the generation of derivatives, which runs asynchronously, unlike the rest of the ingest steps.
Before_ingest / After_ingest
Events can be configured both before the ingest process begins and after it ends. The before_ingest hook is used to initialize any environment variables that might be needed and is called just before the object is first created. after_ingest is invoked when the workflow process terminates, whether it finishes naturally, an exception is thrown, or the workflow is canceled. The intent is to act both as a safety valve for cleaning up state and as a place for any additional handling that needs to take place within the application.
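One way to guarantee that after_ingest fires on every exit path is an `ensure` clause. The following is a minimal sketch under assumptions: the `IngestRunner` class, its `Canceled` exception, the hook signatures, and the status symbols are all hypothetical.

```ruby
# Hypothetical runner: the class name, hook signatures, and status values
# are assumptions made for illustration.
class IngestRunner
  class Canceled < StandardError; end

  def initialize(before_ingest:, after_ingest:)
    @before_ingest = before_ingest
    @after_ingest = after_ingest
  end

  # Runs the given block as the workflow body and returns how it ended:
  # :completed, :canceled, or :failed.
  def run(context = {})
    status = :completed
    @before_ingest.call(context)
    yield context
    status
  rescue Canceled
    status = :canceled
  rescue StandardError => e
    context[:error] = e
    status = :failed
  ensure
    # Runs whether the body finished, was canceled, or raised.
    @after_ingest.call(context, status)
  end
end
```

Passing the final status to the after_ingest handler lets a single handler decide how much cleanup each outcome needs.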
Before_step / After_step / Around_step
These three hooks are for use around specific steps to take care of things that are not encapsulated within the step definition. One example is noted above: an after_step hook is used to begin the file conversion process. around_step is configured by default to trap exceptions and log them. It can also be used for performance tuning, logging, or whatever else is needed to manage state during the ingest process.
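The default around_step behavior described here (trap exceptions and log them) might be sketched as follows. The `build_around_step` helper, the lambda signature, the `:failed_step` context key, and the timing log line are assumptions for illustration.

```ruby
require "logger"

# Hypothetical default around_step hook: wraps a step, logs any exception,
# and records elapsed time. The lambda signature is an assumption.
def build_around_step(logger)
  lambda do |step_name, context, &step|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      step.call(context)
    rescue StandardError => e
      # Trap the failure so that it is logged instead of propagating.
      logger.error("#{step_name} failed: #{e.class}: #{e.message}")
      context[:failed_step] = step_name
      nil
    ensure
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      logger.info(format("%s finished in %.3fs", step_name, elapsed))
    end
  end
end
```

A caller would wrap each step body in the hook, for example `hook.call(:file_upload, context) { |ctx| ... }`, so the step itself stays free of logging and timing code.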
Before_file_conversion / During_file_conversion / After_file_conversion
To prepare files for conversion, use the before_file_conversion hook. Things you may want to do here include setting up workflow properties for the conversion pipeline, creating checksums, or validating content. It is assumed that legal file formats will already have been vetted by the file_upload step.
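A before_file_conversion handler along these lines could look like the sketch below. The method signature, the context keys, and the choice of SHA-256 are assumptions; the page does not specify a checksum algorithm.

```ruby
require "digest"

# Hypothetical before_file_conversion handler. The context keys and the use
# of SHA-256 are assumptions, not part of the design described here.
def before_file_conversion(path, context)
  # Basic content validation before the file is handed to the pipeline.
  raise ArgumentError, "missing file: #{path}" unless File.file?(path)
  raise ArgumentError, "empty file: #{path}" if File.zero?(path)

  # Record properties that later stages can use to verify the master file.
  context[:checksum] = Digest::SHA256.file(path).hexdigest
  context[:size_bytes] = File.size(path)
  context
end
```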
during_file_conversion and after_file_conversion are intended to track the state of the conversion process. Which one gets called is determined by a callback handler that queries the external conversion tool. If the tool reports that the file is still being converted, during_file_conversion is invoked. If it reports that the conversion is complete, after_file_conversion is called. A default application supports three behaviours for the master files once conversion is complete:
- Delete the master files from the system and do not keep a backup copy
- Move the master files to spinning disk so they can be archived
- Retain the master files for future reconversion in case there was a problem with the initial derivative creation process
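The dispatch between the two hooks and the three master-file behaviours can be sketched as below. The `ConversionCallback` class, the `:in_progress` and `:complete` status symbols, the policy names, and the archive directory are hypothetical; how the external conversion tool is actually queried is not shown.

```ruby
require "fileutils"

# Hypothetical callback handler. The status symbols, policy names, and archive
# directory are assumptions; the external tool's real API is not shown.
class ConversionCallback
  POLICIES = %i[delete archive retain].freeze

  def initialize(policy:, archive_dir: nil, during: ->(_path) {})
    raise ArgumentError, "unknown policy: #{policy}" unless POLICIES.include?(policy)
    raise ArgumentError, "archive policy needs archive_dir" if policy == :archive && archive_dir.nil?

    @policy = policy
    @archive_dir = archive_dir
    @during = during
  end

  # status is whatever the external conversion tool reported for this file.
  def handle(status, master_path)
    case status
    when :in_progress then during_file_conversion(master_path)
    when :complete then after_file_conversion(master_path)
    else raise ArgumentError, "unexpected status: #{status}"
    end
  end

  private

  def during_file_conversion(master_path)
    @during.call(master_path)
    :pending
  end

  def after_file_conversion(master_path)
    case @policy
    when :delete
      FileUtils.rm_f(master_path)
    when :archive
      FileUtils.mkdir_p(@archive_dir)
      FileUtils.mv(master_path, @archive_dir)
    when :retain
      # Leave the master in place for a possible reconversion.
    end
    @policy
  end
end
```

Treating the master-file behaviour as a policy passed to the handler keeps the choice in configuration, which fits the goal of letting institutions override defaults.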
(other extra properties)