Users running Avalon Media System for some length of time might notice an increase in the disk space that Matterhorn consumes. Matterhorn is the transcoding workflow engine used by Avalon, and the loss of disk space is a result of it holding onto temporary workflow artifacts indefinitely—even when they are no longer needed. This is especially true for Avalon versions through 3.2. In release 3.3, changes were made to reduce this problem.
Although the need has been reduced due to workflow changes, there still may be times when it is desirable to clean up Matterhorn's work directories. The script below can be run to accomplish this task. It performs three main tasks:
- It removes temporary artifacts related to any workflow that finished by the previous day. The script queries for workflows in the SUCCEEDED, STOPPED, SKIPPED, or FAILED state.
- It removes temporary artifacts related to any workflow that Matterhorn no longer knows about.
- It removes temporary artifacts that were saved for failed workflows.
This script should be edited as necessary for individual installations.
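Before editing or running the cleanup, it can help to see how much space each work directory actually uses. The following is a minimal sketch, not part of the original script: `report_usage` is a hypothetical helper name, and the default `MATTERHORN_HOME` simply mirrors the path used by the cleanup script, so adjust it for your installation.

```shell
# Assumption: same base path as the cleanup script; override via the environment.
MATTERHORN_HOME="${MATTERHORN_HOME:-/usr/local/matterhorn/work/content}"

# report_usage DIR...: print the size in kilobytes of each directory that
# exists, and note missing ones on stderr.
report_usage() {
  for dir in "$@"; do
    if [ -d "${dir}" ]; then
      du -sk "${dir}"
    else
      echo "missing: ${dir}" >&2
    fi
  done
}

# Example call (run it before and after the cleanup to compare):
# report_usage "${MATTERHORN_HOME}/files/mediapackage" \
#              "${MATTERHORN_HOME}/workspace/mediapackage" \
#              "${MATTERHORN_HOME}/archive-temp"
```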
Note that deleting a workflow may leave its job in a bad state. Matterhorn then logs warnings such as the following:
2016-10-19 12:59:00 WARN (WorkflowServiceImpl:1683) - Exception while accepting job Job {id:3502, version:25}
org.opencastproject.util.NotFoundException: Workflow '3494' has been deleted
at org.opencastproject.workflow.impl.WorkflowServiceImpl.getWorkflowById(WorkflowServiceImpl.java:480)
at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1659)
at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2048)
at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2014)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2016-10-19 12:59:00 WARN (WorkflowServiceImpl:1689) - Unable to parse workflow instance
org.opencastproject.util.NotFoundException: Workflow '3494' has been deleted
at org.opencastproject.workflow.impl.WorkflowServiceImpl.getWorkflowById(WorkflowServiceImpl.java:480)
at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1659)
at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2048)
at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2014)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
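When warnings like these appear, it can be useful to know which workflows were affected. The sketch below pulls the workflow IDs out of log text that uses the message format shown above. The function name `deleted_workflow_ids` is hypothetical, and the log file location is not given here, so pass whatever path your installation uses (or pipe the log in on stdin).

```shell
# deleted_workflow_ids [LOGFILE...]: print the unique IDs of workflows that
# the log reports as deleted. Reads stdin when no file is given.
deleted_workflow_ids() {
  grep -h -o "Workflow '[0-9]*' has been deleted" "$@" \
    | grep -o "[0-9][0-9]*" \
    | sort -un
}

# Example call (the path is a placeholder, not a known default):
# deleted_workflow_ids /path/to/matterhorn.log
```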
#!/bin/bash
# Credentials for the Matterhorn system account (digest authentication)
USERNAME='matterhorn_system_account'
PASSWD='CHANGE_ME'
MATTERHORN_HOME='/usr/local/matterhorn/work/content'
# Midnight UTC at the start of the previous day
YESTERDAY=$(date --date="yesterday" -u +"%Y-%m-%dT00:00:00Z")

cleanupDir() {
  mpdir=$1
  for mp in $(ls "${mpdir}"); do
    # Delete mediapackages that are done processing
    curl --digest -u "${USERNAME}:${PASSWD}" \
      -H "X-Requested-Auth: Digest" \
      -H "X-Opencast-Matterhorn-Authorization: true" \
      "http://localhost:18080/workflow/instances.json?state=SUCCEEDED%2C%20STOPPED%2C%20SKIPPED%2C%20FAILED&mp=${mp}&todate=${YESTERDAY}" 2>/dev/null \
      | grep -qci -e '"totalCount":"1"' \
      && echo "Deleting ${mpdir}/${mp}" && rm -r "${mpdir}/${mp}"
    # Delete mediapackages that Matterhorn doesn't know about anymore
    curl --digest -u "${USERNAME}:${PASSWD}" \
      -H "X-Requested-Auth: Digest" \
      -H "X-Opencast-Matterhorn-Authorization: true" \
      "http://localhost:18080/workflow/instances.json?mp=${mp}" 2>/dev/null \
      | grep -qci -e '"totalCount":"0"' \
      && echo "Deleting ${mpdir}/${mp}" && rm -r "${mpdir}/${mp}"
  done
}

# Delete mediapackages in three possible locations
cleanupDir "${MATTERHORN_HOME}/files/mediapackage"
cleanupDir "${MATTERHORN_HOME}/workspace/mediapackage"
cleanupDir "${MATTERHORN_HOME}/archive-temp"

# Remove zips from failed workflows
rm "${MATTERHORN_HOME}/files/collection/failed.zips/"*
rm "${MATTERHORN_HOME}/workspace/collection/failed.zips/"*
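Because the script deletes directories outright, a dry-run mode can make a first run on a new installation less risky. The following is a sketch of one way to add it, assuming you replace the script's `rm -r` calls with the helper. `remove_path` and `DRY_RUN` are hypothetical names and do not appear in the original script.

```shell
# DRY_RUN=1 (the default in this sketch) only reports what would be removed;
# set DRY_RUN=0 to really delete.
DRY_RUN="${DRY_RUN:-1}"

# remove_path PATH: stand-in for the script's `rm -r` calls.
remove_path() {
  # Nothing to do if the path is already gone.
  if [ ! -e "$1" ]; then
    return 0
  fi
  if [ "${DRY_RUN}" = "1" ]; then
    echo "Would delete $1"
  else
    echo "Deleting $1"
    rm -r -- "$1"
  fi
}

# Inside cleanupDir, the deletion step would then read:
#   ... | grep -qci -e '"totalCount":"1"' && remove_path "${mpdir}/${mp}"
```

Run the script once with the default to review the list of candidates, then again with `DRY_RUN=0` once the output looks right.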
Cleaning Up the Matterhorn Database
Matterhorn queries its own `mh_job` table every minute or so. Because older jobs are never removed from this table, these queries eventually become a noticeable drain on system resources. To clean out the table:
- Locate your Matterhorn installation and check etc/config.properties for the database you are using (the default location is /usr/local/matterhorn/etc/config.properties).
- Open that database in your database browser of choice and select the `mh_job` table.
- Delete rows where the operation value is not START_WORKFLOW (those jobs are retained for later reference). You can use `date_created` to limit the delete to rows older than a given age, as in the query that follows this list.
- Your database table should now be much smaller.
delete from mh_job where operation <> 'START_WORKFLOW' and date_created < (NOW() - INTERVAL 1 MONTH);
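Before deleting anything, you may want to see how many rows the same filter would match. The sketch below builds a `select count(*)` statement with the same conditions as the delete above and a configurable age in months. The function names `job_filter` and `preview_sql` are hypothetical, as is the `matterhorn` database name in the usage comment. The `mysql` client is assumed because the queries on this page use MySQL syntax.

```shell
# job_filter MONTHS: print the WHERE conditions used by the delete statement,
# with the age threshold set to MONTHS (default 1).
job_filter() {
  months="${1:-1}"
  case "${months}" in
    ''|*[!0-9]*)
      echo "months must be a whole number" >&2
      return 1
      ;;
  esac
  printf "operation <> 'START_WORKFLOW' and date_created < (NOW() - INTERVAL %s MONTH)" "${months}"
}

# preview_sql MONTHS: print a statement that counts the rows the delete
# would remove, without changing anything.
preview_sql() {
  filter=$(job_filter "$1") || return 1
  printf 'select count(*) from mh_job where %s;\n' "${filter}"
}

# Example call (database name and credentials are assumptions):
# preview_sql 1 | mysql matterhorn
```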
If the previous query throws a foreign key constraint error, use the query below instead. It deletes only the rows that have no children, which frees their parent rows to be deleted on the next pass. Repeat it until no more rows are deleted.
delete p from mh_job as p left join mh_job as c ON p.id = c.parent where c.id is NULL and p.operation <> 'START_WORKFLOW' and p.date_created < (NOW() - INTERVAL 1 MONTH);
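The repeated passes can be automated. The following is a sketch under stated assumptions: `repeat_until_empty` and `delete_leaf_jobs` are hypothetical names, the database name defaults to a placeholder (`matterhorn`), and credentials are expected to come from your MySQL client configuration. It relies on MySQL's `ROW_COUNT()` function, which reports the number of rows affected by the preceding statement in the same session.

```shell
# repeat_until_empty MAX_PASSES COMMAND...: run COMMAND, which must print the
# number of rows it deleted, until it prints 0 or MAX_PASSES is reached.
# Returns 0 when nothing is left, 1 if COMMAND fails, 2 if passes ran out.
repeat_until_empty() {
  max_passes=$1
  shift
  pass=1
  while [ "${pass}" -le "${max_passes}" ]; do
    deleted=$("$@") || return 1
    echo "pass ${pass}: deleted ${deleted} rows"
    if [ "${deleted}" -eq 0 ]; then
      return 0
    fi
    pass=$((pass + 1))
  done
  return 2
}

# delete_leaf_jobs: run one pass of the childless-row delete and print the
# number of rows removed. MH_DB is an assumed variable for the database name.
delete_leaf_jobs() {
  mysql --batch --skip-column-names "${MH_DB:-matterhorn}" -e "
    delete p from mh_job as p left join mh_job as c on p.id = c.parent
    where c.id is NULL and p.operation <> 'START_WORKFLOW'
      and p.date_created < (NOW() - INTERVAL 1 MONTH);
    select ROW_COUNT();"
}

# Example call: at most 20 passes.
# repeat_until_empty 20 delete_leaf_jobs
```

The pass limit is a guard against looping forever if rows keep matching for a reason other than parent/child ordering.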
You may also opt to use a scheduled event to run this cleanup automatically:
CREATE EVENT IF NOT EXISTS `clean_mh_job` ON SCHEDULE EVERY 1 MONTH STARTS CURRENT_TIMESTAMP ON COMPLETION PRESERVE ENABLE COMMENT '' DO delete p from mh_job as p left join mh_job as c ON p.id = c.parent where c.id is NULL and p.operation <> 'START_WORKFLOW' and p.date_created < (NOW() - INTERVAL 1 MONTH);
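In MySQL, scheduled events only fire while the event scheduler is running, so it is worth checking the `event_scheduler` variable after creating the event. The sketch below parses the tab-separated output of `SHOW VARIABLES LIKE 'event_scheduler'` as printed by `mysql --batch`. The function name `scheduler_is_on` is hypothetical, and the connection details in the usage comment are assumptions.

```shell
# scheduler_is_on: read `SHOW VARIABLES LIKE 'event_scheduler'` output
# (name<TAB>value) from stdin and succeed only if the value is ON.
scheduler_is_on() {
  awk -F '\t' '$1 == "event_scheduler" { found = ($2 == "ON") } END { exit !found }'
}

# Example call (credentials assumed to come from the MySQL client config):
# mysql --batch --skip-column-names -e "SHOW VARIABLES LIKE 'event_scheduler'" \
#   | scheduler_is_on \
#   || echo "Event scheduler is off; enable it with: SET GLOBAL event_scheduler = ON;"
```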