Script Name: autoupdate.pl (/opt/etext/common/scr)
Description: automatically retrieves and indexes Etext and/or updates the corresponding web browse page(s) and the sitemap file.
Requirements: 1) an XML configuration file (see the sample configuration file in the attachment)
2) logsub.pl in /opt/etext/common/scr
Arguments:
- Configuration file (configuration folder: brie:/opt/etext/scr/config)
- Step(s)*: this argument lets the user run only selected steps.
- r: retrieve, i: index, u: update browse page.
- default: riu (all three steps)
Script Usage Example: perl autoupdate.pl config_general.xml iu
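The handling of the step(s) argument can be sketched as follows. This is only an illustration, not the code in autoupdate.pl: the helper name parse_steps is hypothetical, and rejecting letters other than r, i, and u is an assumption. Only the three letters and the default of riu come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from autoupdate.pl): turn the optional
# step(s) argument into a set of flags keyed by r, i and u.
sub parse_steps {
    my ($arg) = @_;

    # Default documented above: run all three steps.
    $arg = 'riu' unless defined $arg && length $arg;

    # Assumption: any letter other than r, i or u is treated as an error.
    die "Unknown step(s) in '$arg'; expected a combination of r, i, u\n"
        if $arg =~ /[^riu]/;

    my %steps = map { $_ => ( index( $arg, $_ ) >= 0 ? 1 : 0 ) } qw(r i u);
    return \%steps;
}

# Example: the "iu" value from the usage line above.
my $steps = parse_steps('iu');
print join( ',', map { "$_=$steps->{$_}" } qw(r i u) ), "\n";    # prints r=0,i=1,u=1
```

Keeping the flags in a hash lets each later step be guarded by a simple test such as if ($steps->{i}).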
Output:
- collection directory/data/*.xml|*.xml.md -- data retrieved from Xubmit
- collection directory/index/ -- index files
- /www/webapp1.dlib.indiana.edu/htdocs/sitemaps/$collection.xml -- sitemap file
- collection directory/log/$prog_$collection_date.log -- program log (optional)
- emails sent to developers and others, if any
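The log file name in the list above could be built along these lines. This is a sketch under assumptions: the helper names log_path and append_log are hypothetical, and the YYYYMMDD date format is a guess, since the pattern only says "date". Because the date is part of the name, every run on the same day resolves to the same file.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Spec;

# Hypothetical helper: build the per-day log path for one collection,
# following the pattern <collection directory>/log/$prog_$collection_date.log.
sub log_path {
    my ( $collection_dir, $prog, $collection, $time ) = @_;
    $time = time unless defined $time;

    # Assumption: the date is written as YYYYMMDD.
    my $date = strftime( '%Y%m%d', localtime $time );

    # The braces matter: "$prog_$collection" would make Perl look for a
    # variable named $prog_ instead of $prog.
    return File::Spec->catfile( $collection_dir, 'log',
        "${prog}_${collection}_${date}.log" );
}

# Hypothetical helper: append one timestamped line to the log.
# Opening with '>>' creates the file on first use and appends afterwards.
sub append_log {
    my ( $path, $message ) = @_;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    print {$fh} scalar(localtime), " $message\n";
    close $fh or die "Cannot close $path: $!";
    return 1;
}
```

For example, log_path('/path/to/collection', 'autoupdate', 'imh') returns a path ending in autoupdate_imh_ followed by the current date and .log; the directory and program name here are placeholders.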
Notes:
- When the configuration XML is read in, the process stops if any mandatory field is missing.
- When retrieving files from Xubmit, the process stops and sends a failure notification email to developers if an error occurs; it also stops, but sends a success notification email, if no new files are retrieved.
- When indexing files from the data folder, the process stops and sends a failure notification email to developers if an error occurs or no files are indexed.
- When updating browse page(s), the process stops if an error occurs.
- If no errors occur during any requested step, a success email containing a brief summary of the process is sent to both developers and other related personnel.
- By default, logging is turned on (to turn it off, change the value of $log to 0). A log file is generated each time the script is called to process one collection. Within the same day, log information is appended to that day's log file. Separate log files are kept for different dates to avoid a single enormous log file.
- If indexing follows retrieval, the indexer runs in 'incremental' mode; if indexing is conducted alone, the indexer runs in 'clean' mode.
- By default, only files tagged with status "minimal" or "completed" are extracted. To extract files with other statuses, follow the documentation of the Xubmit Command Line Admin Tool to extract those files manually, then run this autoupdate.pl script with the "iu" option.
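The choice of indexer mode described in the notes above can be sketched as below. The helper name indexer_mode is hypothetical, and the sketch reads "alone" as "without a retrieval step in the same run", which is an assumption; how autoupdate.pl actually passes the mode to the indexer is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the indexer mode from the requested steps.
# Returns undef when indexing was not requested at all.
sub indexer_mode {
    my ($steps) = @_;    # for example 'riu', 'iu' or 'i'
    return unless $steps =~ /i/;

    # Retrieval in the same run: only new files need indexing.
    # No retrieval: rebuild the index from the data folder.
    return $steps =~ /r/ ? 'incremental' : 'clean';
}

# Show the mode chosen for a few step combinations.
for my $steps (qw(riu iu ru)) {
    my $mode = indexer_mode($steps);
    printf "%-3s -> %s\n", $steps, defined $mode ? $mode : 'no indexing';
}
```

Under this reading, the manual workflow in the last note (extract files by hand, then run with "iu") rebuilds the index in 'clean' mode.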
Right now, the update browse page command for the imh collection is different from that for other TEI files. After it is reengineered into an XSLT-based approach like the others, the if ($collection eq 'imh') statement can be removed.