  • Source code
    • Location: galaxy/tools/aws/transcribe.xml, which calls the transcribe shell script.
    • The transcribe script uses the AWS CLI to accomplish everything.
  • To invoke the script from the command line, use:
    • transcribe $job_directory $input_file $output_file $audio_format $s3_bucket $s3_directory
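As an illustration of the argument order above, here is a minimal sketch that assembles a sample command line and prints it without running it. The input and output paths are hypothetical, and the lowercase audio format value wav is an assumption about what the tool accepts; the job directory, bucket, and S3 directory are the values mentioned elsewhere on this page.

```shell
# All values are illustrative; substitute real paths for your system.
job_directory=/srv/amp/galaxy_logs/aws/transcribe   # directory named in the Note on this page
input_file=/tmp/interview.wav                       # hypothetical input path
output_file=/tmp/interview-transcript.json          # hypothetical output path
audio_format=wav                                    # assumed spelling of the format value
s3_bucket=amppd-dev-test                            # dev bucket mentioned on this page
s3_directory=AMP-209                                # test directory mentioned on this page

# Print the command instead of running it, so the sketch has no side effects.
cmd="transcribe $job_directory $input_file $output_file $audio_format $s3_bucket $s3_directory"
echo "$cmd"
```

Dropping the echo and running the command directly would start a real job, so the job directory has to exist first.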
  • The tool can be invoked from the Galaxy UI like any other tool.
    • Users must first ingest the input file into Galaxy with the Get Data / Upload from computer tool.
    • When ingesting, choose binary (the default) as the file format. The file is then copied into a designated location in the Galaxy file system.
    • The script sends the input file to the specified AWS S3 bucket.
  • Each tool parameter has a detailed label and help text explaining its purpose and valid values.
    • $input_file: the audio file to transcribe.
    • $output_file: the resulting JSON file containing the transcript text generated by the AWS service.
    • $audio_format: Format of the audio file. For best results use a lossless format such as FLAC or WAV with PCM 16-bit encoding.
    • $s3_bucket: S3 bucket used for temporary storage of transcribe input/output files. For dev testing, we can use amppd-dev-test.
    • $s3_directory: directory inside the S3 bucket for input files; if left blank, the input files are placed directly under $s3_bucket. For example, amppd-dev-test contains a directory AMP-209 created for testing AMP-209.
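The interaction between $s3_bucket and $s3_directory can be sketched as a small helper that builds the upload target. This is not taken from the transcribe script: the function name s3_input_uri is hypothetical, and using the input file's base name as the object name is an assumption.

```shell
# Build the S3 URI for an uploaded input file.
# An empty directory places the file directly under the bucket.
s3_input_uri() {
    bucket=$1
    directory=$2
    file_name=$(basename "$3")   # assumption: the object keeps the local base name
    if [ -n "$directory" ]; then
        printf 's3://%s/%s/%s\n' "$bucket" "$directory" "$file_name"
    else
        printf 's3://%s/%s\n' "$bucket" "$file_name"
    fi
}

s3_input_uri amppd-dev-test AMP-209 /tmp/interview.wav   # prints s3://amppd-dev-test/AMP-209/interview.wav
s3_input_uri amppd-dev-test "" /tmp/interview.wav        # prints s3://amppd-dev-test/interview.wav

# The upload itself could then be one AWS CLI call (not run here):
#   aws s3 cp "$input_file" "$(s3_input_uri "$s3_bucket" "$s3_directory" "$input_file")"
```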
  • AWS Job Name
    • The transcribe script uses a sequence number (seqNo) to distinguish runs, because the AWS Transcribe service requires a unique job name for each job submission.
    • The job name therefore follows the pattern AwsTranscribe-n, where n is the seqNo.
  • Job-related files: the job name is also used to name all files generated for each job run, namely:
    • AwsTranscribe-n-request.json
    • AwsTranscribe-n-response.json
    • AwsTranscribe-n.log
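The naming scheme above can be sketched as follows. The helper names are hypothetical, and the way the next seqNo is chosen here (counting existing log files in the job directory) is only an assumption for illustration; this page does not say how the script picks it, and simple counting is not safe if two jobs start at the same moment.

```shell
# Derive the job name from a sequence number.
job_name() {
    printf 'AwsTranscribe-%s\n' "$1"
}

# List the request, response, and log file paths for one job run.
job_files() {
    job_dir=$1
    name=$(job_name "$2")
    printf '%s\n' \
        "$job_dir/$name-request.json" \
        "$job_dir/$name-response.json" \
        "$job_dir/$name.log"
}

# Hypothetical way to pick the next sequence number: existing logs plus one.
next_seq_no() {
    count=$(find "$1" -maxdepth 1 -name 'AwsTranscribe-*.log' | wc -l | tr -d ' ')
    echo $((count + 1))
}

job_name 1                 # prints AwsTranscribe-1
job_files /tmp/jobs 1      # prints the three file paths for run 1
```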
  • The output file
    • The output file is created by the AWS service and named $jobName.json (for example, AwsTranscribe-1.json). It is put into the same S3 bucket as the input file.
    • When the AWS job finishes, the script downloads this JSON file to the specified $output_file in the designated Galaxy file folder, where it is viewable through the Galaxy UI.
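The wait-then-download step described above might look roughly like the sketch below. It is not the actual transcribe script: the function names and the 30-second polling interval are assumptions, and the transcript is assumed to sit at the top level of the bucket. The AWS CLI calls used are aws transcribe get-transcription-job and aws s3 cp.

```shell
# Poll until the job reaches a terminal state; prints the final status.
wait_for_job() {
    job=$1
    interval=${2:-30}   # seconds between polls (assumed value)
    while :; do
        status=$(aws transcribe get-transcription-job \
            --transcription-job-name "$job" \
            --query 'TranscriptionJob.TranscriptionJobStatus' \
            --output text) || return 1
        case $status in
            COMPLETED) echo "$status"; return 0 ;;
            FAILED)    echo "$status"; return 1 ;;
            *)         sleep "$interval" ;;   # QUEUED or IN_PROGRESS: keep waiting
        esac
    done
}

# Copy the transcript that AWS wrote to the bucket into the Galaxy output path.
download_transcript() {
    bucket=$1
    job=$2
    destination=$3
    aws s3 cp "s3://$bucket/$job.json" "$destination"
}

# Usage (not run here):
#   wait_for_job AwsTranscribe-1 && \
#       download_transcript amppd-dev-test AwsTranscribe-1 "$output_file"
```

Returning a non-zero status on FAILED lets the caller skip the download and report the error back to Galaxy.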
  • Note:
    • $job_directory is a parameter passed to the transcribe.sh script, but it is not a Galaxy tool parameter, because it is a system directory and should not be user-defined. This is where temporary job-related files for transcribe are stored. It should generally be a folder outside the Galaxy root folder, but on the local file system where the script runs. Currently the value of $job_directory is hardcoded in transcribe.xml as $__root_dir__/../galaxy_logs/aws/transcribe, which resolves to /srv/amp/galaxy_logs/aws/transcribe on the potassium test server. This directory must already exist when the tool is run.
    • In the future, it might be better to define $job_directory as an environment variable or in a configuration file shared with other tools.
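One possible shape for the environment-variable approach is sketched below. The variable name TRANSCRIBE_JOB_DIR and the function name are hypothetical; the fallback argument stands in for the currently hardcoded value. The check reflects the requirement that the directory already exists.

```shell
# Resolve the job directory: environment variable first, then the given default.
# TRANSCRIBE_JOB_DIR is a hypothetical variable name.
resolve_job_directory() {
    dir=${TRANSCRIBE_JOB_DIR:-$1}
    if [ ! -d "$dir" ]; then
        echo "job directory does not exist: $dir" >&2
        return 1
    fi
    echo "$dir"
}

# Usage (not run here):
#   job_directory=$(resolve_job_directory /srv/amp/galaxy_logs/aws/transcribe) || exit 1
```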