- Source code
- Location: galaxy/tools/aws/transcribe.xml, which calls the transcribe shell script.
- The transcribe script uses the AWS CLI to accomplish everything.
- To invoke the script from command line, use:
- transcribe $job_directory $input_file $output_file $audio_format $s3_bucket $s3_directory
- The tool can be invoked from the Galaxy UI like any other tool.
- Before running the tool, the user must ingest the input file into Galaxy with the Get Data / Upload from computer tool.
- When ingesting, choose binary (the default) as the file format. The file is then copied into a designated location in the Galaxy file system.
- The script sends the input file to the specified AWS S3 bucket.
- Each tool parameter has a detailed label and help text explaining what it is for and which values are valid.
- $input_file: the audio file to transcribe.
- $output_file: the resulting JSON file containing the transcript text generated by the AWS service.
- $audio_format: Format of the audio file. For best results use a lossless format such as FLAC or WAV with PCM 16-bit encoding.
- $s3_bucket: S3 bucket used for temporary storage of transcribe input/output files. For dev testing, we can use amppd-dev-test.
- $s3_directory: S3 directory inside the S3 bucket for input files; if left blank, the input files are placed directly under $s3_bucket. For example, under amppd-dev-test there is a directory AMP-209 created for testing AMP-209.
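The $s3_bucket and $s3_directory rules above can be sketched as a small helper that builds the upload destination. This is an illustration, not the actual transcribe script: the function name s3_input_uri is hypothetical, and using the input file's base name as the S3 object name is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the real transcribe script):
# build the S3 URI that an input file would be uploaded to.
s3_input_uri() {
    local bucket="$1" directory="$2" input_file="$3"
    local name
    # Assumption: the object is named after the input file's base name.
    name="$(basename "$input_file")"
    if [ -z "$directory" ]; then
        # Blank directory: the file goes directly under the bucket.
        printf 's3://%s/%s\n' "$bucket" "$name"
    else
        printf 's3://%s/%s/%s\n' "$bucket" "$directory" "$name"
    fi
}

# The upload itself could then be a single AWS CLI call, for example:
#   aws s3 cp "$input_file" "$(s3_input_uri "$s3_bucket" "$s3_directory" "$input_file")"
```

With the dev values mentioned above, s3_input_uri amppd-dev-test AMP-209 /tmp/sample.wav yields s3://amppd-dev-test/AMP-209/sample.wav, and passing an empty directory yields s3://amppd-dev-test/sample.wav.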
- AWS Job Name
- The transcribe script uses a sequence number (seqNo) to distinguish multiple runs of the AWS transcribe job, because the AWS Transcribe service requires a unique job name for each job submission.
- The job name therefore follows the pattern AwsTranscribe-n, where n is the seqNo.
- Job-related files: the job name is also used for all generated files related to each job run, namely:
- AwsTranscribe-n-request.json
- AwsTranscribe-n-response.json
- AwsTranscribe-n.log
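The naming scheme above can be sketched as follows. How the real script picks the next seqNo is not described here, so next_seq_no below is an assumption: it scans a job directory for existing AwsTranscribe-n.log files and returns the highest n plus one. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the AwsTranscribe-n naming scheme; function names are hypothetical.

# Build the job name for a given sequence number.
job_name() {
    printf 'AwsTranscribe-%s\n' "$1"
}

# List the job-related files generated for a given sequence number.
job_files() {
    local name
    name="$(job_name "$1")"
    printf '%s\n' "$name-request.json" "$name-response.json" "$name.log"
}

# Assumption: derive the next seqNo from the log files already present
# in the job directory (highest existing n, plus one).
next_seq_no() {
    local job_directory="$1" max=0 f n
    for f in "$job_directory"/AwsTranscribe-*.log; do
        [ -e "$f" ] || continue            # glob matched nothing
        n="${f##*/AwsTranscribe-}"         # strip directory and prefix
        n="${n%.log}"                      # strip suffix
        case "$n" in
            ''|*[!0-9]*) continue ;;       # skip non-numeric suffixes
        esac
        if [ "$n" -gt "$max" ]; then
            max="$n"
        fi
    done
    echo $((max + 1))
}
```

A directory scan like this is not safe against two jobs starting at the same moment; a real implementation would need a lock or an atomic counter to guarantee unique names.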
- The output file
- The output file created by the AWS service follows the name pattern $jobName.json (for example, AwsTranscribe-1.json) and is placed in the same S3 bucket as the input file.
- When the AWS job finishes, the script downloads this JSON file to the specified $output_file in the designated Galaxy file folder, where it can be viewed through the Galaxy UI.
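The wait-then-download step can be sketched with two AWS CLI calls, aws transcribe get-transcription-job and aws s3 cp. This is a minimal sketch, not the actual script: the function name wait_and_download and the POLL_SECONDS setting are hypothetical, and it assumes the result object sits at the top level of the bucket as $jobName.json.

```shell
#!/usr/bin/env bash
# Sketch only: poll an AWS Transcribe job, then fetch its result file.
# POLL_SECONDS and wait_and_download are hypothetical names.
POLL_SECONDS="${POLL_SECONDS:-30}"

wait_and_download() {
    local job_name="$1" s3_bucket="$2" output_file="$3" status
    while true; do
        # Ask the service for the current job status as plain text.
        status="$(aws transcribe get-transcription-job \
            --transcription-job-name "$job_name" \
            --query 'TranscriptionJob.TranscriptionJobStatus' \
            --output text)" || return 1
        case "$status" in
            COMPLETED) break ;;
            FAILED)
                echo "Transcription job $job_name failed" >&2
                return 1
                ;;
            *) sleep "$POLL_SECONDS" ;;    # QUEUED or IN_PROGRESS
        esac
    done
    # Assumption: the result is stored as <job name>.json directly under the bucket.
    aws s3 cp "s3://$s3_bucket/$job_name.json" "$output_file"
}

# Example call (requires configured AWS credentials):
#   wait_and_download AwsTranscribe-1 amppd-dev-test "$output_file"
```

Polling at a fixed interval keeps the sketch short; a production script would probably also cap the total wait time.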
- Note:
- $job_directory is passed to the transcribe.sh script but is not a Galaxy tool parameter: it is a system directory and must not be user-defined. It holds the temporary job-related files for transcribe. It should generally be a folder outside the Galaxy root folder, but on the local file system where the script runs. The value of $job_directory is currently hardcoded in transcribe.xml as $__root_dir__/../galaxy_logs/aws/transcribe, which resolves to /srv/amp/galaxy_logs/aws/transcribe on the potassium test server. This directory must already exist when the tool is run.
- In the future, it may be better to define $job_directory as an environment variable or in a configuration file shared with other tools.
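One way the environment-variable idea could look is sketched below. The variable name AMP_TRANSCRIBE_JOB_DIR and both function names are hypothetical. The fallback mirrors the path currently hardcoded in transcribe.xml, and the second function enforces the rule that the directory must already exist.

```shell
#!/usr/bin/env bash
# Sketch only: AMP_TRANSCRIBE_JOB_DIR is a hypothetical variable name.

# Resolve the job directory: use the environment variable if it is set and
# non-empty, otherwise fall back to the currently hardcoded relative path.
resolve_job_directory() {
    local galaxy_root="$1"
    printf '%s\n' "${AMP_TRANSCRIBE_JOB_DIR:-$galaxy_root/../galaxy_logs/aws/transcribe}"
}

# Fail early if the directory does not exist, since the tool expects it
# to be present before it runs.
require_job_directory() {
    local job_directory="$1"
    if [ ! -d "$job_directory" ]; then
        echo "Job directory does not exist: $job_directory" >&2
        return 1
    fi
}

# Example call, with the Galaxy root passed as the first script argument:
#   job_directory="$(resolve_job_directory "$1")"
#   require_job_directory "$job_directory" || exit 1
```

Keeping the fallback means existing deployments would behave as before unless the variable is explicitly set.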