- About
- The Vocabulary Tagging tool is a Python adapter that takes a transcript and a text file listing words or phrases, then searches the transcript for occurrences of those words or phrases.
- Source Code
- galaxy/tools/amp_stt/vocabulary_tagging.xml
Tool configuration detailing tool execution, input file, output file, and labeling.
- galaxy/tools/amp_stt/vocabulary_tagging.py
Python script to handle the input parameters and call the code to search for the list of words.
- Running the tool
- The tool can be invoked from the Galaxy UI like any other tool. The user must supply a transcript in the standardized AMP speech-to-text output format, as well as a text file containing the list of words to search for.
- Parameters
- $amp_transcript: Transcript in AMP Speech to Text format
- $words_to_tag: Text file containing a list of words to search for. Each line contains a single phrase or word
- Output
- $tagged_words: Output CSV file containing the word text matched and the start time of each occurrence.
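To make the shape of the inputs and output concrete, here is a minimal sketch of reading a one-phrase-per-line word list and writing matches as CSV. It is not the tool's actual code: the function names `read_word_list` and `write_tagged_words`, the column headers `word` and `start`, and the two-decimal seconds format for start times are all assumptions made for illustration.

```python
import csv
import io


def read_word_list(handle):
    """Return the non-empty lines of the word list, stripped of surrounding whitespace."""
    return [line.strip() for line in handle if line.strip()]


def write_tagged_words(handle, matches):
    """Write (word, start) pairs as CSV rows under a hypothetical header."""
    writer = csv.writer(handle, lineterminator="\n")
    # Column names are assumed for this sketch, not taken from the tool.
    writer.writerow(["word", "start"])
    for word, start in matches:
        writer.writerow([word, f"{start:.2f}"])


# In-memory stand-ins for $words_to_tag and $tagged_words.
words_file = io.StringIO("potato sack\n\nharvest\n")
out = io.StringIO()

phrases = read_word_list(words_file)
write_tagged_words(out, [("potato sack", 12.5), ("harvest", 40.0)])
print(phrases)
print(out.getvalue())
```

In-memory `io.StringIO` objects stand in for the Galaxy datasets here; real files opened with `open(path, newline="")` would work the same way.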
- Notes
- This tool will remove periods, commas, and exclamation points from tagged words.
- Punctuation (periods, commas, exclamation points) acts as a boundary in the transcript text. For instance, "potato sack" will not match "potato. Sack".
- The list of words to tag is deduplicated.
- Matching is case-insensitive, and excess whitespace before, after, and between words is trimmed.
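The notes above can be illustrated with a short sketch of the matching rules. This is an approximation written for this page, not the code in vocabulary_tagging.py: the helper names are hypothetical, the transcript is assumed to be already reduced to a list of `(text, start)` pairs, and reporting a shorter phrase that occurs inside a longer match is a choice made here, not confirmed behavior of the tool.

```python
# Punctuation that the notes say is stripped from tagged words and treated as a boundary.
BOUNDARY_CHARS = ".,!"


def normalize_phrase(phrase):
    """Lowercase a phrase, drop boundary punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in phrase if ch not in BOUNDARY_CHARS)
    return " ".join(cleaned.lower().split())


def dedupe_phrases(lines):
    """Normalize each line and keep the first occurrence of each phrase."""
    seen = {}
    for line in lines:
        phrase = normalize_phrase(line)
        if phrase:
            seen.setdefault(phrase, None)
    return list(seen)


def split_segments(words):
    """Group (text, start) tokens into runs; boundary punctuation closes a run."""
    segments, current = [], []
    for text, start in words:
        token = normalize_phrase(text)
        if token:
            current.append((token, start))
        if text.rstrip().endswith(tuple(BOUNDARY_CHARS)) and current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def tag_vocabulary(words, phrases):
    """Return (phrase, start) for every occurrence of a phrase within one segment."""
    targets = [p.split() for p in dedupe_phrases(phrases)]
    hits = []
    for segment in split_segments(words):
        tokens = [token for token, _ in segment]
        for target in targets:
            n = len(target)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == target:
                    hits.append((" ".join(target), segment[i][1]))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical transcript tokens: (text, start time in seconds).
transcript = [
    ("Potato", 0.0), ("sack", 0.5), ("race.", 1.0),
    ("One", 2.0), ("potato.", 2.4), ("Sack", 3.0), ("it!", 3.3),
]
result = tag_vocabulary(transcript, ["potato  sack", "Potato Sack", "sack"])
print(result)
```

With this sample input, the two spellings of "potato sack" collapse into one phrase, the phrase matches at 0.0, and "potato. Sack" at 2.4 does not match because the period ends the segment.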