  • About
    • The Vocabulary Tagging tool is a Python adapter that takes a transcript and a text file containing a list of words or phrases, and searches the transcript for occurrences of those words or phrases.
  • Source Code
    • galaxy/tools/amp_stt/vocabulary_tagging.xml
      Tool configuration specifying tool execution, input files, output file, and labeling.
    • galaxy/tools/amp_stt/
      Python script that handles the input parameters and calls the code that searches the transcript for the list of words.
  • Running the tool

    • The tool can be invoked from the Galaxy UI like any other tool. The user must supply a transcript in the standardized AMP speech-to-text output format, as well as a text file containing the list of words to search for.

  • Parameters
    • $amp_transcript: Transcript in AMP Speech to Text format
    • $words_to_tag: Text file containing the list of words to search for. Each line contains a single word or phrase.
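    A minimal sketch of how the $words_to_tag file might be normalized before searching. It assumes (this is not confirmed by the tool's source) that the rules listed under Notes are applied to each line: periods, commas, and exclamation points are removed, text is lowercased, whitespace is trimmed and collapsed, blank lines are skipped, and duplicates are dropped. The helper name `load_vocabulary` is hypothetical.

```python
import re

# Assumption: these are the characters the Notes say are removed from tagged words.
PUNCTUATION = re.compile(r"[.,!]")


def load_vocabulary(lines):
    """Normalize a word list given one word or phrase per line (hypothetical helper).

    Each line has periods, commas, and exclamation points removed, is lowercased,
    and has its whitespace trimmed and collapsed. Blank lines are skipped and
    duplicates are removed, keeping first-seen order.
    """
    phrases = {}
    for line in lines:
        phrase = " ".join(PUNCTUATION.sub("", line).lower().split())
        if phrase:
            phrases.setdefault(phrase, None)  # dict keys keep insertion order
    return list(phrases)


if __name__ == "__main__":
    sample = ["Potato  Sack\n", "potato sack", "  Hello, world! ", "", "hello world"]
    print(load_vocabulary(sample))  # ['potato sack', 'hello world']
```

    Because iterating a text file yields its lines, a file on disk can be passed directly, e.g. `with open(path, encoding="utf-8") as f: vocab = load_vocabulary(f)`.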
  • Output
    • $tagged_words: Output CSV file containing the matched word text and the start time of each occurrence.
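    The CSV output could be produced along these lines, using Python's standard csv module. This is a sketch only: the function name `tagged_words_to_csv` and the header names `text` and `start` are hypothetical, and the actual tool's column names and order may differ.

```python
import csv
import io


def tagged_words_to_csv(matches):
    """Render (matched_text, start_seconds) pairs as CSV text.

    Hypothetical helper: the "text" and "start" headers are illustrative
    assumptions, not necessarily the tool's actual column names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "start"])
    writer.writerows(matches)  # one row per occurrence
    return buffer.getvalue()


if __name__ == "__main__":
    print(tagged_words_to_csv([("potato sack", 1.0), ("empty", 3.6)]), end="")
```

    To write the result to disk, open the file with `newline=""` as the csv documentation recommends, e.g. `with open("tagged_words.csv", "w", encoding="utf-8", newline="") as f: f.write(tagged_words_to_csv(matches))`.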
  • Notes
    • This tool removes periods, commas, and exclamation points from tagged words.
    • Punctuation (periods, commas, exclamation points) acts as a boundary in the transcript text. For instance, "potato sack" will not match "potato. Sack".
    • The list of words to tag is deduplicated.
    • Matching is case-insensitive, and excess whitespace before, after, and between words is trimmed.
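    The matching rules above can be sketched as follows. For illustration, the transcript is assumed to be already reduced to a list of (text, start) pairs, one per transcribed word; this is a hypothetical simplification, not the AMP speech-to-text schema. Any token containing a period, comma, or exclamation point closes the current run of words, so a phrase only matches within a single run. The names `normalize_phrase`, `split_segments`, and `tag_words` are hypothetical.

```python
import re

# Per the Notes: periods, commas, and exclamation points are stripped from
# words and treated as boundaries in the transcript.
PUNCTUATION = re.compile(r"[.,!]")


def normalize_phrase(phrase):
    """Lowercase, drop . , ! and collapse whitespace inside a phrase."""
    return " ".join(PUNCTUATION.sub("", phrase).lower().split())


def split_segments(words):
    """Group (text, start) tokens into runs that punctuation does not cross."""
    segments, current = [], []
    for text, start in words:
        token = normalize_phrase(text)
        if token:
            current.append((token, start))
        if PUNCTUATION.search(text):  # punctuation closes the current run
            if current:
                segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def tag_words(words, phrases):
    """Return (phrase, start) for every occurrence of each phrase in the transcript."""
    # Normalize and deduplicate the vocabulary, keeping first-seen order.
    vocab = list(dict.fromkeys(p for p in map(normalize_phrase, phrases) if p))
    matches = []
    for segment in split_segments(words):
        tokens = [token for token, _ in segment]
        for phrase in vocab:
            target = phrase.split()
            for i in range(len(tokens) - len(target) + 1):
                if tokens[i:i + len(target)] == target:
                    # A match starts at the start time of its first word.
                    matches.append((phrase, segment[i][1]))
    return sorted(matches, key=lambda match: match[1])


if __name__ == "__main__":
    words = [("I", 0.0), ("bought", 0.4), ("a", 0.8), ("potato", 1.0),
             ("sack", 1.5), ("and", 2.0), ("a", 2.2), ("potato.", 2.4),
             ("Sack", 3.0), ("was", 3.4), ("empty!", 3.6)]
    print(tag_words(words, ["Potato Sack", "potato  sack", "empty"]))
    # [('potato sack', 1.0), ('empty', 3.6)]
```

    The second "potato. Sack" is not tagged because the period splits it across two runs, and the duplicate "potato  sack" entry collapses into "potato sack". Results are sorted by start time; Python's sort is stable, so ties keep the order in which they were found. Under this simplification a token such as "U.S." also ends a run, which the real tool may handle differently.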
