Category description and use cases
Workflow example:
Speech-to-Text > Transcript Editor > Forced Aligner
Output standard
Summary:
JSON Schema
{ }
Sample output
{ }
Recommended tool(s)
Gentle
Official documentation: Gentle on GitHub
Language: REST API, or Python on the command line
Description:
Cost: Free (open source)
Social impact:
Notes:
Installation & requirements
Two options for installation:
- Install the Docker image to run the web server, then use the REST API
- Download the source code and run the bash installation script, then use it as a command-line Python program
Parameters
Input formats
Audio (mp3, wav, possibly other formats) and transcript (plain text).
Example Usage
curl -F "audio=@audio.mp3" -F "transcript=@words.txt" "http://localhost:8765/transcriptions?async=false"
# OR
python3 align.py audio.mp3 words.txt
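The REST call can also be made from Python. The following is a minimal sketch, assuming a Gentle server (for example, from the Docker option) is listening on localhost:8765 and that the third-party `requests` package is installed. The helper names `transcription_url` and `align` are illustrative and not part of Gentle.

```python
def transcription_url(host="localhost", port=8765):
    """Build the endpoint URL used in the curl example (async=false)."""
    return f"http://{host}:{port}/transcriptions?async=false"


def align(audio_path, transcript_path, host="localhost", port=8765):
    """POST an audio file and its transcript to a running Gentle server.

    Returns the parsed JSON response. The form field names "audio" and
    "transcript" mirror the -F options of the curl example.
    """
    import requests  # third-party; imported here so the URL helper works without it

    with open(audio_path, "rb") as audio, open(transcript_path, "rb") as transcript:
        response = requests.post(
            transcription_url(host, port),
            files={"audio": audio, "transcript": transcript},
        )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

With a server running, `align("audio.mp3", "words.txt")` would return a dictionary shaped like the example output.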
Example Output
{ "transcript": "Now, let me looking at the Congress, uh, as one of the institutions in trouble, uh, to some degree, not the same degree as others, perhaps, but still part of the whole mail.", "words": [ { "alignedWord": "now", "case": "success", "end": 38.29, "endOffset": 3, "phones": [ { "duration": 0.12, "phone": "n_B" }, { "duration": 0.01, "phone": "aw_E" } ], "start": 38.16, "startOffset": 0, "word": "Now" }, { "alignedWord": "let", "case": "success", "end": 38.65, "endOffset": 8, "phones": [ { "duration": 0.05, "phone": "l_B" }, { "duration": 0.07, "phone": "eh_I" }, { "duration": 0.07, "phone": "t_E" } ], "start": 38.46, "startOffset": 5, "word": "let" }, { "alignedWord": "me", "case": "success", "end": 38.9, "endOffset": 11, "phones": [ { "duration": 0.08, "phone": "m_B" }, { "duration": 0.17, "phone": "iy_E" } ], "start": 38.65, "startOffset": 9, "word": "me" }, { "alignedWord": "looking", "case": "success", "end": 39.24, "endOffset": 19, "phones": [ { "duration": 0.08, "phone": "l_B" }, { "duration": 0.05, "phone": "uh_I" }, { "duration": 0.06, "phone": "k_I" }, { "duration": 0.07, "phone": "ih_I" }, { "duration": 0.06, "phone": "ng_E" } ], "start": 38.92, "startOffset": 12, "word": "looking" }, { "alignedWord": "at", "case": "success", "end": 39.370000000000005, "endOffset": 22, "phones": [ { "duration": 0.06, "phone": "ae_B" }, { "duration": 0.07, "phone": "t_E" } ], "start": 39.24, "startOffset": 20, "word": "at" }, { "alignedWord": "the", "case": "success", "end": 39.589999999999996, "endOffset": 26, "phones": [ { "duration": 0.07, "phone": "dh_B" }, { "duration": 0.15, "phone": "ah_E" } ], "start": 39.37, "startOffset": 23, "word": "the" }, { "alignedWord": "congress", "case": "success", "end": 40.2, "endOffset": 35, "phones": [ { "duration": 0.11, "phone": "k_B" }, { "duration": 0.11, "phone": "aa_I" }, { "duration": 0.05, "phone": "ng_I" }, { "duration": 0.04, "phone": "g_I" }, { "duration": 0.05, "phone": "r_I" }, { "duration": 0.08, "phone": 
"ah_I" }, { "duration": 0.16, "phone": "s_E" } ], "start": 39.6, "startOffset": 27, "word": "Congress" }, { "alignedWord": "uh", "case": "success", "end": 40.43, "endOffset": 39, "phones": [ { "duration": 0.23, "phone": "ah_S" } ], "start": 40.2, "startOffset": 37, "word": "uh" }, { "alignedWord": "as", "case": "success", "end": 41.129999999999995, "endOffset": 43, "phones": [ { "duration": 0.18, "phone": "ae_B" }, { "duration": 0.08, "phone": "z_E" } ], "start": 40.87, "startOffset": 41, "word": "as" }, { "alignedWord": "one", "case": "success", "end": 41.34, "endOffset": 47, "phones": [ { "duration": 0.09, "phone": "w_B" }, { "duration": 0.08, "phone": "ah_I" }, { "duration": 0.04, "phone": "n_E" } ], "start": 41.13, "startOffset": 44, "word": "one" }, { "alignedWord": "of", "case": "success", "end": 41.440000000000005, "endOffset": 50, "phones": [ { "duration": 0.05, "phone": "ah_B" }, { "duration": 0.05, "phone": "v_E" } ], "start": 41.34, "startOffset": 48, "word": "of" }, { "alignedWord": "the", "case": "success", "end": 41.57, "endOffset": 54, "phones": [ { "duration": 0.07, "phone": "dh_B" }, { "duration": 0.05, "phone": "iy_E" } ], "start": 41.45, "startOffset": 51, "word": "the" }, { "alignedWord": "institutions", "case": "success", "end": 42.3, "endOffset": 67, "phones": [ { "duration": 0.05, "phone": "ih_B" }, { "duration": 0.05, "phone": "n_I" }, { "duration": 0.08, "phone": "s_I" }, { "duration": 0.05, "phone": "t_I" }, { "duration": 0.07, "phone": "ih_I" }, { "duration": 0.08, "phone": "t_I" }, { "duration": 0.09, "phone": "uw_I" }, { "duration": 0.06, "phone": "sh_I" }, { "duration": 0.07, "phone": "ah_I" }, { "duration": 0.07, "phone": "n_I" }, { "duration": 0.06, "phone": "z_E" } ], "start": 41.57, "startOffset": 55, "word": "institutions" }, { "alignedWord": "in", "case": "success", "end": 42.44, "endOffset": 70, "phones": [ { "duration": 0.08, "phone": "ih_B" }, { "duration": 0.06, "phone": "n_E" } ], "start": 42.3, "startOffset": 68, "word": 
"in" }, { "alignedWord": "trouble", "case": "success", "end": 42.96, "endOffset": 78, "phones": [ { "duration": 0.09, "phone": "t_B" }, { "duration": 0.05, "phone": "r_I" }, { "duration": 0.05, "phone": "ah_I" }, { "duration": 0.05, "phone": "b_I" }, { "duration": 0.06, "phone": "ah_I" }, { "duration": 0.22, "phone": "l_E" } ], "start": 42.44, "startOffset": 71, "word": "trouble" }, { "alignedWord": "uh", "case": "success", "end": 43.300000000000004, "endOffset": 82, "phones": [ { "duration": 0.32, "phone": "ah_S" } ], "start": 42.980000000000004, "startOffset": 80, "word": "uh" }, { "alignedWord": "to", "case": "success", "end": 43.46, "endOffset": 86, "phones": [ { "duration": 0.06, "phone": "t_B" }, { "duration": 0.06, "phone": "ih_E" } ], "start": 43.34, "startOffset": 84, "word": "to" }, { "alignedWord": "some", "case": "success", "end": 43.660000000000004, "endOffset": 91, "phones": [ { "duration": 0.06, "phone": "s_B" }, { "duration": 0.06, "phone": "ah_I" }, { "duration": 0.08, "phone": "m_E" } ], "start": 43.46, "startOffset": 87, "word": "some" }, { "alignedWord": "degree", "case": "success", "end": 44.12, "endOffset": 98, "phones": [ { "duration": 0.03, "phone": "d_B" }, { "duration": 0.06, "phone": "ih_I" }, { "duration": 0.06, "phone": "g_I" }, { "duration": 0.1, "phone": "r_I" }, { "duration": 0.21, "phone": "iy_E" } ], "start": 43.66, "startOffset": 92, "word": "degree" }, { "alignedWord": "not", "case": "success", "end": 44.369999, "endOffset": 103, "phones": [ { "duration": 0.1, "phone": "n_B" }, { "duration": 0.06, "phone": "aa_I" }, { "duration": 0.07, "phone": "t_E" } ], "start": 44.139999, "startOffset": 100, "word": "not" }, { "alignedWord": "the", "case": "success", "end": 44.47, "endOffset": 107, "phones": [ { "duration": 0.06, "phone": "dh_B" }, { "duration": 0.04, "phone": "ah_E" } ], "start": 44.37, "startOffset": 104, "word": "the" }, { "alignedWord": "same", "case": "success", "end": 44.699999999999996, "endOffset": 112, "phones": [ { 
"duration": 0.1, "phone": "s_B" }, { "duration": 0.07, "phone": "ey_I" }, { "duration": 0.06, "phone": "m_E" } ], "start": 44.47, "startOffset": 108, "word": "same" }, { "alignedWord": "degree", "case": "success", "end": 44.96, "endOffset": 119, "phones": [ { "duration": 0.03, "phone": "d_B" }, { "duration": 0.06, "phone": "ih_I" }, { "duration": 0.06, "phone": "g_I" }, { "duration": 0.07, "phone": "r_I" }, { "duration": 0.04, "phone": "iy_E" } ], "start": 44.7, "startOffset": 113, "word": "degree" }, { "alignedWord": "as", "case": "success", "end": 45.08, "endOffset": 122, "phones": [ { "duration": 0.08, "phone": "eh_B" }, { "duration": 0.04, "phone": "z_E" } ], "start": 44.96, "startOffset": 120, "word": "as" }, { "alignedWord": "others", "case": "success", "end": 45.37, "endOffset": 129, "phones": [ { "duration": 0.1, "phone": "ah_B" }, { "duration": 0.07, "phone": "dh_I" }, { "duration": 0.06, "phone": "er_I" }, { "duration": 0.06, "phone": "z_E" } ], "start": 45.08, "startOffset": 123, "word": "others" }, { "alignedWord": "perhaps", "case": "success", "end": 45.8, "endOffset": 138, "phones": [ { "duration": 0.05, "phone": "p_B" }, { "duration": 0.07, "phone": "er_I" }, { "duration": 0.04, "phone": "hh_I" }, { "duration": 0.11, "phone": "ae_I" }, { "duration": 0.09, "phone": "p_I" }, { "duration": 0.07, "phone": "s_E" } ], "start": 45.37, "startOffset": 131, "word": "perhaps" }, { "alignedWord": "but", "case": "success", "end": 45.959999999999994, "endOffset": 143, "phones": [ { "duration": 0.06, "phone": "b_B" }, { "duration": 0.03, "phone": "ah_I" }, { "duration": 0.07, "phone": "t_E" } ], "start": 45.8, "startOffset": 140, "word": "but" }, { "alignedWord": "still", "case": "success", "end": 46.230000000000004, "endOffset": 149, "phones": [ { "duration": 0.08, "phone": "s_B" }, { "duration": 0.04, "phone": "t_I" }, { "duration": 0.08, "phone": "ih_I" }, { "duration": 0.07, "phone": "l_E" } ], "start": 45.96, "startOffset": 144, "word": "still" }, { 
"alignedWord": "part", "case": "success", "end": 46.440000000000005, "endOffset": 154, "phones": [ { "duration": 0.08, "phone": "p_B" }, { "duration": 0.05, "phone": "aa_I" }, { "duration": 0.07, "phone": "r_I" }, { "duration": 0.01, "phone": "t_E" } ], "start": 46.230000000000004, "startOffset": 150, "word": "part" }, { "alignedWord": "of", "case": "success", "end": 46.589999999999996, "endOffset": 157, "phones": [ { "duration": 0.07, "phone": "ah_B" }, { "duration": 0.08, "phone": "v_E" } ], "start": 46.44, "startOffset": 155, "word": "of" }, { "alignedWord": "the", "case": "success", "end": 46.760000000000005, "endOffset": 161, "phones": [ { "duration": 0.01, "phone": "dh_B" }, { "duration": 0.16, "phone": "ah_E" } ], "start": 46.59, "startOffset": 158, "word": "the" }, { "alignedWord": "whole", "case": "success", "end": 47.419999000000004, "endOffset": 167, "phones": [ { "duration": 0.08, "phone": "hh_B" }, { "duration": 0.08, "phone": "ow_I" }, { "duration": 0.12, "phone": "l_E" } ], "start": 47.139999, "startOffset": 162, "word": "whole" }, { "alignedWord": "mail ", "case": "success", "end": 47.7, "endOffset": 172, "phones": [ { "duration": 0.06, "phone": "m_B" }, { "duration": 0.11, "phone": "ey_I" }, { "duration": 0.11, "phone": "l_E" } ], "start": 47.42, "startOffset": 168, "word": "mail" } ] } |
Other evaluated tools
Tool Name
Official documentation: <link>
Language:
Description:
Cost: <$ OR Free (open source)>
Social impact:
Notes:
Installation & requirements
Parameters
Input formats
Example Usage
Example Output