Category description and use cases

Workflow example:

Speech-to-Text > Transcript Editor > Forced Aligner

Output standard

Summary: 

JSON Schema

{
 
}

Sample output

{

}


Recommended tool(s)

Gentle

Official documentation: Gentle on GitHub

Language: REST API, or Python on the command line

Description: 

Cost:  Free (open source)

Social impact: 

Notes: 

Installation & requirements

Two options for installation:

  1. Install the Docker image to run the web server, then use the REST API
  2. Download the source code and run the bash installation script, then use it as a command-line Python program

Parameters

Input formats

Audio (mp3, wav, possibly other formats) and transcript (plain text).

Example Usage

curl -F "audio=@audio.mp3" -F "transcript=@words.txt" "http://localhost:8765/transcriptions?async=false"
# OR
python3 align.py audio.mp3 words.txt
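
The REST call above could also be made from Python. The following is a minimal sketch using only the standard library. It assumes the Gentle web server is listening on localhost:8765 as in the curl example; the helper names build_multipart and align are hypothetical and not part of Gentle.

```python
import json
import os
import urllib.request
import uuid

# URL taken from the curl example; adjust host and port to your setup.
GENTLE_URL = "http://localhost:8765/transcriptions?async=false"


def build_multipart(fields):
    """Encode {field_name: (filename, bytes)} as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, (filename, data) in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def align(audio_path, transcript_path, url=GENTLE_URL):
    """POST audio and transcript as file fields and return the parsed JSON."""
    with open(audio_path, "rb") as audio, open(transcript_path, "rb") as text:
        body, content_type = build_multipart({
            "audio": (os.path.basename(audio_path), audio.read()),
            "transcript": (os.path.basename(transcript_path), text.read()),
        })
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, align("audio.mp3", "words.txt") would send the same two form fields as the curl command and return the alignment as a Python dict.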

Example Output

{
"transcript": "Now, let me looking at the Congress, uh, as one of the institutions in trouble, uh, to some degree, not the same degree as others, perhaps, but still part of the whole mail.",
"words": [
{
"alignedWord": "now",
"case": "success",
"end": 38.29,
"endOffset": 3,
"phones": [
{
"duration": 0.12,
"phone": "n_B"
},
{
"duration": 0.01,
"phone": "aw_E"
}
],
"start": 38.16,
"startOffset": 0,
"word": "Now"
},
{
"alignedWord": "let",
"case": "success",
"end": 38.65,
"endOffset": 8,
"phones": [
{
"duration": 0.05,
"phone": "l_B"
},
{
"duration": 0.07,
"phone": "eh_I"
},
{
"duration": 0.07,
"phone": "t_E"
}
],
"start": 38.46,
"startOffset": 5,
"word": "let"
},
{
"alignedWord": "me",
"case": "success",
"end": 38.9,
"endOffset": 11,
"phones": [
{
"duration": 0.08,
"phone": "m_B"
},
{
"duration": 0.17,
"phone": "iy_E"
}
],
"start": 38.65,
"startOffset": 9,
"word": "me"
},
{
"alignedWord": "looking",
"case": "success",
"end": 39.24,
"endOffset": 19,
"phones": [
{
"duration": 0.08,
"phone": "l_B"
},
{
"duration": 0.05,
"phone": "uh_I"
},
{
"duration": 0.06,
"phone": "k_I"
},
{
"duration": 0.07,
"phone": "ih_I"
},
{
"duration": 0.06,
"phone": "ng_E"
}
],
"start": 38.92,
"startOffset": 12,
"word": "looking"
},
{
"alignedWord": "at",
"case": "success",
"end": 39.370000000000005,
"endOffset": 22,
"phones": [
{
"duration": 0.06,
"phone": "ae_B"
},
{
"duration": 0.07,
"phone": "t_E"
}
],
"start": 39.24,
"startOffset": 20,
"word": "at"
},
{
"alignedWord": "the",
"case": "success",
"end": 39.589999999999996,
"endOffset": 26,
"phones": [
{
"duration": 0.07,
"phone": "dh_B"
},
{
"duration": 0.15,
"phone": "ah_E"
}
],
"start": 39.37,
"startOffset": 23,
"word": "the"
},
{
"alignedWord": "congress",
"case": "success",
"end": 40.2,
"endOffset": 35,
"phones": [
{
"duration": 0.11,
"phone": "k_B"
},
{
"duration": 0.11,
"phone": "aa_I"
},
{
"duration": 0.05,
"phone": "ng_I"
},
{
"duration": 0.04,
"phone": "g_I"
},
{
"duration": 0.05,
"phone": "r_I"
},
{
"duration": 0.08,
"phone": "ah_I"
},
{
"duration": 0.16,
"phone": "s_E"
}
],
"start": 39.6,
"startOffset": 27,
"word": "Congress"
},
{
"alignedWord": "uh",
"case": "success",
"end": 40.43,
"endOffset": 39,
"phones": [
{
"duration": 0.23,
"phone": "ah_S"
}
],
"start": 40.2,
"startOffset": 37,
"word": "uh"
},
{
"alignedWord": "as",
"case": "success",
"end": 41.129999999999995,
"endOffset": 43,
"phones": [
{
"duration": 0.18,
"phone": "ae_B"
},
{
"duration": 0.08,
"phone": "z_E"
}
],
"start": 40.87,
"startOffset": 41,
"word": "as"
},
{
"alignedWord": "one",
"case": "success",
"end": 41.34,
"endOffset": 47,
"phones": [
{
"duration": 0.09,
"phone": "w_B"
},
{
"duration": 0.08,
"phone": "ah_I"
},
{
"duration": 0.04,
"phone": "n_E"
}
],
"start": 41.13,
"startOffset": 44,
"word": "one"
},
{
"alignedWord": "of",
"case": "success",
"end": 41.440000000000005,
"endOffset": 50,
"phones": [
{
"duration": 0.05,
"phone": "ah_B"
},
{
"duration": 0.05,
"phone": "v_E"
}
],
"start": 41.34,
"startOffset": 48,
"word": "of"
},
{
"alignedWord": "the",
"case": "success",
"end": 41.57,
"endOffset": 54,
"phones": [
{
"duration": 0.07,
"phone": "dh_B"
},
{
"duration": 0.05,
"phone": "iy_E"
}
],
"start": 41.45,
"startOffset": 51,
"word": "the"
},
{
"alignedWord": "institutions",
"case": "success",
"end": 42.3,
"endOffset": 67,
"phones": [
{
"duration": 0.05,
"phone": "ih_B"
},
{
"duration": 0.05,
"phone": "n_I"
},
{
"duration": 0.08,
"phone": "s_I"
},
{
"duration": 0.05,
"phone": "t_I"
},
{
"duration": 0.07,
"phone": "ih_I"
},
{
"duration": 0.08,
"phone": "t_I"
},
{
"duration": 0.09,
"phone": "uw_I"
},
{
"duration": 0.06,
"phone": "sh_I"
},
{
"duration": 0.07,
"phone": "ah_I"
},
{
"duration": 0.07,
"phone": "n_I"
},
{
"duration": 0.06,
"phone": "z_E"
}
],
"start": 41.57,
"startOffset": 55,
"word": "institutions"
},
{
"alignedWord": "in",
"case": "success",
"end": 42.44,
"endOffset": 70,
"phones": [
{
"duration": 0.08,
"phone": "ih_B"
},
{
"duration": 0.06,
"phone": "n_E"
}
],
"start": 42.3,
"startOffset": 68,
"word": "in"
},
{
"alignedWord": "trouble",
"case": "success",
"end": 42.96,
"endOffset": 78,
"phones": [
{
"duration": 0.09,
"phone": "t_B"
},
{
"duration": 0.05,
"phone": "r_I"
},
{
"duration": 0.05,
"phone": "ah_I"
},
{
"duration": 0.05,
"phone": "b_I"
},
{
"duration": 0.06,
"phone": "ah_I"
},
{
"duration": 0.22,
"phone": "l_E"
}
],
"start": 42.44,
"startOffset": 71,
"word": "trouble"
},
{
"alignedWord": "uh",
"case": "success",
"end": 43.300000000000004,
"endOffset": 82,
"phones": [
{
"duration": 0.32,
"phone": "ah_S"
}
],
"start": 42.980000000000004,
"startOffset": 80,
"word": "uh"
},
{
"alignedWord": "to",
"case": "success",
"end": 43.46,
"endOffset": 86,
"phones": [
{
"duration": 0.06,
"phone": "t_B"
},
{
"duration": 0.06,
"phone": "ih_E"
}
],
"start": 43.34,
"startOffset": 84,
"word": "to"
},
{
"alignedWord": "some",
"case": "success",
"end": 43.660000000000004,
"endOffset": 91,
"phones": [
{
"duration": 0.06,
"phone": "s_B"
},
{
"duration": 0.06,
"phone": "ah_I"
},
{
"duration": 0.08,
"phone": "m_E"
}
],
"start": 43.46,
"startOffset": 87,
"word": "some"
},
{
"alignedWord": "degree",
"case": "success",
"end": 44.12,
"endOffset": 98,
"phones": [
{
"duration": 0.03,
"phone": "d_B"
},
{
"duration": 0.06,
"phone": "ih_I"
},
{
"duration": 0.06,
"phone": "g_I"
},
{
"duration": 0.1,
"phone": "r_I"
},
{
"duration": 0.21,
"phone": "iy_E"
}
],
"start": 43.66,
"startOffset": 92,
"word": "degree"
},
{
"alignedWord": "not",
"case": "success",
"end": 44.369999,
"endOffset": 103,
"phones": [
{
"duration": 0.1,
"phone": "n_B"
},
{
"duration": 0.06,
"phone": "aa_I"
},
{
"duration": 0.07,
"phone": "t_E"
}
],
"start": 44.139999,
"startOffset": 100,
"word": "not"
},
{
"alignedWord": "the",
"case": "success",
"end": 44.47,
"endOffset": 107,
"phones": [
{
"duration": 0.06,
"phone": "dh_B"
},
{
"duration": 0.04,
"phone": "ah_E"
}
],
"start": 44.37,
"startOffset": 104,
"word": "the"
},
{
"alignedWord": "same",
"case": "success",
"end": 44.699999999999996,
"endOffset": 112,
"phones": [
{
"duration": 0.1,
"phone": "s_B"
},
{
"duration": 0.07,
"phone": "ey_I"
},
{
"duration": 0.06,
"phone": "m_E"
}
],
"start": 44.47,
"startOffset": 108,
"word": "same"
},
{
"alignedWord": "degree",
"case": "success",
"end": 44.96,
"endOffset": 119,
"phones": [
{
"duration": 0.03,
"phone": "d_B"
},
{
"duration": 0.06,
"phone": "ih_I"
},
{
"duration": 0.06,
"phone": "g_I"
},
{
"duration": 0.07,
"phone": "r_I"
},
{
"duration": 0.04,
"phone": "iy_E"
}
],
"start": 44.7,
"startOffset": 113,
"word": "degree"
},
{
"alignedWord": "as",
"case": "success",
"end": 45.08,
"endOffset": 122,
"phones": [
{
"duration": 0.08,
"phone": "eh_B"
},
{
"duration": 0.04,
"phone": "z_E"
}
],
"start": 44.96,
"startOffset": 120,
"word": "as"
},
{
"alignedWord": "others",
"case": "success",
"end": 45.37,
"endOffset": 129,
"phones": [
{
"duration": 0.1,
"phone": "ah_B"
},
{
"duration": 0.07,
"phone": "dh_I"
},
{
"duration": 0.06,
"phone": "er_I"
},
{
"duration": 0.06,
"phone": "z_E"
}
],
"start": 45.08,
"startOffset": 123,
"word": "others"
},
{
"alignedWord": "perhaps",
"case": "success",
"end": 45.8,
"endOffset": 138,
"phones": [
{
"duration": 0.05,
"phone": "p_B"
},
{
"duration": 0.07,
"phone": "er_I"
},
{
"duration": 0.04,
"phone": "hh_I"
},
{
"duration": 0.11,
"phone": "ae_I"
},
{
"duration": 0.09,
"phone": "p_I"
},
{
"duration": 0.07,
"phone": "s_E"
}
],
"start": 45.37,
"startOffset": 131,
"word": "perhaps"
},
{
"alignedWord": "but",
"case": "success",
"end": 45.959999999999994,
"endOffset": 143,
"phones": [
{
"duration": 0.06,
"phone": "b_B"
},
{
"duration": 0.03,
"phone": "ah_I"
},
{
"duration": 0.07,
"phone": "t_E"
}
],
"start": 45.8,
"startOffset": 140,
"word": "but"
},
{
"alignedWord": "still",
"case": "success",
"end": 46.230000000000004,
"endOffset": 149,
"phones": [
{
"duration": 0.08,
"phone": "s_B"
},
{
"duration": 0.04,
"phone": "t_I"
},
{
"duration": 0.08,
"phone": "ih_I"
},
{
"duration": 0.07,
"phone": "l_E"
}
],
"start": 45.96,
"startOffset": 144,
"word": "still"
},
{
"alignedWord": "part",
"case": "success",
"end": 46.440000000000005,
"endOffset": 154,
"phones": [
{
"duration": 0.08,
"phone": "p_B"
},
{
"duration": 0.05,
"phone": "aa_I"
},
{
"duration": 0.07,
"phone": "r_I"
},
{
"duration": 0.01,
"phone": "t_E"
}
],
"start": 46.230000000000004,
"startOffset": 150,
"word": "part"
},
{
"alignedWord": "of",
"case": "success",
"end": 46.589999999999996,
"endOffset": 157,
"phones": [
{
"duration": 0.07,
"phone": "ah_B"
},
{
"duration": 0.08,
"phone": "v_E"
}
],
"start": 46.44,
"startOffset": 155,
"word": "of"
},
{
"alignedWord": "the",
"case": "success",
"end": 46.760000000000005,
"endOffset": 161,
"phones": [
{
"duration": 0.01,
"phone": "dh_B"
},
{
"duration": 0.16,
"phone": "ah_E"
}
],
"start": 46.59,
"startOffset": 158,
"word": "the"
},
{
"alignedWord": "whole",
"case": "success",
"end": 47.419999000000004,
"endOffset": 167,
"phones": [
{
"duration": 0.08,
"phone": "hh_B"
},
{
"duration": 0.08,
"phone": "ow_I"
},
{
"duration": 0.12,
"phone": "l_E"
}
],
"start": 47.139999,
"startOffset": 162,
"word": "whole"
},
{
"alignedWord": "mail",
"case": "success",
"end": 47.7,
"endOffset": 172,
"phones": [
{
"duration": 0.06,
"phone": "m_B"
},
{
"duration": 0.11,
"phone": "ey_I"
},
{
"duration": 0.11,
"phone": "l_E"
}
],
"start": 47.42,
"startOffset": 168,
"word": "mail"
}
]
}
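
To illustrate how the fields in this output relate to each other, here is a minimal Python sketch that reads word timings and runs two sanity checks. It relies on patterns observed in the sample above: startOffset and endOffset index into the transcript string (treated here as character indices, which holds for this ASCII sample), and the phone durations of a word add up to its end minus start. The function names are hypothetical, and the handling of entries whose case is not "success" is an assumption, since the sample only contains successful alignments.

```python
# First two words of the sample output, with a shortened transcript.
SAMPLE = {
    "transcript": "Now, let me",
    "words": [
        {"alignedWord": "now", "case": "success", "start": 38.16, "end": 38.29,
         "startOffset": 0, "endOffset": 3, "word": "Now",
         "phones": [{"duration": 0.12, "phone": "n_B"},
                    {"duration": 0.01, "phone": "aw_E"}]},
        {"alignedWord": "let", "case": "success", "start": 38.46, "end": 38.65,
         "startOffset": 5, "endOffset": 8, "word": "let",
         "phones": [{"duration": 0.05, "phone": "l_B"},
                    {"duration": 0.07, "phone": "eh_I"},
                    {"duration": 0.07, "phone": "t_E"}]},
    ],
}


def aligned_words(result):
    """Keep only entries marked as successfully aligned.

    Assumption: other entries may lack timing fields, so they are skipped.
    """
    return [w for w in result["words"] if w.get("case") == "success"]


def word_timings(result):
    """Return (word, start_seconds, end_seconds) for each aligned word."""
    return [(w["word"], w["start"], w["end"]) for w in aligned_words(result)]


def is_consistent(result, word, tolerance=0.005):
    """Check offsets against the transcript and phones against the timing."""
    span = result["transcript"][word["startOffset"]:word["endOffset"]]
    phone_total = sum(p["duration"] for p in word["phones"])
    word_length = word["end"] - word["start"]
    return span == word["word"] and abs(word_length - phone_total) <= tolerance


def pauses(result, min_gap):
    """Return (previous_word, next_word, gap_seconds) for gaps above min_gap."""
    words = aligned_words(result)
    found = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt["start"] - prev["end"]
        if gap > min_gap:
            found.append((prev["word"], nxt["word"], gap))
    return found
```

For example, word_timings(SAMPLE) lists each word with its start and end time in seconds, and pauses(SAMPLE, 0.1) reports the silence between "Now" and "let".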


Other evaluated tools

Tool Name

Official documentation: <link>

Language: 

Description: 

Cost: <$ OR Free (open source)>

Social impact: 

Notes: 

Installation & requirements


Parameters


Input formats


Example Usage

Example Output


Evaluation summary