Category description and use cases
Workflow example:
Output standard
Summary: An array of segments, each with a start and end timestamp in seconds and a label of either "applause" or "non-applause".
Element | Datatype | Obligation | Definition |
--- | --- | --- | --- |
media | object | required | Wrapper for metadata about the source media file. |
media.filename | string | required | Filename of the source file. |
media.duration | string | required | Duration of the source file's audio. |
segments | array | required | Wrapper for segments of applause and non-applause. |
segments[*] | object | optional | A single segment of applause or non-applause. |
segments[*].label | string | required | The type of segment: "applause" or "non-applause". |
segments[*].start | number | required | Start time in seconds. |
segments[*].end | number | required | End time in seconds. |
JSON Schema
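No formal schema is published on this page; the sketch below reconstructs one from the table above as a Python dict and checks a document against it with the third-party `jsonschema` package. The package choice and the example values are assumptions, not part of the standard.

```python
# Sketch of the output standard as JSON Schema, derived from the table above.
# Field names come from the table; the `jsonschema` package is an assumption --
# any JSON Schema validator could be substituted.
import jsonschema

APPLAUSE_SCHEMA = {
    "type": "object",
    "required": ["media", "segments"],
    "properties": {
        "media": {
            "type": "object",
            "required": ["filename", "duration"],
            "properties": {
                "filename": {"type": "string"},
                "duration": {"type": "string"},
            },
        },
        "segments": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["label", "start", "end"],
                "properties": {
                    "label": {"enum": ["applause", "non-applause"]},
                    "start": {"type": "number"},
                    "end": {"type": "number"},
                },
            },
        },
    },
}

# Example: validate a minimal (hypothetical) document against the sketch.
example = {
    "media": {"filename": "episode.mp3", "duration": "1901.45"},
    "segments": [{"label": "applause", "start": 0.65, "end": 6.78}],
}
jsonschema.validate(instance=example, schema=APPLAUSE_SCHEMA)  # raises ValidationError on mismatch
```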
Sample output
Recommended tool(s)
Acoustic Classification Segmentation (custom)
Official documentation: https://github.com/lizfischer/acoustic-classification-segmentation
Language: Python
Description: A TensorFlow implementation of speech, music, noise, silence, and applause segmentation for audio files; forked from the Brandeis Lab for Linguistics & Computation.
Cost: Free (open source)
Installation & requirements
Clone the repository (link above) and run `pip install -r requirements.txt`
Requires ffmpeg and Python 3 with the following libraries:
librosa==0.7.2
numpy==1.17.4
numpydoc==0.9.2
scipy==1.4.1
scikit-learn==0.22.1
ffmpeg-python==0.2.0
tensorflow>=2.0.1
Parameters
Input formats
mp3, wav, or mp4
Example Usage
Note: This tool runs over all mp3, mp4, or wav files in the input directory; it does not accept a single file as input.
python run.py -s pretrained/applause-binary-20210203 /path/to/media/folder -o /path/to/output/folder -T 1000 -b
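Because the tool only accepts a directory, a small wrapper can stage a single file into a temporary folder and invoke the command above. The sketch below is illustrative: the function name, the staging approach, and the paths are assumptions; only the `run.py` arguments are taken from the example command.

```python
# Hypothetical wrapper around the command shown above: copies a single media
# file into a scratch directory so run.py (which expects a folder) can process it.
# Assumes the working directory is the root of the cloned repository.
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_applause_detection(media_file: str, output_dir: str,
                           model: str = "pretrained/applause-binary-20210203") -> None:
    media_path = Path(media_file)
    if media_path.suffix.lower() not in {".mp3", ".wav", ".mp4"}:
        raise ValueError(f"Unsupported input format: {media_path.suffix}")
    with tempfile.TemporaryDirectory() as staging:
        shutil.copy(media_path, staging)   # stage the single file into a folder
        subprocess.run(
            ["python", "run.py", "-s", model, staging,
             "-o", output_dir, "-T", "1000", "-b"],
            check=True,                    # raise if the tool exits non-zero
        )

# run_applause_detection("lecture.mp3", "results/")  # hypothetical paths
```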
Example Output
[ { "label": "non-applause", "start": 0.0, "end": 0.64 }, { "label": "applause", "start": 0.65, "end": 6.78 }, { "label": "non-applause", "start": 6.79, "end": 373.83 }, { "label": "applause", "start": 373.84, "end": 379.55 }, { "label": "non-applause", "start": 379.56, "end": 384.52 }, { "label": "applause", "start": 384.53, "end": 390.34 }, { "label": "non-applause", "start": 390.35, "end": 430.69 }, { "label": "applause", "start": 430.7, "end": 433.98 }, { "label": "non-applause", "start": 433.99, "end": 963.03 }, { "label": "applause", "start": 963.04, "end": 982.04 }, { "label": "non-applause", "start": 982.05, "end": 1388.61 }, { "label": "applause", "start": 1388.62, "end": 1398.6 }, { "label": "non-applause", "start": 1398.61, "end": 1799.13 }, { "label": "applause", "start": 1799.14, "end": 1807.36 }, { "label": "non-applause", "start": 1807.37, "end": 1857.13 }, { "label": "applause", "start": 1857.14, "end": 1864.86 }, { "label": "non-applause", "start": 1864.87, "end": 1901.45 } ]
Other evaluated tools
YAMNet
Official documentation: https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
Language: Python
Description: A pretrained TensorFlow audio event classifier (MobileNetV1-based) that predicts 521 AudioSet classes, including applause, from an audio waveform.
Cost: Free (open source)
Social impact:
Notes: