
Category description and use cases

Workflow example:


Output standard

Summary: A media metadata wrapper plus an array of segments, each with a label, start, and end. Start and end are timestamps in seconds. The label is one of "applause" or "non-applause."

| Element | Datatype | Obligation | Definition |
| --- | --- | --- | --- |
| media | object | required | Wrapper for metadata about the source media file. |
| media.filename | string | required | Filename of the source file. |
| media.duration | string | required | The duration of the source file audio, in seconds. |
| segments | array | required | Wrapper for segments of applause and non-applause. |
| segments[*] | object | optional | A single labeled segment. |
| segments[*].label | string | required | The type of segment: "applause" or "non-applause". |
| segments[*].start | number | required | Start time in seconds. |
| segments[*].end | number | required | End time in seconds. |


JSON Schema

Schema
{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "title": "Applause Detection Schema",
    "required": [
        "media",
        "segments"
    ],
    "properties": {
        "media": {
            "type": "object",
            "title": "Media",
            "description": "Wrapper for metadata about the source media file.",
            "required": [
                "filename",
                "duration"
            ],
            "properties": {
                "filename": {
                    "type": "string",
                    "title": "Filename",
                    "description": "Filename of the source file.",
                    "default": "",
                    "examples": [
                        "myfile.wav"
                    ]
                },
                "duration": {
                    "type": "string",
                    "title": "Duration",
                    "description": "Duration of the source file audio.",
                    "default": "",
                    "examples": [
                        "25.888"
                    ]
                }
            }
        },
        "segments": {
            "type": "array",
            "title": "Segments",
            "description": "Segments of silence, speech, or audio.",
            "items": {
                "type": "object",
                "required": [
                  	"label",
                    "start",
                    "end"         
                ],
                "properties": {
                  	"label": {
                      "type": "string",
                      "title": "Label",
                      "description": "The type of sound",
                      "enum": [
                          "applause",
                          "non-applause"
                      ]
                	}
                }
            }
        }
    }
}

Sample output

Sample Output
{
  "media": {
	"filename": "name.wav",
	"duration": "300"
  },
  "segments":[
    {
        "label": "non-applause",
        "start": 0.0,
        "end": 198.37
    },
    {
        "label": "applause",
        "start": 198.38,
        "end": 206.04
    }
  ]
}
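A lightweight structural check can confirm that an output file conforms to the standard above before it moves downstream. The sketch below is a hand-rolled validator (for a full check, a JSON Schema validator such as the `jsonschema` package could be pointed at the schema above):

```python
def check_output(doc):
    """Return a list of problems with an applause-detection output document.

    An empty list means the document conforms to the output standard:
    a media wrapper with string filename/duration, plus segments whose
    label is "applause" or "non-applause" and whose start/end are numbers.
    """
    problems = []
    media = doc.get("media")
    if not isinstance(media, dict):
        problems.append("missing 'media' object")
    else:
        for key in ("filename", "duration"):
            if not isinstance(media.get(key), str):
                problems.append(f"media.{key} must be a string")
    segments = doc.get("segments")
    if not isinstance(segments, list):
        problems.append("missing 'segments' array")
    else:
        for i, seg in enumerate(segments):
            if seg.get("label") not in ("applause", "non-applause"):
                problems.append(f"segments[{i}].label must be 'applause' or 'non-applause'")
            for key in ("start", "end"):
                if not isinstance(seg.get(key), (int, float)):
                    problems.append(f"segments[{i}].{key} must be a number in seconds")
    return problems

# The sample output above passes the check.
sample = {
    "media": {"filename": "name.wav", "duration": "300"},
    "segments": [
        {"label": "non-applause", "start": 0.0, "end": 198.37},
        {"label": "applause", "start": 198.38, "end": 206.04},
    ],
}
assert check_output(sample) == []
```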


Recommended tool(s)

Acoustic Classification Segmentation (custom)

Official documentation: https://github.com/lizfischer/acoustic-classification-segmentation

Language: Python

Description: A TensorFlow implementation of speech, music, noise, silence, and applause segmentation for audio files; forked from the Brandeis Lab for Linguistics & Computation.

Cost: Free (open source)

Installation & requirements

Clone repository (link above) & use `pip install -r requirements.txt`

Requires ffmpeg, and Python 3 with the following libraries:

librosa==0.7.2
numpy==1.17.4
numpydoc==0.9.2
scipy==1.4.1
scikit-learn==0.22.1
ffmpeg-python==0.2.0
tensorflow>=2.0.1

Parameters

Input formats

mp3, wav, or mp4

Example Usage

Note: This tool runs over all mp3, mp4, or wav files in the input directory; it does not take a single file as input.

Acoustic Classification Segmentation Example
python run.py -s pretrained/applause-binary-20210203 /path/to/media/folder -o /path/to/output/folder -T 1000 -b
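To script the batch run from Python rather than the shell, a thin wrapper can assemble the same invocation. This is a sketch: the flag values mirror the example command above, and the pretrained model path comes from the repository.

```python
import subprocess


def build_command(media_dir, output_dir,
                  model="pretrained/applause-binary-20210203",
                  threshold=1000):
    """Assemble the run.py invocation shown in the example above."""
    return [
        "python", "run.py",
        "-s", model, media_dir,
        "-o", output_dir,
        "-T", str(threshold),
        "-b",
    ]


cmd = build_command("/path/to/media/folder", "/path/to/output/folder")
# subprocess.run(cmd, check=True)  # uncomment to actually run the tool
```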

Example Output

Acoustic Classification Segmentation Output
[
    {
        "label": "non-applause",
        "start": 0.0,
        "end": 0.64
    },
    {
        "label": "applause",
        "start": 0.65,
        "end": 6.78
    },
    {
        "label": "non-applause",
        "start": 6.79,
        "end": 373.83
    },
    {
        "label": "applause",
        "start": 373.84,
        "end": 379.55
    },
    {
        "label": "non-applause",
        "start": 379.56,
        "end": 384.52
    },
    {
        "label": "applause",
        "start": 384.53,
        "end": 390.34
    },
    {
        "label": "non-applause",
        "start": 390.35,
        "end": 430.69
    },
    {
        "label": "applause",
        "start": 430.7,
        "end": 433.98
    },
    {
        "label": "non-applause",
        "start": 433.99,
        "end": 963.03
    },
    {
        "label": "applause",
        "start": 963.04,
        "end": 982.04
    },
    {
        "label": "non-applause",
        "start": 982.05,
        "end": 1388.61
    },
    {
        "label": "applause",
        "start": 1388.62,
        "end": 1398.6
    },
    {
        "label": "non-applause",
        "start": 1398.61,
        "end": 1799.13
    },
    {
        "label": "applause",
        "start": 1799.14,
        "end": 1807.36
    },
    {
        "label": "non-applause",
        "start": 1807.37,
        "end": 1857.13
    },
    {
        "label": "applause",
        "start": 1857.14,
        "end": 1864.86
    },
    {
        "label": "non-applause",
        "start": 1864.87,
        "end": 1901.45
    }
]
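Note that the tool emits a bare array of segments, while the output standard above also requires a `media` wrapper. A small post-processing step can bridge the two; the `filename` and `duration` arguments here are hypothetical inputs that the caller must supply (e.g. from ffprobe), since the tool output does not carry them.

```python
import json


def to_standard(segments, filename, duration):
    """Wrap the tool's raw segment list in the standard output envelope.

    `segments` is the JSON array the tool writes; `filename` and
    `duration` (seconds, as a string) are supplied by the caller.
    """
    return {
        "media": {"filename": filename, "duration": duration},
        "segments": segments,
    }


raw = [
    {"label": "non-applause", "start": 0.0, "end": 0.64},
    {"label": "applause", "start": 0.65, "end": 6.78},
]
standard = to_standard(raw, "myfile.wav", "1901.45")
print(json.dumps(standard, indent=2))
```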

Other evaluated tools

YAMNet

Official documentation: <link>

Language: 

Description: 

Cost: <$ OR Free (open source)>

Social impact: 

Notes: 

Installation & requirements


Parameters


Input formats


Example Usage

<tool name> Example
 

Example Output

<tool name> Output
 

Evaluation summary

