Bob 2.0 computation of a spectrogram for audio samples. The silent head and tail of each sample are trimmed.

This is a legacy algorithm: the API has changed since it was implemented, so new versions and forks will need to be updated.
This algorithm is splittable.

Algorithms have at least one input and one output. All algorithm endpoints are organized in groups, which the platform uses to indicate which inputs and outputs are synchronized together. The first group is automatically synchronized with the channel defined by the block in which the algorithm is deployed.

Group: main

Endpoint Name   Data Format                  Nature
labels          system/array_1d_integers/1   Input
speech          system/array_1d_floats/1     Input
data            system/array_2d_floats/1     Output
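As a rough illustration of these endpoints (the array lengths below are made up; only the dimensionality follows the data formats above):

```python
import numpy

# Hypothetical endpoint payloads; lengths are illustrative only
speech = numpy.random.randn(16000).astype('float64')     # array_1d_floats: raw audio samples
labels = (numpy.random.rand(100) > 0.5).astype('int64')  # array_1d_integers: per-frame VAD labels
data = numpy.zeros((98, 40), dtype='float64')            # array_2d_floats: frames x filter bands

print(speech.ndim, labels.ndim, data.ndim)  # 1 1 2
```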

Parameters allow users to change the configuration of an algorithm when scheduling an experiment.

mel_scale: bool, default False. Apply Mel-scale filtering; otherwise a linear scale is used.
pre_emphasis_coef: float64, default 1.0. Pre-emphasis coefficient used in the spectrogram computation.
win_shift_ms: float64, default 10.0. Length of the overlap between neighboring windows, typically half of the window length.
win_length_ms: float64, default 20.0. Length of the sliding processing window, typically about 20 ms.
rate: float64, default 16000.0, range [2000.0, 256000.0]. Sampling rate of the speech signal.
n_filters: uint32, default 40. Number of filter bands used in the spectrogram computation.
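A minimal sketch of how such defaults are overridden at experiment time (the user dictionary below is hypothetical, not part of the algorithm):

```python
# Defaults as listed in the parameter table above
defaults = {
    'mel_scale': False,
    'pre_emphasis_coef': 1.0,
    'win_shift_ms': 10.0,
    'win_length_ms': 20.0,
    'rate': 16000.0,
    'n_filters': 40,
}

# Hypothetical overrides a user might supply when scheduling an experiment
user_parameters = {'rate': 8000.0, 'mel_scale': True}

# Unspecified parameters keep their defaults, mirroring the
# parameters.get(name, default) pattern used in the algorithm's setup()
config = dict(defaults, **user_parameters)
print(config['rate'], config['win_shift_ms'])  # 8000.0 10.0
```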
###############################################################################
#                                                                             #
# Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/           #
# Contact: beat.support@idiap.ch                                              #
#                                                                             #
# This file is part of the beat.core module of the BEAT platform.             #
#                                                                             #
# Commercial License Usage                                                    #
# Licensees holding valid commercial BEAT licenses may use this file in       #
# accordance with the terms contained in a written agreement between you      #
# and Idiap. For further information contact tto@idiap.ch                     #
#                                                                             #
# Alternatively, this file may be used under the terms of the GNU Affero      #
# Public License version 3 as published by the Free Software and appearing    #
# in the file LICENSE.AGPL included in the packaging of this file.            #
# The BEAT platform is distributed in the hope that it will be useful, but    #
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY  #
# or FITNESS FOR A PARTICULAR PURPOSE.                                        #
#                                                                             #
# You should have received a copy of the GNU Affero Public License along      #
# with the BEAT platform. If not, see http://www.gnu.org/licenses/.           #
#                                                                             #
###############################################################################

import numpy
import bob.ap


class Algorithm:

    def __init__(self):
        self.win_length_ms = 20
        self.win_shift_ms = 10
        self.rate = 16000
        self.pre_emphasis_coef = 1.0
        self.mel_scale = True
        self.n_filters = 40

    def setup(self, parameters):
        self.rate = float(parameters.get('rate', self.rate))
        self.win_length_ms = float(parameters.get('win_length_ms', self.win_length_ms))
        self.win_shift_ms = float(parameters.get('win_shift_ms', self.win_shift_ms))
        self.pre_emphasis_coef = float(parameters.get('pre_emphasis_coef', self.pre_emphasis_coef))
        self.mel_scale = parameters.get('mel_scale', self.mel_scale)
        self.n_filters = parameters.get('n_filters', self.n_filters)
        return True

    def process(self, inputs, outputs):
        data = inputs["speech"].data.value.astype('float64')
        vad_labels = inputs["labels"].data.value

        # first, trim the silence from both ends,
        # provided VAD detection worked on this sample
        if vad_labels.size == 2 and not vad_labels.all():
            # we probably could not read the sample, so no labels were computed
            print('VAD labels for the sample are invalid!')
        else:
            # keep only speech frames: in VAD labels, speech frames are 1 and silence frames are 0
            speech, = numpy.nonzero(vad_labels)
            if len(speech) < len(vad_labels) and len(speech) > 0:  # trim only if necessary
                # index of the first sample of the first speech frame
                nzstart = speech[0] * int(self.rate / 1000 * self.win_shift_ms)
                # count the shift of the last speech frame plus the length of that frame
                nzend = speech[-1] * int(self.rate / 1000 * self.win_shift_ms) \
                    + int(self.rate / 1000 * self.win_length_ms)

                data = data[nzstart:nzend]

        print("Length of trimmed sample is %s" % str(data.shape))

        # then, compute the spectrogram
        c = bob.ap.Spectrogram(float(self.rate), float(self.win_length_ms),
                               float(self.win_shift_ms), int(self.n_filters),
                               0.0, float(self.rate / 2.0),
                               float(self.pre_emphasis_coef), bool(self.mel_scale))
        c.energy_filter = True  # energy power spectrum: square of the FFT spectrum
        c.log_filter = False    # we take no log
        c.energy_bands = True   # band filtering

        spectrogram = c(data)

        outputs["data"].write({
            'value': spectrogram
        })
        return True

The code above implements this algorithm in Python.

Returns the trimmed spectrogram of an audio sample. The silent start and end of the sample are trimmed using Voice Activity Detection (VAD) labels provided as input.
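The frame-to-sample trimming arithmetic can be sketched in plain numpy (the label array and signal below are made up for illustration; the index computation follows the win_shift_ms/win_length_ms logic of the algorithm's code):

```python
import numpy

# Parameter defaults from the table above
rate = 16000.0
win_shift_ms = 10.0
win_length_ms = 20.0

shift = int(rate / 1000 * win_shift_ms)    # 160 samples per frame shift
length = int(rate / 1000 * win_length_ms)  # 320 samples per window

# Hypothetical VAD labels for 10 frames: speech (1) on frames 3..6 only
vad_labels = numpy.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
speech, = numpy.nonzero(vad_labels)

nzstart = speech[0] * shift          # first sample of first speech frame
nzend = speech[-1] * shift + length  # last speech frame's shift plus window length

signal = numpy.zeros(10 * shift)     # dummy signal covering all 10 frames
trimmed = signal[nzstart:nzend]
print(nzstart, nzend, len(trimmed))  # 480 1280 800
```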

Experiments

Name: pkorshunov/pkorshunov/speech-antispoofing-baseline/1/btas2016-baseline-pa
Databases/Protocols: avspoof/1@physicalaccess_antispoofing
Analyzers: pkorshunov/simple_antispoofing_analyzer/2

This table shows the number of times this algorithm has been successfully run in each given environment. Note that this does not provide enough information to evaluate whether the algorithm will run when submitted under different conditions.
