Bob 2.0 computation of a spectrogram for audio samples. The silent head and tail of each sample are trimmed.
Algorithms have at least one input and one output. All algorithm endpoints are organized in groups, which the platform uses to indicate which inputs and outputs are synchronized together. The first group is automatically synchronized with the channel defined by the block in which the algorithm is deployed. The endpoints of this algorithm are listed below; a sketch of the corresponding declaration follows the table.
Endpoint Name | Data Format | Nature
---|---|---
labels | system/array_1d_integers/1 | Input
speech | system/array_1d_floats/1 | Input
data | system/array_2d_floats/1 | Output
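On the platform, this table is backed by the algorithm's JSON declaration, where the three endpoints above form a single synchronized group. The sketch below, written as a Python dictionary, shows roughly how such a declaration is laid out; the field names follow beat.core conventions as we understand them and are an assumption, not copied from this page.

```python
# A rough sketch (assumed beat.core-style schema, not taken from this page)
# of the declaration behind the endpoint table above: one group whose inputs
# and outputs the platform synchronizes together.
declaration = {
    "language": "python",
    "splittable": False,
    "groups": [
        {
            "inputs": {
                "labels": {"type": "system/array_1d_integers/1"},
                "speech": {"type": "system/array_1d_floats/1"},
            },
            "outputs": {
                "data": {"type": "system/array_2d_floats/1"},
            },
        }
    ],
}
```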
Parameters allow users to change the configuration of an algorithm when scheduling an experiment.
Name | Description | Type | Default | Range/Choices
---|---|---|---|---
mel_scale | Apply Mel-scale filtering; otherwise the filters are linearly spaced (the default) | bool | False |
pre_emphasis_coef | Pre-emphasis coefficient used in the spectrogram computation | float64 | 1.0 |
win_shift_ms | The shift between neighboring windows, typically half of the window length | float64 | 10.0 |
win_length_ms | The length of the sliding processing window, typically about 20 ms | float64 | 20.0 |
rate | Sampling rate of the speech signal, in Hz | float64 | 16000.0 | [2000.0, 256000.0]
n_filters | The number of filter bands used in the spectrogram computation | uint32 | 40 |
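To make the parameter flow concrete, here is a minimal sketch of configuring the Algorithm class listed below by hand; on the platform, setup() is called for you with the values chosen when the experiment is scheduled. The specific values here are illustrative.

```python
# A minimal sketch (illustrative values): the platform calls setup() once,
# passing a dictionary of the user-chosen parameter values, before any call
# to process().
algo = Algorithm()
assert algo.setup({
    'rate': 16000.0,
    'win_length_ms': 20.0,
    'win_shift_ms': 10.0,
    'pre_emphasis_coef': 1.0,
    'mel_scale': False,  # linear filter spacing, the documented default
    'n_filters': 40,
})
```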
The code for this algorithm in Python:
```python
###############################################################################
#                                                                             #
# Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/           #
# Contact: beat.support@idiap.ch                                              #
#                                                                             #
# This file is part of the beat.core module of the BEAT platform.             #
#                                                                             #
# Commercial License Usage                                                    #
# Licensees holding valid commercial BEAT licenses may use this file in       #
# accordance with the terms contained in a written agreement between you      #
# and Idiap. For further information contact tto@idiap.ch                     #
#                                                                             #
# Alternatively, this file may be used under the terms of the GNU Affero      #
# Public License version 3 as published by the Free Software and appearing    #
# in the file LICENSE.AGPL included in the packaging of this file.            #
# The BEAT platform is distributed in the hope that it will be useful, but    #
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY  #
# or FITNESS FOR A PARTICULAR PURPOSE.                                        #
#                                                                             #
# You should have received a copy of the GNU Affero Public License along      #
# with the BEAT platform. If not, see http://www.gnu.org/licenses/.           #
#                                                                             #
###############################################################################

import numpy

import bob.ap


class Algorithm:

    def __init__(self):
        self.win_length_ms = 20
        self.win_shift_ms = 10
        self.rate = 16000
        self.pre_emphasis_coef = 1.0
        self.mel_scale = True
        self.n_filters = 40

    def setup(self, parameters):
        self.rate = float(parameters.get('rate', self.rate))
        self.win_length_ms = float(parameters.get('win_length_ms', self.win_length_ms))
        self.win_shift_ms = float(parameters.get('win_shift_ms', self.win_shift_ms))
        self.pre_emphasis_coef = float(parameters.get('pre_emphasis_coef', self.pre_emphasis_coef))
        self.mel_scale = parameters.get('mel_scale', self.mel_scale)
        self.n_filters = parameters.get('n_filters', self.n_filters)
        return True

    def process(self, inputs, outputs):
        data = inputs["speech"].data.value.astype('float64')
        vad_labels = inputs["labels"].data.value

        # First, trim the silences from both ends, provided that VAD
        # detection worked on this sample.
        if vad_labels.size == 2 and not vad_labels.all():
            # The sample probably could not be read, so no labels were computed.
            print('VAD labels for the sample are invalid!')
        else:
            # Keep only the speech frames: VAD marks speech as 1 and silence as 0.
            speech, = numpy.nonzero(vad_labels)
            if 0 < len(speech) < len(vad_labels):  # trim only if necessary
                # Index of the first sample of the first speech frame.
                nzstart = speech[0] * int(self.rate / 1000 * self.win_shift_ms)
                # The end covers the shift up to the last speech frame plus
                # the length of that frame itself.
                nzend = (speech[-1] * int(self.rate / 1000 * self.win_shift_ms)
                         + int(self.rate / 1000 * self.win_length_ms))
                data = data[nzstart:nzend]
                print("Length of trimmed sample is %s" % str(data.shape))

        # Then, compute the spectrogram of the (trimmed) signal.
        c = bob.ap.Spectrogram(
            float(self.rate), float(self.win_length_ms),
            float(self.win_shift_ms), int(self.n_filters), 0.0,
            float(self.rate / 2.0), float(self.pre_emphasis_coef),
            bool(self.mel_scale))
        c.energy_filter = True  # use the squared magnitude of the FFT spectrum
        c.log_filter = False    # take no log of the filter outputs
        c.energy_bands = True   # apply band filtering

        spectrogram = c(data)

        outputs["data"].write({
            'value': spectrogram
        })

        return True
```
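The listing above can also be exercised outside the platform for quick checks. The sketch below drives process() with small stand-in containers that mimic the attribute access the code relies on; the Mock* classes are hypothetical helpers, not part of beat.core, and bob.ap must be installed for the spectrogram call to work.

```python
import numpy

# Hypothetical stand-ins mimicking inputs["..."].data.value and
# outputs["..."].write(...) as used in process() above.
class MockData:
    def __init__(self, value):
        self.value = value

class MockInput:
    def __init__(self, value):
        self.data = MockData(value)

class MockOutput:
    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)

rate = 16000
signal = numpy.random.randn(rate)     # one second of noise as fake speech
labels = numpy.zeros(100, dtype=int)  # 100 VAD frames at a 10 ms shift
labels[20:80] = 1                     # pretend frames 20..79 are speech

algo = Algorithm()
algo.setup({'rate': float(rate)})
out = MockOutput()
algo.process({'speech': MockInput(signal), 'labels': MockInput(labels)},
             {'data': out})
print(out.written[0]['value'].shape)  # (number of frames, n_filters)
```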
Returns the trimmed spectrogram of an audio sample. The silent start and end of the sample are trimmed using Voice Activity Detection (VAD) labels provided as input.
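As a worked example of the trimming arithmetic used in process() (the VAD frame indices below are illustrative):

```python
# With rate=16000 Hz, win_shift_ms=10 and win_length_ms=20, one frame shift
# spans 160 samples and one analysis window spans 320 samples.
rate, win_shift_ms, win_length_ms = 16000, 10, 20
shift = int(rate / 1000 * win_shift_ms)    # 160 samples per frame shift
length = int(rate / 1000 * win_length_ms)  # 320 samples per window
first_speech, last_speech = 20, 79         # assumed first/last VAD speech frames
nzstart = first_speech * shift             # 3200: first sample kept
nzend = last_speech * shift + length       # 12960: one past the last sample kept
```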
Updated | Name | Databases/Protocols | Analyzers
---|---|---|---
 | pkorshunov/pkorshunov/speech-antispoofing-baseline/1/btas2016-baseline-pa | avspoof/1@physicalaccess_antispoofing | pkorshunov/simple_antispoofing_analyzer/2
This table shows the number of times this algorithm has been successfully run using the given environment. Note that this does not provide sufficient information to evaluate whether the algorithm will run when submitted under different conditions.