Bob 2.0: LBP histograms of spectrogram bands plus band ratios.
Algorithms have at least one input and one output. All algorithm endpoints are organized in groups. Groups are used by the platform to indicate which inputs and outputs are synchronized together. The first group is automatically synchronized with the channel defined by the block in which the algorithm is deployed.
Endpoint Name | Data Format | Nature |
---|---|---|
labels | system/array_1d_integers/1 | Input |
speech | system/array_1d_floats/1 | Input |
features | system/array_1d_floats/1 | Output |
Parameters allow users to change the configuration of an algorithm when scheduling an experiment.
Name | Description | Type | Default | Range/Choices |
---|---|---|---|---|
mel_scale | Apply Mel-scale filtering (True) or use a linear frequency scale (False) | bool | True | |
pre_emphasis_coef | Pre-emphasis coefficient used in the spectrogram computation | float64 | 1.0 | |
f_max | Maximum frequency of the spectrogram, in Hz | float64 | 4000.0 | |
lbp_to_average | LBP parameter. Compare the neighboring pixels to the average of the pixels instead of the center pixel | bool | False | |
win_shift_ms | Shift between neighboring windows, in ms; typically half the window length | float64 | 10.0 | |
win_length_ms | Length of the sliding processing window, in ms; typically about 20 ms | float64 | 20.0 | |
n_lbp_histograms | Number of frequency bands into which the spectrogram is split; an LBP histogram is computed for each band | uint32 | 2 | |
rate | Sampling rate of the speech signal, in Hz | float64 | 16000.0 | [2000.0, 256000.0] |
n_filters | Number of filter bands used in the spectrogram computation | uint32 | 40 | |
lbp_circular | LBP parameter. Extract neighbors on a circle (True) or on a square (False) | bool | True | |
lbp_radius | LBP parameter. Radius of the LBP, applied in both the vertical and horizontal directions | uint32 | 1 | [1, 10] |
lbp_neighbors | LBP parameter. Number of neighbors | uint32 | 8 | 4, 8, 16 |
lbp_uniform | LBP parameter. If True, only uniform LBP codes (at most two bit transitions between 0 and 1) are kept as separate codes; all other patterns are merged into a single code | bool | False | |
lbp_elbp_type | LBP parameter. How to generate the bit strings from the pixels: regular - Choose one bit for each comparison of the neighboring pixel with the central pixel; transitional - Compare only the neighboring pixels and skip the central one; direction-coded - Compute a 2-bit code for four directions. | string | regular | regular, transitional, direction-coded |
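The window parameters translate directly into sample counts: at the default `rate` of 16000 Hz, a 20 ms window spans 320 samples and a 10 ms shift spans 160 samples. The sketch below illustrates how `rate`, `win_length_ms`, `win_shift_ms`, `n_filters` and `f_max` shape a log-energy spectrogram. It is a simplified stand-in, not `bob.ap.Spectrogram`: it assumes a Hamming window, an FFT size rounded up to a power of two, equally spaced linear bands between 0 and `f_max`, and no pre-emphasis or Mel scaling, so its values will not match Bob's. `frame_sizes` and `simple_log_spectrogram` are hypothetical helper names.

```python
import numpy


def frame_sizes(rate=16000.0, win_length_ms=20.0, win_shift_ms=10.0):
    """Window length and shift in samples, computed as in the algorithm's trimming code."""
    return int(rate / 1000 * win_length_ms), int(rate / 1000 * win_shift_ms)


def simple_log_spectrogram(signal, rate=16000.0, win_length_ms=20.0,
                           win_shift_ms=10.0, n_filters=40, f_max=4000.0):
    """Hypothetical stand-in for bob.ap.Spectrogram: linear bands, no pre-emphasis."""
    signal = numpy.asarray(signal, dtype=numpy.float64)
    length, shift = frame_sizes(rate, win_length_ms, win_shift_ms)
    if len(signal) < length:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(signal) - length) // shift
    n_fft = 1 << (length - 1).bit_length()  # next power of two >= window length
    window = numpy.hamming(length)
    freqs = numpy.arange(n_fft // 2 + 1) * (rate / n_fft)  # centre frequency of each FFT bin
    edges = numpy.linspace(0.0, f_max, n_filters + 1)  # equally spaced band edges
    # one boolean mask per band, selecting the FFT bins that fall into it
    masks = [(freqs >= low) & (freqs < high) for low, high in zip(edges[:-1], edges[1:])]
    out = numpy.empty((n_frames, n_filters))
    for t in range(n_frames):
        frame = signal[t * shift:t * shift + length] * window
        power = numpy.abs(numpy.fft.rfft(frame, n_fft)) ** 2  # energy (squared magnitude) spectrum
        out[t] = [power[m].sum() for m in masks]
    return numpy.log(out + 1e-10)  # small floor avoids log(0)


# usage: one second of a 1 kHz tone
rate = 16000.0
t = numpy.arange(int(rate)) / rate
spec = simple_log_spectrogram(numpy.sin(2 * numpy.pi * 1000.0 * t), rate=rate)
print(spec.shape)  # (99, 40): 1 + (16000 - 320) // 160 frames, one column per band
```

With 40 bands up to 4000 Hz, each band covers 100 Hz, so the tone's energy concentrates in band 10 (1000 to 1100 Hz).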
###############################################################################
# #
# Copyright (c) 2016 Idiap Research Institute, http://www.idiap.ch/ #
# Contact: beat.support@idiap.ch #
# #
# This file is part of the beat.core module of the BEAT platform. #
# #
# Commercial License Usage #
# Licensees holding valid commercial BEAT licenses may use this file in #
# accordance with the terms contained in a written agreement between you #
# and Idiap. For further information contact tto@idiap.ch #
# #
# Alternatively, this file may be used under the terms of the GNU Affero #
# Public License version 3 as published by the Free Software and appearing #
# in the file LICENSE.AGPL included in the packaging of this file. #
# The BEAT platform is distributed in the hope that it will be useful, but #
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY #
# or FITNESS FOR A PARTICULAR PURPOSE. #
# #
# You should have received a copy of the GNU Affero Public License along #
# with the BEAT platform. If not, see http://www.gnu.org/licenses/. #
# #
###############################################################################
import numpy
import bob.ap
import bob.sp
import bob.ip.base


class Algorithm:

    def __init__(self):
        # fallback values, used only if a parameter is missing in setup();
        # note that f_max, n_lbp_histograms and lbp_neighbors differ from the declared defaults
        self.win_length_ms = 20
        self.win_shift_ms = 10
        self.rate = 16000
        self.pre_emphasis_coef = 1.0
        self.mel_scale = True
        self.n_filters = 40
        self.f_max = 8000
        self.n_lbp_histograms = 5
        self.lbp_neighbors = 16
        self.lbp_to_average = False
        self.lbp_elbp_type = 'regular'
        self.lbp_uniform = False
        self.lbp_circular = True
        self.lbp_radius = 1

    def setup(self, parameters):
        self.rate = float(parameters.get('rate', self.rate))
        self.win_length_ms = float(parameters.get('win_length_ms', self.win_length_ms))
        self.win_shift_ms = float(parameters.get('win_shift_ms', self.win_shift_ms))
        self.pre_emphasis_coef = float(parameters.get('pre_emphasis_coef', self.pre_emphasis_coef))
        self.mel_scale = parameters.get('mel_scale', self.mel_scale)
        self.n_filters = parameters.get('n_filters', self.n_filters)
        self.f_max = parameters.get('f_max', self.f_max)
        self.n_lbp_histograms = parameters.get('n_lbp_histograms', self.n_lbp_histograms)
        self.lbp_neighbors = parameters.get('lbp_neighbors', self.lbp_neighbors)
        self.lbp_to_average = parameters.get('lbp_to_average', self.lbp_to_average)
        self.lbp_elbp_type = parameters.get('lbp_elbp_type', self.lbp_elbp_type)
        self.lbp_uniform = parameters.get('lbp_uniform', self.lbp_uniform)
        self.lbp_circular = parameters.get('lbp_circular', self.lbp_circular)
        self.lbp_radius = parameters.get('lbp_radius', self.lbp_radius)
        return True

    def compute_spectrogram(self, data):
        c = bob.ap.Spectrogram(float(self.rate), float(self.win_length_ms), float(self.win_shift_ms),
                               int(self.n_filters), 0.0, float(self.f_max), float(self.pre_emphasis_coef),
                               bool(self.mel_scale))
        # energy power spectrum
        c.energy_filter = True  # ^2 of FFT spectrum
        # take the logarithm of the filter-bank outputs
        c.log_filter = True
        c.energy_bands = True  # band filtering
        return c(data)

    def compute_lbp_histograms_and_ratios(self, data):
        histograms = []
        ratios = []
        prev_textogram = None
        # integer division keeps the slice bounds below valid indices in Python 2 and 3
        textogram_width = int(self.n_filters) // int(self.n_lbp_histograms)
        for i in range(0, int(self.n_lbp_histograms)):
            textogram = data[:, i*textogram_width:(i+1)*textogram_width]
            if prev_textogram is None:
                prev_textogram = textogram
            else:
                # prev_textogram is never updated, so every ratio is taken against the first band
                ratios.append(numpy.mean(prev_textogram)/numpy.mean(textogram))
            if textogram.max():
                # rescale into a new array, so neither the spectrogram nor prev_textogram is modified in place
                textogram = textogram * (255.0/textogram.max())
            textogram = numpy.asarray(textogram, dtype=numpy.uint8)
            lbp = bob.ip.base.LBP(neighbors=int(self.lbp_neighbors), circular=bool(self.lbp_circular),
                                  radius=int(self.lbp_radius), to_average=bool(self.lbp_to_average),
                                  uniform=bool(self.lbp_uniform), elbp_type=self.lbp_elbp_type)
            lbpimage = numpy.ndarray(lbp.lbp_shape(textogram), 'uint16')  # allocating the image with lbp codes
            lbp(textogram, lbpimage)  # calculating the lbp image
            current_hist = bob.ip.base.histogram(lbpimage, (0, lbp.max_label-1), lbp.max_label)
            # convert the integer counts to float, so the normalization is not an integer division
            current_hist = numpy.asarray(current_hist, dtype=numpy.float64)
            if current_hist.sum() != 0:
                current_hist = current_hist / current_hist.sum()  # histogram normalization
            # reduce dimension of the features if lbp is for 16 neighbors
            if self.lbp_neighbors == 16:
                current_hist_fft = bob.sp.fft(numpy.asarray(current_hist, dtype=numpy.complex128))
                current_hist = current_hist_fft.real[0:16]  # take only the first 16 frequencies of the real part
            histograms.append(current_hist)  # just put into the larger list
        return ratios, histograms

    def process(self, inputs, outputs):
        data = inputs["speech"].data.value.astype('float64')
        vad_labels = inputs["labels"].data.value

        # first, trim out the silences from both ends
        # if VAD detection worked on this sample
        if vad_labels.size == 2 and not vad_labels.all():
            # we probably could not read the sample, so no labels were computed
            print('VAD labels for the sample are invalid!')
        else:
            # trim away silent head and tail
            # in VAD, speech frames are 1 and silence are 0
            speech, = numpy.nonzero(vad_labels)
            if len(speech) and len(speech) < len(vad_labels):  # trim only if necessary
                # sample index where the first speech frame starts
                nzstart = speech[0]*int(self.rate/1000*self.win_shift_ms)
                # start of the last speech frame plus the length of one full frame
                nzend = (speech[-1])*int(self.rate/1000*self.win_shift_ms) + int(self.rate/1000*self.win_length_ms)
                data = data[nzstart:nzend]

        # compute the spectrogram
        spectrogram = self.compute_spectrogram(data)
        # compute LBP histograms from the spectrogram
        ratios, histograms = self.compute_lbp_histograms_and_ratios(spectrogram)
        features = numpy.append(ratios, histograms)
        features = numpy.asarray(features, dtype=numpy.float64)
        outputs["features"].write({
            'value': features
        })
        return True
The code for this algorithm in Python
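The band splitting, ratio and histogram steps of `compute_lbp_histograms_and_ratios` can be illustrated without Bob. The sketch below is a simplified stand-in, not `bob.ip.base.LBP`: it assumes a regular (non-uniform) 8-neighbor LBP with radius 1 sampled on a square, and a hypothetical clockwise bit ordering, so individual codes will not match Bob's bit for bit. `lbp8_square` and `band_features` are hypothetical names. With these settings each histogram has 2**8 = 256 bins, so the feature vector holds (n_bands - 1) ratios followed by n_bands * 256 histogram values; for 16 neighbors the algorithm instead keeps only the first 16 real FFT coefficients of each histogram.

```python
import numpy

# neighbour offsets (dy, dx), clockwise from the top-left corner; this ordering is an assumption
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp8_square(image):
    """Regular 3x3 LBP: one bit per neighbour, set when the neighbour >= the centre pixel."""
    img = numpy.asarray(image, dtype=numpy.float64)
    centre = img[1:-1, 1:-1]
    h, w = centre.shape
    codes = numpy.zeros((h, w), dtype=numpy.uint16)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        codes |= (neighbour >= centre).astype(numpy.uint16) << (7 - bit)
    return codes


def band_features(spectrogram, n_bands=2):
    """Ratios of band means (relative to the first band) followed by normalized LBP histograms."""
    spectrogram = numpy.asarray(spectrogram, dtype=numpy.float64)
    width = spectrogram.shape[1] // n_bands
    first_band = spectrogram[:, :width]
    ratios, histograms = [], []
    for i in range(n_bands):
        band = spectrogram[:, i * width:(i + 1) * width]
        if i > 0:
            ratios.append(numpy.mean(first_band) / numpy.mean(band))
        peak = band.max()
        # rescale a copy to 0..255 for the uint8 conversion; assumes non-negative values
        scaled = band * (255.0 / peak) if peak else band
        codes = lbp8_square(scaled.astype(numpy.uint8))
        hist = numpy.bincount(codes.ravel(), minlength=256).astype(numpy.float64)
        if hist.sum():
            hist /= hist.sum()  # normalize counts to relative frequencies
        histograms.append(hist)
    return numpy.append(ratios, histograms)
```

For example, `band_features(numpy.random.rand(50, 40) + 0.1, n_bands=2)` returns 1 + 2 * 256 = 513 values.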
The silent start and end of each sample are trimmed using the Voice Activity Detection (VAD) labels given as input.
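A minimal sketch of the trimming step in `process()`, in plain NumPy: the indices of the first and last speech frames are converted to sample positions using the window shift and window length. `trim_silence` is a hypothetical helper; unlike the original, it omits the check for invalid two-element label arrays.

```python
import numpy


def trim_silence(signal, vad_labels, rate=16000.0, win_length_ms=20.0, win_shift_ms=10.0):
    """Cut the samples before the first and after the last speech frame (label 1)."""
    shift = int(rate / 1000 * win_shift_ms)    # samples between consecutive frame starts
    length = int(rate / 1000 * win_length_ms)  # samples per frame
    speech, = numpy.nonzero(vad_labels)
    # leave the signal untouched if there is no speech, or no silence to remove
    if len(speech) == 0 or len(speech) == len(vad_labels):
        return signal
    start = speech[0] * shift            # first sample of the first speech frame
    end = speech[-1] * shift + length    # start of the last speech frame plus one full window
    return signal[start:end]


# labels for five frames: silence, silence, speech, speech, silence
trimmed = trim_silence(numpy.arange(1000), numpy.array([0, 0, 1, 1, 0]))
print(trimmed[0], len(trimmed))  # 320 480
```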
Name | Databases/Protocols | Analyzers |
---|---|---|
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_lbp_hist_ratios_lr-fusion_lr-pa_aligned | avspoof/2@physicalaccess_verification, avspoof/2@physicalaccess_verify_train, avspoof/2@physicalaccess_verify_train_spoof, avspoof/2@physicalaccess_antispoofing, avspoof/2@physicalaccess_verification_spoof | pkorshunov/spoof-score-fusion-roc_hist/1 |
pkorshunov/pkorshunov/speech-pad-simple/1/speech-pad_lbp_hist_ratios_lr-pa_aligned | avspoof/2@physicalaccess_antispoofing | pkorshunov/simple_antispoofing_analyzer/4 |