BEAT - pkorshunov/cepstral/1

This algorithm is a legacy one. The API has changed since its implementation. New versions and forks will need to be updated.

This algorithm is splittable.

Endpoint Groups 1

Algorithms have at least one input and one output. All algorithm endpoints are organized in groups. Groups are used by the platform to indicate which inputs and outputs are synchronized together. The first group is automatically synchronized with the channel defined by the block in which the algorithm is deployed.

Group: main

Endpoint Name	Data Format	Nature
speech	system/array_1d_floats/1	Input
vad	system/array_1d_integers/1	Input
features	system/array_2d_floats/1	Output

Parameters 16

Parameters allow users to change the configuration of an algorithm when scheduling an experiment.

Name	Description	Type	Default	Range/Choices
f_max	Max frequency of the range used in bandpass filtering	float64	8000.0
delta_win	Window size used in delta and delta-delta computation	uint32	2
withDelta	Compute deltas (with window size specified by delta_win)	bool	True
pre_emphasis_coef	Pre-emphasis coefficient	float64	0.95
win_shift_ms	The shift between the starts of neighboring windows, typically half the window length (so that consecutive windows overlap by half)	float64	10.0
win_length_ms	The length of the sliding processing window, typically about 20 ms	float64	20.0
dct_norm	Use normalized DCT	bool	False
normalizeFeatures	Normalize computed Cepstral features (shift by mean and divide by std)	bool	True
filter_frames	Filter the frames of computed Cepstral features based on the VAD labels: trim the silent head and tail, keep only speech, or keep only silence	string	trim_silence	trim_silence, silence_only, speech_only
rate	Sampling rate of the speech signal	float64	16000.0
n_filters	Number of filter bands	uint32	24
f_min	Min frequency of the range used in bandpass filtering	float64	0.0
withDeltaDelta	Compute delta-deltas (with window size specified by delta_win)	bool	True
withEnergy	Use power of the FFT magnitude, otherwise just an absolute value of the magnitude	bool	True
mel_scale	Set to true to use Mel-scaled triangular filters; otherwise a linear scale is used	bool	True
n_ceps	Number of cepstral coefficients	uint32	19
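
To make the timing parameters concrete, the sketch below converts win_length_ms and win_shift_ms to samples, counts the frames in a signal, and repeats the feature-length arithmetic that the algorithm's setup() performs. The helper names frame_geometry and feature_length are hypothetical. The frame-count formula is an assumption about how a sliding-window extractor typically frames a signal; it is not taken from the algorithm's code.

```python
def frame_geometry(rate, win_length_ms, win_shift_ms, n_samples):
    """Return (window length, window shift, frame count), all in samples/frames.

    Assumption: frames are taken only where a full window fits, which gives
    1 + (n_samples - win_length) // win_shift frames.
    """
    win_length = int(round(rate * win_length_ms / 1000.0))
    win_shift = int(round(rate * win_shift_ms / 1000.0))
    if n_samples < win_length:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - win_length) // win_shift
    return win_length, win_shift, n_frames


def feature_length(n_ceps, with_delta, with_delta_delta):
    """Mirror the features_len arithmetic from setup(): one block of n_ceps
    coefficients, plus one more block for each enabled derivative."""
    length = n_ceps
    if with_delta:
        length += n_ceps
    if with_delta_delta:
        length += n_ceps
    return length


if __name__ == "__main__":
    # Table defaults: 16 kHz, 20 ms window, 10 ms shift, one second of audio.
    print(frame_geometry(16000.0, 20.0, 10.0, 16000))
    print(feature_length(19, True, True))
```

With the table defaults, a 20 ms window is 320 samples and a 10 ms shift is 160 samples, and 19 cepstral coefficients with deltas and delta-deltas give a 57-dimensional vector.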

import numpy
import bob.ap


def vad_filter_features(rate, wavsample, vad_labels, features, filter_frames="trim_silence"):
    """
    Select frames of cepstral features according to the VAD labels.

    @param: filter_frames: the value is either 'silence_only' (keep only the
    silent frames that lie between the first and the last speech frame),
    'speech_only' (keep only the speech frames), or 'trim_silence' (trim the
    silent head and tail)
    """

    if not wavsample.size:
        raise ValueError("vad_filter_features(): data sample is empty, no features extraction is possible")

    vad_labels = numpy.asarray(vad_labels, dtype=numpy.int8)
    features = numpy.asarray(features, dtype=numpy.float64)
    features = numpy.reshape(features, (vad_labels.shape[0], -1))

    # first, take the whole thing, in case there are problems later
    filtered_features = features

    # if VAD detection worked on this sample
    if vad_labels is not None:
        # make sure the size of the VAD labels and the number of feature frames match
        if len(vad_labels) == len(features):

            # take only speech frames, as in VAD speech frames are 1 and silence are 0
            speech, = numpy.nonzero(vad_labels)
            silences = None
            if filter_frames == "silence_only":
                # take only silent frames - those for which VAD gave zeros
                silences, = numpy.nonzero(vad_labels == 0)

            if len(speech):
                nzstart = speech[0]  # index of the first non-zero
                nzend = speech[-1]  # index of the last non-zero

                if filter_frames == "silence_only":  # extract only silent frames
                    # take only silent frames in-between the speech
                    silences = silences[silences > nzstart]
                    silences = silences[silences < nzend]
                    filtered_features = features[silences, :]
                elif filter_frames == "speech_only":
                    filtered_features = features[speech, :]
                else:  # 'trim_silence': keep everything between the first and last speech frame
                    # note: the slice end is exclusive, so frame nzend itself is not included
                    filtered_features = features[nzstart:nzend, :]
        else:
            print("Warning: vad_filter_features(): VAD labels should be the same length as energy bands")

#    print("vad_filter_features(): filtered_features shape: %s" % str(filtered_features.shape))

    return filtered_features


class Algorithm:

    def __init__(self):
        self.rate = 16000
        self.win_length_ms = 20
        self.win_shift_ms = 10
        self.n_filters = 24
        self.n_ceps = 19
        self.f_min = 0
        self.f_max = 4000
        self.delta_win = 2
        self.pre_emphasis_coef = 0.95
        self.dct_norm = False
        self.mel_scale = True
        self.withEnergy = True
        self.withDelta = True
        self.withDeltaDelta = True
        self.normalizeFeatures = True

        self.filter_frames = 'speech_only'
        self.features_len = 19

    def setup(self, parameters):
        # values passed by the platform override the defaults set in __init__
        self.rate = float(parameters.get('rate', self.rate))
        self.win_length_ms = float(parameters.get('win_length_ms', self.win_length_ms))
        self.win_shift_ms = float(parameters.get('win_shift_ms', self.win_shift_ms))
        self.f_min = float(parameters.get('f_min', self.f_min))
        self.f_max = float(parameters.get('f_max', self.f_max))

        self.n_ceps = parameters.get('n_ceps', self.n_ceps)
        self.n_filters = parameters.get('n_filters', self.n_filters)
        self.delta_win = parameters.get('delta_win', self.delta_win)

        self.pre_emphasis_coef = float(parameters.get('pre_emphasis_coef', self.pre_emphasis_coef))
        self.mel_scale = parameters.get('mel_scale', self.mel_scale)
        self.dct_norm = parameters.get('dct_norm', self.dct_norm)
        self.withEnergy = parameters.get('withEnergy', self.withEnergy)
        self.withDelta = parameters.get('withDelta', self.withDelta)
        self.withDeltaDelta = parameters.get('withDeltaDelta', self.withDeltaDelta)
        self.normalizeFeatures = parameters.get('normalizeFeatures', self.normalizeFeatures)

        self.filter_frames = parameters.get('filter_frames', self.filter_frames)

        wl = self.win_length_ms
        ws = self.win_shift_ms
        nf = self.n_filters
        nc = self.n_ceps
        f_min = self.f_min
        f_max = self.f_max
        dw = self.delta_win
        pre = self.pre_emphasis_coef

        self.extractor = bob.ap.Ceps(self.rate, wl, ws, nf, nc, f_min, f_max, dw, pre)
        self.extractor.dct_norm = self.dct_norm
        self.extractor.mel_scale = self.mel_scale
        self.extractor.with_energy = self.withEnergy
        self.extractor.with_delta = self.withDelta
        self.extractor.with_delta_delta = self.withDeltaDelta

        # compute the size of the feature vector
        self.features_len = nc
        if self.withDelta:
            self.features_len += nc
        if self.withDeltaDelta:
            self.features_len += nc

        return True

    def normalize_features(self, features):
        # shift each coefficient (column) by its mean and divide by its standard deviation
        mean = numpy.mean(features, axis=0)
        std = numpy.std(features, axis=0)
        features = numpy.divide(features - mean, std)
        return features

    def process(self, inputs, outputs):

        float_wav = inputs["speech"].data.value.astype('float64')
        labels = inputs["vad"].data.value

        cepstral_features = self.extractor(float_wav)

        filtered_features = vad_filter_features(self.rate, float_wav, labels, cepstral_features, filter_frames=self.filter_frames)

        if self.normalizeFeatures:
            normalized_features = self.normalize_features(filtered_features)
        else:
            normalized_features = filtered_features

        if normalized_features.shape[0] == 0:
            # If no frames are left, do not keep the output empty: this avoids errors in the next steps
            normalized_features = numpy.array([numpy.zeros(self.features_len)])

        outputs["features"].write({
            'value': numpy.vstack(normalized_features)
        })

        return True
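
The three filter_frames modes are easier to compare on a toy label sequence. The sketch below returns the frame indices that each mode would keep, following the same selection rules as vad_filter_features. The helper select_frames is hypothetical and works on indices only, so it needs NumPy but not bob.ap.

```python
import numpy


def select_frames(vad_labels, filter_frames="trim_silence"):
    """Return the indices of the frames kept by each filter_frames mode.

    Hypothetical helper: it mirrors the selection rules of
    vad_filter_features but returns indices instead of feature rows.
    """
    vad_labels = numpy.asarray(vad_labels, dtype=numpy.int8)
    all_frames = numpy.arange(len(vad_labels))

    speech, = numpy.nonzero(vad_labels)
    if not len(speech):
        # no speech detected: everything is kept, as in the listing
        return all_frames

    first, last = speech[0], speech[-1]
    if filter_frames == "silence_only":
        # silent frames strictly between the first and last speech frame
        silences, = numpy.nonzero(vad_labels == 0)
        return silences[(silences > first) & (silences < last)]
    if filter_frames == "speech_only":
        return speech
    # 'trim_silence': the slice end is exclusive, so frame `last` is dropped
    return all_frames[first:last]


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1, 0]
    for mode in ("trim_silence", "speech_only", "silence_only"):
        print(mode, select_frames(labels, mode).tolist())
```

For the labels 0 0 1 1 0 1 0, trim_silence keeps frames 2 to 4, speech_only keeps frames 2, 3 and 5, and silence_only keeps frame 4 alone. The trim_silence result shows that the last speech frame (index 5) falls outside the end-exclusive slice.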

Extract cepstral features (MFCC or LFCC) from audio
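
When normalizeFeatures is enabled, each coefficient is shifted by its mean and divided by its standard deviation. A coefficient that is constant across all kept frames has zero standard deviation, and plain division then yields NaN values. The sketch below is one possible guard and is not part of the published algorithm; the function name normalize_features_safe and the eps threshold are assumptions.

```python
import numpy


def normalize_features_safe(features, eps=1e-12):
    """Per-column mean/std normalization that leaves constant columns at zero.

    Hypothetical variant of normalize_features: columns whose standard
    deviation is below eps are divided by 1.0 instead, so no NaN appears.
    """
    features = numpy.asarray(features, dtype=numpy.float64)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = numpy.where(std < eps, 1.0, std)
    return (features - mean) / std


if __name__ == "__main__":
    # the second column is constant, so its standard deviation is zero
    frames = [[1.0, 5.0], [3.0, 5.0]]
    print(normalize_features_safe(frames))
```

Whether a guard like this is appropriate depends on the downstream blocks, since it changes the output for degenerate inputs.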

Experiments

Name	Databases/Protocols	Analyzers
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_lbp_hist_ratios_lr-fusion_lr-pa_aligned	avspoof/2@physicalaccess_verification,avspoof/2@physicalaccess_verify_train,avspoof/2@physicalaccess_verify_train_spoof,avspoof/2@physicalaccess_antispoofing,avspoof/2@physicalaccess_verification_spoof	pkorshunov/spoof-score-fusion-roc_hist/1
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_gmm-fusion_lr-pa	avspoof/2@physicalaccess_verification,avspoof/2@physicalaccess_verify_train,avspoof/2@physicalaccess_verify_train_spoof,avspoof/2@physicalaccess_antispoofing,avspoof/2@physicalaccess_verification_spoof	pkorshunov/spoof-score-fusion-roc_hist/1
pkorshunov/pkorshunov/speech-pad-simple/1/speech-pad_gmm-pa	avspoof/2@physicalaccess_antispoofing	pkorshunov/simple_antispoofing_analyzer/4
pkorshunov/pkorshunov/isv-speaker-verification-spoof/1/isv-speaker-verification-spoof-pa	avspoof/2@physicalaccess_verification,avspoof/2@physicalaccess_verification_spoof	pkorshunov/eerhter_postperf_iso_spoof/1
pkorshunov/pkorshunov/isv-speaker-verification/1/isv-speaker-verification-licit	avspoof/2@physicalaccess_verification	pkorshunov/eerhter_postperf_iso/1

This table shows the number of times this algorithm has been successfully run using the given environment. Note this does not provide sufficient information to evaluate if the algorithm will run when submitted to different conditions.
