Bob 2.0 extraction of cepstral features (MFCC or LFCC) from audio

This algorithm is a legacy one: the API has changed since its implementation, and new versions and forks will need to be updated.
This algorithm is splittable.

Algorithms have at least one input and one output. All algorithm endpoints are organized in groups. Groups are used by the platform to indicate which inputs and outputs are synchronized together. The first group is automatically synchronized with the channel defined by the block in which the algorithm is deployed.

Group: main

Endpoint Name | Data Format                 | Nature
--------------|-----------------------------|-------
speech        | system/array_1d_floats/1    | Input
vad           | system/array_1d_integers/1  | Input
features      | system/array_2d_floats/1    | Output

Parameters allow users to change the configuration of an algorithm when scheduling an experiment.

Name              | Description                                                              | Type    | Default      | Range/Choices
------------------|--------------------------------------------------------------------------|---------|--------------|--------------
rate              | Sampling rate of the speech signal                                       | float64 | 16000.0      |
win_length_ms     | Length of the sliding processing window, typically about 20 ms           | float64 | 20.0         |
win_shift_ms      | Shift between neighboring windows, typically half of the window length   | float64 | 10.0         |
n_filters         | Number of filter bands                                                   | uint32  | 24           |
n_ceps            | Number of cepstral coefficients                                          | uint32  | 19           |
f_min             | Min frequency of the range used in bandpass filtering                    | float64 | 0.0          |
f_max             | Max frequency of the range used in bandpass filtering                    | float64 | 8000.0       |
delta_win         | Window size used in delta and delta-delta computation                    | uint32  | 2            |
pre_emphasis_coef | Pre-emphasis coefficient                                                 | float64 | 0.95         |
mel_scale         | Set true to use Mel-scaled triangular filters, otherwise a linear scale  | bool    | True         |
dct_norm          | Use normalized DCT                                                       | bool    | False        |
withEnergy        | Use the power of the FFT magnitude, otherwise its absolute value         | bool    | True         |
withDelta         | Compute deltas (with window size specified by delta_win)                 | bool    | True         |
withDeltaDelta    | Compute delta-deltas (with window size specified by delta_win)           | bool    | True         |
normalizeFeatures | Normalize computed cepstral features (shift by mean and divide by std)   | bool    | True         |
filter_frames     | Filter frames based on the VAD labels: trim silent heads/tails, keep only speech, or keep only silence | string | trim_silence | trim_silence, silence_only, speech_only
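The per-frame feature dimensionality follows directly from `n_ceps`, `withDelta` and `withDeltaDelta`. A minimal sketch of that computation, mirroring the `features_len` logic in the code below (note that `bob.ap.Ceps` may additionally append an energy coefficient when `withEnergy` is set, which this sketch, like the original code, does not count):

```python
def feature_length(n_ceps, with_delta=True, with_delta_delta=True):
    """Per-frame feature vector length: base cepstra plus optional deltas."""
    length = n_ceps
    if with_delta:
        length += n_ceps  # first-order deltas
    if with_delta_delta:
        length += n_ceps  # second-order deltas (delta-deltas)
    return length

print(feature_length(19))              # defaults: 19 + 19 + 19 = 57
print(feature_length(19, True, False)) # cepstra + deltas only: 38
```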
import numpy
import bob.ap

def vad_filter_features(rate, wavsample, vad_labels, features, filter_frames="trim_silence"):
    """
    @param filter_frames: either 'speech_only' (keep only speech frames),
    'silence_only' (keep only the silent frames in-between speech), or
    'trim_silence' (trim silent heads and tails)
    """

    if not wavsample.size:
        raise ValueError("vad_filter_features(): data sample is empty, no feature extraction is possible")

    vad_labels = numpy.asarray(vad_labels, dtype=numpy.int8)
    features = numpy.asarray(features, dtype=numpy.float64)
    features = numpy.reshape(features, (vad_labels.shape[0], -1))

    # first, take the whole thing, in case there are problems later
    filtered_features = features

    # if VAD detection worked on this sample
    if vad_labels is not None:
        # make sure the VAD labels and the spectrogram length match
        if len(vad_labels) == len(features):

            # take only speech frames; in VAD, speech frames are 1 and silence frames are 0
            speech, = numpy.nonzero(vad_labels)
            silences = None
            if filter_frames == "silence_only":
                # take only silent frames - those for which VAD gave zeros
                silences, = numpy.nonzero(vad_labels == 0)

            if len(speech):
                nzstart = speech[0]  # index of the first non-zero
                nzend = speech[-1]   # index of the last non-zero

                if filter_frames == "silence_only":  # extract only silent frames
                    # take only silent frames in-between the speech
                    silences = silences[silences > nzstart]
                    silences = silences[silences < nzend]
                    filtered_features = features[silences, :]
                elif filter_frames == "speech_only":
                    filtered_features = features[speech, :]
                else:  # trim_silence: keep everything between the first and last speech frame
                    filtered_features = features[nzstart:nzend + 1, :]
        else:
            print("Warning: vad_filter_features(): VAD labels should be the same length as the number of feature frames")

    return filtered_features
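
To illustrate the three `filter_frames` modes, here is a numpy-only sketch (no bob dependency) that applies the same frame-selection logic to toy VAD labels; `trim_silence` keeps everything from the first to the last speech frame, inclusive:

```python
import numpy

# Toy example: 8 frames with 2 features each; frames 2, 3 and 5 are speech.
features = numpy.arange(16, dtype=numpy.float64).reshape(8, 2)
vad_labels = numpy.array([0, 0, 1, 1, 0, 1, 0, 0])

speech, = numpy.nonzero(vad_labels)     # speech frame indices: [2, 3, 5]
nzstart, nzend = speech[0], speech[-1]  # first and last speech frame: 2 and 5

speech_only = features[speech, :]              # keep only speech -> 3 frames
trim_silence = features[nzstart:nzend + 1, :]  # trim head/tail   -> 4 frames

silences, = numpy.nonzero(vad_labels == 0)
silences = silences[(silences > nzstart) & (silences < nzend)]
silence_only = features[silences, :]           # silence in-between -> 1 frame

print(speech_only.shape, trim_silence.shape, silence_only.shape)
```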

class Algorithm:

    def __init__(self):
        self.rate = 16000
        self.win_length_ms = 20
        self.win_shift_ms = 10
        self.n_filters = 24
        self.n_ceps = 19
        self.f_min = 0
        self.f_max = 4000
        self.delta_win = 2
        self.pre_emphasis_coef = 0.95
        self.dct_norm = False
        self.mel_scale = True
        self.withEnergy = True
        self.withDelta = True
        self.withDeltaDelta = True
        self.normalizeFeatures = True

        self.filter_frames = 'speech_only'
        self.features_len = 19

    def setup(self, parameters):
        self.rate = float(parameters.get('rate', self.rate))
        self.win_length_ms = float(parameters.get('win_length_ms', self.win_length_ms))
        self.win_shift_ms = float(parameters.get('win_shift_ms', self.win_shift_ms))
        self.f_min = float(parameters.get('f_min', self.f_min))
        self.f_max = float(parameters.get('f_max', self.f_max))

        self.n_ceps = parameters.get('n_ceps', self.n_ceps)
        self.n_filters = parameters.get('n_filters', self.n_filters)
        self.delta_win = parameters.get('delta_win', self.delta_win)

        self.pre_emphasis_coef = float(parameters.get('pre_emphasis_coef', self.pre_emphasis_coef))
        self.mel_scale = parameters.get('mel_scale', self.mel_scale)
        self.dct_norm = parameters.get('dct_norm', self.dct_norm)
        self.withEnergy = parameters.get('withEnergy', self.withEnergy)
        self.withDelta = parameters.get('withDelta', self.withDelta)
        self.withDeltaDelta = parameters.get('withDeltaDelta', self.withDeltaDelta)
        self.normalizeFeatures = parameters.get('normalizeFeatures', self.normalizeFeatures)

        self.filter_frames = parameters.get('filter_frames', self.filter_frames)

        wl = self.win_length_ms
        ws = self.win_shift_ms
        nf = self.n_filters
        nc = self.n_ceps
        f_min = self.f_min
        f_max = self.f_max
        dw = self.delta_win
        pre = self.pre_emphasis_coef

        self.extractor = bob.ap.Ceps(self.rate, wl, ws, nf, nc, f_min, f_max, dw, pre)
        self.extractor.dct_norm = self.dct_norm
        self.extractor.mel_scale = self.mel_scale
        self.extractor.with_energy = self.withEnergy
        self.extractor.with_delta = self.withDelta
        self.extractor.with_delta_delta = self.withDeltaDelta

        # compute the size of the feature vector
        self.features_len = nc
        if self.withDelta:
            self.features_len += nc
        if self.withDeltaDelta:
            self.features_len += nc

        return True

    def normalize_features(self, features):
        # shift by the per-dimension mean and divide by the standard deviation
        mean = numpy.mean(features, axis=0)
        std = numpy.std(features, axis=0)
        features = numpy.divide(features - mean, std)
        return features

    def process(self, inputs, outputs):
        float_wav = inputs["speech"].data.value.astype('float64')
        labels = inputs["vad"].data.value

        cepstral_features = self.extractor(float_wav)

        filtered_features = vad_filter_features(self.rate, float_wav, labels, cepstral_features, filter_frames=self.filter_frames)

        if self.normalizeFeatures:
            normalized_features = self.normalize_features(filtered_features)
        else:
            normalized_features = filtered_features

        if normalized_features.shape[0] == 0:
            # if no frames survived filtering, output a single zero vector
            # to avoid errors in the next processing steps
            normalized_features = numpy.array([numpy.zeros(self.features_len)])

        outputs["features"].write({
            'value': numpy.vstack(normalized_features)
        })

        return True
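
The `normalize_features` step is plain per-dimension mean/variance normalization. A numpy-only sketch of its effect (note that, as in the original code, a dimension with zero standard deviation would produce NaNs, since the division is unguarded):

```python
import numpy

# Toy feature matrix: 4 frames x 3 cepstral dimensions.
features = numpy.array([[1.0, 10.0, 100.0],
                        [2.0, 20.0, 200.0],
                        [3.0, 30.0, 300.0],
                        [4.0, 40.0, 400.0]])

# Same operation as Algorithm.normalize_features: shift by the
# per-dimension mean and divide by the per-dimension std.
normalized = (features - features.mean(axis=0)) / features.std(axis=0)

print(numpy.allclose(normalized.mean(axis=0), 0.0))  # True
print(numpy.allclose(normalized.std(axis=0), 1.0))   # True
```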

The code for this algorithm in Python

Extract cepstral features (MFCC or LFCC) from audio

Experiments

Name | Databases/Protocols | Analyzers
-----|---------------------|----------
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_lbp_hist_ratios_lr-fusion_lr-pa_aligned | avspoof/2@physicalaccess_verification, avspoof/2@physicalaccess_verify_train, avspoof/2@physicalaccess_verify_train_spoof, avspoof/2@physicalaccess_antispoofing, avspoof/2@physicalaccess_verification_spoof | pkorshunov/spoof-score-fusion-roc_hist/1
pkorshunov/pkorshunov/isv-asv-pad-fusion-complete/1/asv_isv-pad_gmm-fusion_lr-pa | avspoof/2@physicalaccess_verification, avspoof/2@physicalaccess_verify_train, avspoof/2@physicalaccess_verify_train_spoof, avspoof/2@physicalaccess_antispoofing, avspoof/2@physicalaccess_verification_spoof | pkorshunov/spoof-score-fusion-roc_hist/1
pkorshunov/pkorshunov/speech-pad-simple/1/speech-pad_gmm-pa | avspoof/2@physicalaccess_antispoofing | pkorshunov/simple_antispoofing_analyzer/4
pkorshunov/pkorshunov/isv-speaker-verification-spoof/1/isv-speaker-verification-spoof-pa | avspoof/2@physicalaccess_verification, avspoof/2@physicalaccess_verification_spoof | pkorshunov/eerhter_postperf_iso_spoof/1
pkorshunov/pkorshunov/isv-speaker-verification/1/isv-speaker-verification-licit | avspoof/2@physicalaccess_verification | pkorshunov/eerhter_postperf_iso/1

This table shows the number of times this algorithm has been successfully run using the given environment. Note that this does not provide sufficient information to evaluate whether the algorithm will run when submitted to different conditions.

BEAT platform version 2.2.1b0 | © Idiap Research Institute - 2013-2025