Python API to bob.kaldi

This section includes information for using the Python API of bob.kaldi.

Functions

bob.kaldi.get_config()

Returns a string containing the configuration information.

bob.kaldi.cepstral(data, cepstral_type, rate=8000, preemphasis_coefficient=0.97, raw_energy=True, delta_order=2, frame_length=25, frame_shift=10, num_ceps=13, num_mel_bins=23, cepstral_lifter=22, low_freq=20, high_freq=0, dither=1.0, snip_edges=True, normalization=True)

Computes the cepstral (mfcc/plp) features for given speech samples.

Parameters:
    data (numpy.ndarray) – A 1D numpy ndarray of 64-bit floats containing the audio signal from which to calculate the cepstral features. The input needs to be normalized to [-1, 1].
    cepstral_type (str) – The type of cepstral features: 'mfcc' or 'plp'.
    rate (`float`, optional) – The sampling rate of the input signal in `data`, in Hz.
    preemphasis_coefficient (`float`, optional) – Coefficient for use in signal preemphasis.
    raw_energy (`bool`, optional) – If true, compute energy before preemphasis and windowing.
    delta_order (`int`, optional) – Order of the delta features appended to the raw MFCC or PLP features.
    frame_length (`int`, optional) – Frame length in milliseconds.
    frame_shift (`int`, optional) – Frame shift in milliseconds.
    num_ceps (`int`, optional) – Number of cepstra in MFCC computation (including C0).
    num_mel_bins (`int`, optional) – Number of triangular mel-frequency bins.
    cepstral_lifter (`int`, optional) – Constant that controls scaling of MFCCs.
    low_freq (`int`, optional) – Low cutoff frequency for mel bins, in Hz.
    high_freq (`int`, optional) – High cutoff frequency for mel bins, in Hz (if <= 0, offset from Nyquist).
    dither (`float`, optional) – Dithering constant (0.0 means no dither).
    snip_edges (`bool`, optional) – If true, end effects are handled by outputting only frames that fit completely in the file, so the number of frames depends on the frame length. If false, the number of frames depends only on the frame shift, and the data are reflected at the ends.
    normalization (`bool`, optional) – If true, the input samples in `data` are normalized to [-1, 1].
Returns:	The cepstral features calculated for the input signal (2D array of 32-bit floats).
Return type:	numpy.ndarray
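To make the frame_length, frame_shift, snip_edges and delta_order options concrete, the sketch below estimates the shape of the returned matrix. It assumes the frame-counting rules of Kaldi's feature extraction (which bob.kaldi wraps) and that appending deltas multiplies the dimension by delta_order + 1; num_frames and feature_dim are hypothetical helpers, not part of bob.kaldi.

```python
def num_frames(num_samples, rate=8000, frame_length=25, frame_shift=10,
               snip_edges=True):
    """Estimate how many frames a Kaldi-style front end outputs (assumed rules)."""
    # frame_length and frame_shift are in milliseconds; convert to samples.
    length = int(rate * frame_length / 1000)
    shift = int(rate * frame_shift / 1000)
    if snip_edges:
        # Only frames that fit completely inside the signal are kept.
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # Otherwise the count depends only on the shift (rounded to nearest).
    return (num_samples + shift // 2) // shift


def feature_dim(num_ceps=13, delta_order=2):
    """Columns per frame once deltas up to delta_order are appended (assumed)."""
    return num_ceps * (delta_order + 1)


print(num_frames(8000), num_frames(8000, snip_edges=False), feature_dim())  # 98 100 39
```

With the defaults, one second of 8 kHz audio would give 98 frames (100 with snip_edges=False), each with 39 columns.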

bob.kaldi.compute_dnn_vad(samples, rate, silence_threshold=0.9, posterior=0)

Performs DNN-based Voice Activity Detection on the given speech samples.

Parameters:
    samples (numpy.ndarray) – A 1D numpy array with the audio signal on which to perform VAD.
    rate (float) – The sampling rate of the input signal in `samples`, in Hz.
    silence_threshold (`float`, optional) – Silence threshold used when evaluating the selected posterior.
    posterior (`int`, optional) – Index of the posterior feature used for detection. Useful values are 0, 1 and 2, for silence, laughter and noise, respectively.
Returns:	Per-frame labels, 1 for voiced and 0 for non-voiced frames (1D array of floats).
Return type:	numpy.ndarray
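The roles of silence_threshold and posterior can be pictured as a per-frame thresholding step on the network's posteriors. The sketch below assumes that a frame counts as voiced when its selected posterior is below the threshold; this rule and the posterior_to_labels helper are hypothetical illustrations, not bob.kaldi's confirmed implementation, which may also smooth the decisions.

```python
def posterior_to_labels(posteriors, silence_threshold=0.9, posterior=0):
    """Hypothetical VAD decision: 1.0 = voiced, 0.0 = not voiced.

    posteriors holds one row per frame, e.g. [silence, laughter, noise].
    Assumption: a frame is voiced when the selected posterior is below
    silence_threshold.
    """
    return [1.0 if row[posterior] < silence_threshold else 0.0
            for row in posteriors]


frames = [[0.95, 0.00, 0.05],   # confidently silence
          [0.20, 0.10, 0.70]]   # mostly noise
print(posterior_to_labels(frames))                                        # [0.0, 1.0]
print(posterior_to_labels(frames, silence_threshold=0.5, posterior=2))    # [1.0, 0.0]
```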

bob.kaldi.compute_vad(samples, rate, vad_energy_mean_scale=0.5, vad_energy_th=5, vad_frames_context=0, vad_proportion_th=0.6)

Performs energy-based Voice Activity Detection on the given speech samples.

Parameters:
    samples (numpy.ndarray) – A 1D numpy array with the audio signal on which to perform VAD.
    rate (float) – The sampling rate of the input signal in `samples`, in Hz.
    vad_energy_mean_scale (`float`, optional) – If this is set to s, the actual energy threshold is s * m + vad_energy_th, where m is the mean log-energy of the file.
    vad_energy_th (`float`, optional) – Constant term in the energy threshold for MFCC0 for VAD.
    vad_frames_context (`int`, optional) – Number of frames of context on each side of the central frame, in the window over which energy is monitored.
    vad_proportion_th (`float`, optional) – Proportion of frames within the window that need to have more energy than the threshold.
Returns:	Per-frame labels, 1 for voiced and 0 for non-voiced frames (1D array of floats).
Return type:	numpy.ndarray
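The threshold rule s * m + vad_energy_th and the context window can be sketched in pure Python. This assumes the decision logic of Kaldi's compute-vad tool; energy_vad is a hypothetical helper that takes precomputed per-frame log-energies (for example MFCC C0) rather than audio samples.

```python
def energy_vad(log_energy, vad_energy_mean_scale=0.5, vad_energy_th=5.0,
               vad_frames_context=0, vad_proportion_th=0.6):
    """Energy-based VAD over per-frame log-energies (assumed Kaldi-like rule)."""
    T = len(log_energy)
    if T == 0:
        return []
    # Threshold = s * m + vad_energy_th, with m the mean log-energy.
    threshold = vad_energy_th + vad_energy_mean_scale * sum(log_energy) / T
    labels = []
    for t in range(T):
        # Window of +/- vad_frames_context frames, clipped at the edges.
        lo = max(0, t - vad_frames_context)
        hi = min(T, t + vad_frames_context + 1)
        window = log_energy[lo:hi]
        above = sum(1 for e in window if e > threshold)
        # Voiced if a large enough share of the window is above threshold.
        labels.append(1.0 if above >= vad_proportion_th * len(window) else 0.0)
    return labels


energies = [20.0, 0.0, 20.0, 20.0, 0.0, 0.0]   # mean 10, threshold 10
print(energy_vad(energies))                        # [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(energy_vad(energies, vad_frames_context=1))  # [0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

A nonzero context smooths isolated decisions: the low-energy frame between two loud ones becomes voiced, while the loud frame at the edge is outvoted by its quiet neighbour.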

bob.kaldi.gmm_score(feats, spkubm, ubm)

Computes per-frame log-likelihoods for the input utterance.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing MFCCs.
    spkubm (str) – A text formatted Kaldi adapted global DiagGMM.
    ubm (str) – A text formatted Kaldi global DiagGMM.
Returns:	The average of per-frame log-likelihoods.
Return type:	float
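To show what an average of per-frame log-likelihoods means for a diagonal-covariance GMM, the sketch below evaluates each frame under the mixture and averages the results. The model is passed as plain Python lists (weights, means, variances) instead of Kaldi's text format; the helper names are hypothetical and this is not bob.kaldi's code.

```python
import math


def diag_gmm_loglike(frame, weights, means, variances):
    """log p(frame) under a diagonal-covariance GMM."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        # log w_k + sum_d log N(x_d; mu_kd, var_kd)
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        comps.append(ll)
    # Log-sum-exp over the components for numerical stability.
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


def average_loglike(feats, weights, means, variances):
    """Average of the per-frame log-likelihoods over all frames."""
    total = sum(diag_gmm_loglike(f, weights, means, variances) for f in feats)
    return total / len(feats)


# One standard-normal Gaussian in 1D: each frame at 0 scores -0.5 * log(2 * pi).
print(average_loglike([[0.0], [0.0]], [1.0], [[0.0]], [[1.0]]))
```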

bob.kaldi.ivector_extract(feats, fubm, ivector_extractor, num_gselect=20, min_post=0.025, posterior_scale=1.0)

Extracts iVectors from the input features; implements Kaldi egs/sre10/v1/extract_ivectors.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing MFCCs.
    fubm (str) – A full-covariance UBM.
    ivector_extractor (str) – An iVector extractor model.
    num_gselect (`int`, optional) – Number of Gaussians to keep per frame.
    min_post (`float`, optional) – If nonzero, posteriors below this threshold are pruned away and the rest are renormalized to sum to one.
    posterior_scale (`float`, optional) – Global scaling factor applied to the posteriors.
Returns:	The iVectors calculated for the input signal.
Return type:	numpy.ndarray
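The num_gselect, min_post and posterior_scale options act on the per-frame Gaussian posteriors that feed the iVector extractor. The sketch below illustrates them for a single frame, assuming (as in Kaldi's pipeline) that pruning happens before the global scale and that the most likely Gaussian is never pruned. For simplicity it selects and scores Gaussians from the same log-likelihoods, whereas Kaldi selects them with a diagonal UBM; frame_posteriors is a hypothetical helper.

```python
import math


def frame_posteriors(log_likes, num_gselect=20, min_post=0.025,
                     posterior_scale=1.0):
    """Sparse Gaussian posteriors for one frame as {index: posterior}."""
    # Keep the num_gselect most likely Gaussians.
    order = sorted(range(len(log_likes)), key=lambda i: log_likes[i], reverse=True)
    top = order[:num_gselect]
    # Softmax over the selected Gaussians only.
    best = log_likes[top[0]]
    exps = {i: math.exp(log_likes[i] - best) for i in top}
    total = sum(exps.values())
    post = {i: e / total for i, e in exps.items()}
    if min_post > 0:
        # Prune small posteriors (always keeping the best Gaussian),
        # then renormalize the survivors to sum to one.
        kept = {i: p for i, p in post.items() if p >= min_post or i == top[0]}
        norm = sum(kept.values())
        post = {i: p / norm for i, p in kept.items()}
    # Apply the global posterior scale last.
    return {i: p * posterior_scale for i, p in post.items()}


print(frame_posteriors([0.0, -5.0], posterior_scale=0.5))  # {0: 0.5}
```

Here the second Gaussian's posterior (about 0.0067) falls below min_post, so all mass moves to the first Gaussian before scaling.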

bob.kaldi.ivector_train(feats, fubm, ivector_extractor, num_gselect=20, ivector_dim=600, use_weights=False, num_iters=5, min_post=0.025, num_samples_for_weights=3, posterior_scale=1.0)

Trains an iVector extractor; implements Kaldi egs/sre10/v1/train_ivector_extractor.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing MFCCs.
    fubm (str) – A full-covariance UBM.
    ivector_extractor (str) – A path for the iVector extractor.
    num_gselect (`int`, optional) – Number of Gaussians to keep per frame.
    ivector_dim (`int`, optional) – Dimension of the iVectors.
    use_weights (`bool`, optional) – If true, regress the log-weights on the iVector.
    num_iters (`int`, optional) – Number of training iterations.
    min_post (`float`, optional) – If nonzero, posteriors below this threshold are pruned away and the rest are renormalized to sum to one.
    num_samples_for_weights (`int`, optional) – Number of samples from the iVector distribution used to accumulate statistics for the weight update. Must be greater than 1.
    posterior_scale (`float`, optional) – Global scaling factor applied to the posteriors.
Returns:	A text formatted trained Kaldi IvectorExtractor.
Return type:	str

bob.kaldi.mfcc(data, rate=8000, preemphasis_coefficient=0.97, raw_energy=True, frame_length=25, frame_shift=10, num_ceps=13, num_mel_bins=23, cepstral_lifter=22, low_freq=20, high_freq=0, dither=1.0, snip_edges=True, normalization=True)

Computes the MFCCs for given speech samples.

Parameters:
    data (numpy.ndarray) – A 1D numpy ndarray of 64-bit floats containing the audio signal from which to calculate the MFCCs. The input needs to be normalized to [-1, 1].
    rate (`float`, optional) – The sampling rate of the input signal in `data`, in Hz.
    preemphasis_coefficient (`float`, optional) – Coefficient for use in signal preemphasis.
    raw_energy (`bool`, optional) – If true, compute energy before preemphasis and windowing.
    frame_length (`int`, optional) – Frame length in milliseconds.
    frame_shift (`int`, optional) – Frame shift in milliseconds.
    num_ceps (`int`, optional) – Number of cepstra in MFCC computation (including C0).
    num_mel_bins (`int`, optional) – Number of triangular mel-frequency bins.
    cepstral_lifter (`int`, optional) – Constant that controls scaling of MFCCs.
    low_freq (`int`, optional) – Low cutoff frequency for mel bins, in Hz.
    high_freq (`int`, optional) – High cutoff frequency for mel bins, in Hz (if <= 0, offset from Nyquist).
    dither (`float`, optional) – Dithering constant (0.0 means no dither).
    snip_edges (`bool`, optional) – If true, end effects are handled by outputting only frames that fit completely in the file, so the number of frames depends on the frame length. If false, the number of frames depends only on the frame shift, and the data are reflected at the ends.
    normalization (`bool`, optional) – If true, the input samples in `data` are normalized to [-1, 1].
Returns:	The MFCCs calculated for the input signal (2D array of 32-bit floats).
Return type:	numpy.ndarray
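The low_freq, high_freq, num_mel_bins and cepstral_lifter options can be read through the formulas Kaldi uses for its mel filterbank and liftering, which bob.kaldi's front end is assumed to inherit. The sketch computes the mel-scale edges of each triangular bin and the lifter weights; the helper names are hypothetical.

```python
import math


def mel(f):
    """Mel scale as used by Kaldi: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_bin_edges(rate=8000, num_mel_bins=23, low_freq=20, high_freq=0):
    """(left, center, right) mel-scale edges of each triangular bin."""
    nyquist = 0.5 * rate
    # high_freq <= 0 is interpreted as an offset from the Nyquist frequency.
    high = high_freq if high_freq > 0 else nyquist + high_freq
    lo_mel, hi_mel = mel(low_freq), mel(high)
    # num_mel_bins + 2 equally spaced points on the mel axis.
    delta = (hi_mel - lo_mel) / (num_mel_bins + 1)
    return [(lo_mel + b * delta, lo_mel + (b + 1) * delta, lo_mel + (b + 2) * delta)
            for b in range(num_mel_bins)]


def lifter_coeffs(num_ceps=13, cepstral_lifter=22):
    """Weights applied to cepstral coefficient i: 1 + 0.5 * Q * sin(pi * i / Q)."""
    q = cepstral_lifter
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


edges = mel_bin_edges()
print(len(edges), edges[-1][2] - mel(4000))  # 23 bins; last edge sits at 4 kHz
```

With the defaults (rate=8000, high_freq=0) the filterbank spans 20 Hz up to the 4 kHz Nyquist frequency; high_freq=-400 would stop it at 3.6 kHz.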

bob.kaldi.mfcc_from_path(filename, channel=0, preemphasis_coefficient=0.97, raw_energy=True, frame_length=25, frame_shift=10, num_ceps=13, num_mel_bins=23, cepstral_lifter=22, low_freq=20, high_freq=0, dither=1.0, snip_edges=True)

Computes the MFCCs for an input signal stored in a file.

Parameters:
    filename (str) – A path to a valid WAV or NIST Sphere file to read data from.
    channel (`int`, optional) – The audio channel to read from inside the file.
    preemphasis_coefficient (`float`, optional) – Coefficient for use in signal preemphasis.
    raw_energy (`bool`, optional) – If true, compute energy before preemphasis and windowing.
    frame_length (`int`, optional) – Frame length in milliseconds.
    frame_shift (`int`, optional) – Frame shift in milliseconds.
    num_ceps (`int`, optional) – Number of cepstra in MFCC computation (including C0).
    num_mel_bins (`int`, optional) – Number of triangular mel-frequency bins.
    cepstral_lifter (`int`, optional) – Constant that controls scaling of MFCCs.
    low_freq (`int`, optional) – Low cutoff frequency for mel bins, in Hz.
    high_freq (`int`, optional) – High cutoff frequency for mel bins, in Hz (if <= 0, offset from Nyquist).
    dither (`float`, optional) – Dithering constant (0.0 means no dither).
    snip_edges (`bool`, optional) – If true, end effects are handled by outputting only frames that fit completely in the file, so the number of frames depends on the frame length. If false, the number of frames depends only on the frame shift, and the data are reflected at the ends.
Returns:	The MFCCs calculated for the input signal (2D array of 32-bit floats).
Return type:	numpy.ndarray
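The channel argument picks one channel out of an interleaved multi-channel recording. The sketch below shows that selection with Python's standard wave module, scaling 16-bit samples into [-1, 1], the range expected by the data argument of bob.kaldi.mfcc. It handles 16-bit PCM WAV only (no NIST Sphere), read_channel is a hypothetical helper, and bob.kaldi does not necessarily read files this way.

```python
import struct
import wave


def read_channel(filename, channel=0):
    """Return (samples, rate) for one channel of a 16-bit PCM WAV file."""
    with wave.open(filename, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    # Interleaved little-endian int16 samples: frame0ch0, frame0ch1, ...
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Take every n_channels-th sample starting at the requested channel,
    # scaled from the int16 range into [-1, 1).
    samples = [s / 32768.0 for s in ints[channel::n_channels]]
    return samples, rate
```

The returned list could be wrapped in a numpy array and passed to bob.kaldi.mfcc together with the returned rate.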

bob.kaldi.nnet_forward(feats, nnet, feats_transform='', apply_log=False, no_softmax=False, prior_floor=1e-10, prior_scale=1, use_gpu=False)

Computes the forward pass for given features.

Parameters:
    feats (numpy.ndarray) – The input cepstral features (2D array of 32-bit floats).
    nnet (str) – The neural network model.
    feats_transform (`str`, optional) – The input feature transform applied to `feats`.
    apply_log (`bool`, optional) – If true, transform the network output by log().
    no_softmax (`bool`, optional) – If true, remove the final Softmax component.
    prior_floor (`float`, optional) – Flooring constant for the prior probabilities.
    prior_scale (`float`, optional) – Scaling factor applied to the pdf log-priors.
    use_gpu (`bool`, optional) – If true, compute the forward pass on a GPU.
Returns:	The posterior features.
Return type:	numpy.ndarray

bob.kaldi.plda_enroll(feats, pldamean)

Enrolls a speaker for PLDA scoring; implements the enrollment part of Kaldi egs/sre10/v1/plda_scoring.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing the iVectors of a single speaker.
    pldamean (str) – A path to the global PLDA mean file.
Returns:	A path to the enrolled PLDA model (average iVectors).
Return type:	str
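An enrolled model built from average iVectors can be pictured as follows. The sketch averages one speaker's iVectors, subtracts the global mean and rescales the result to length sqrt(dim), which is assumed here to mirror the default behaviour of Kaldi's ivector-normalize-length in plda_scoring.sh. Kaldi's recipe applies further normalization steps, and enroll is a hypothetical helper working on plain lists rather than Kaldi files.

```python
import math


def enroll(ivectors, global_mean):
    """Hypothetical simplified enrollment of one speaker's iVectors."""
    n, dim = len(ivectors), len(ivectors[0])
    # Average the speaker's iVectors dimension by dimension.
    avg = [sum(v[d] for v in ivectors) / n for d in range(dim)]
    # Subtract the global mean estimated when the PLDA model was trained.
    centred = [a - m for a, m in zip(avg, global_mean)]
    # Rescale to length sqrt(dim) (assumed Kaldi-style length normalization).
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0.0:
        return centred
    scale = math.sqrt(dim) / norm
    return [c * scale for c in centred]


print(enroll([[1.0, 1.0], [3.0, 1.0]], [2.0, 0.0]))  # average [2, 1] -> centred [0, 1]
```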

bob.kaldi.plda_score(feats, model, plda, globalmean, smoothing=0)

Scores iVectors against an enrolled speaker model; implements the scoring part of Kaldi egs/sre10/v1/plda_scoring.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing iVectors.
    model (str) – A speaker model (average iVectors).
    plda (str) – A PLDA model.
    globalmean (str) – A global PLDA mean.
    smoothing (`float`, optional) – Factor used in smoothing the within-class covariance (this factor times the between-class covariance is added to it).
Returns:	A PLDA score.
Return type:	float
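A PLDA score is a log-likelihood ratio between the same-speaker and different-speaker hypotheses. The sketch below assumes the formulation of Kaldi's Plda::LogLikelihoodRatio: vectors are already centred and transformed so that the within-class covariance is the identity and the between-class covariance is diag(psi), and the enrollment vector is the mean of n iVectors. Smoothing is not modelled, and plda_llr and its arguments are hypothetical.

```python
import math


def plda_llr(test, enroll_mean, psi, n=1):
    """PLDA log-likelihood ratio in an (assumed) diagonalized space."""
    def log_gauss(x, mean, var):
        return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

    llr = 0.0
    for v, u, p in zip(test, enroll_mean, psi):
        # Same speaker: predictive distribution given n enrollment vectors.
        same_mean = n * p / (n * p + 1.0) * u
        same_var = 1.0 + p / (n * p + 1.0)
        # Different speaker: a new speaker drawn from the prior.
        llr += log_gauss(v, same_mean, same_var) - log_gauss(v, 0.0, 1.0 + p)
    return llr


print(plda_llr([2.0], [2.0], [5.0]) > plda_llr([-2.0], [2.0], [5.0]))  # True
```

When psi is zero the speakers carry no information and the ratio is exactly zero; a test vector close to the enrollment mean scores higher than a distant one.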

bob.kaldi.plda_train(feats, plda_file, mean_file)

Trains a PLDA model; implements the training part of Kaldi egs/sre10/v1/plda_scoring.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing iVectors.
    plda_file (str) – A path to the trained PLDA model.
    mean_file (str) – A path to the global PLDA mean file.
Returns:	Trained PLDA model and global mean (2D str array)
Return type:	str

bob.kaldi.ubm_enroll(feats, ubm)

Performs MAP adaptation of a GMM-UBM model.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing MFCCs.
    ubm (str) – A text formatted Kaldi global DiagGMM.
Returns:	A text formatted Kaldi enrolled DiagGMM.
Return type:	str
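MAP adaptation pulls each UBM Gaussian toward the enrollment data in proportion to how many frames it explains. The sketch shows the common relevance-MAP update of the means only; the relevance factor of 10, the mean-only update and the sufficient statistics passed as plain lists are assumptions, not confirmed details of ubm_enroll, and map_adapt_means is a hypothetical helper.

```python
def map_adapt_means(ubm_means, counts, first_order, relevance=10.0):
    """Relevance-MAP update of GMM means (weights and variances unchanged).

    counts[k] is the soft occupancy n_k of Gaussian k and first_order[k]
    the posterior-weighted sum of the frames assigned to it.
    """
    adapted = []
    for mu, n_k, f_k in zip(ubm_means, counts, first_order):
        # new_mean = (f_k + r * mu) / (n_k + r): an interpolation between the
        # data mean f_k / n_k and the UBM mean, weighted by n_k / (n_k + r).
        adapted.append([(f + relevance * m) / (n_k + relevance)
                        for f, m in zip(f_k, mu)])
    return adapted


# Gaussian 0 sees 10 frames with mean [2, 1]; Gaussian 1 sees no data.
print(map_adapt_means([[0.0, 0.0], [1.0, 1.0]], [10.0, 0.0],
                      [[20.0, 10.0], [0.0, 0.0]]))  # [[1.0, 0.5], [1.0, 1.0]]
```

Gaussians that see no enrollment data keep their UBM means, which is why the adapted model stays close to the UBM when enrollment data are scarce.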

bob.kaldi.ubm_full_train(feats, dubm, fubmfile, num_gselect=20, num_iters=4, min_gaussian_weight=0.0001)

Trains a full-covariance UBM; implements Kaldi egs/sre10/v1/train_full_ubm.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing MFCCs.
    dubm (str) – A text formatted trained Kaldi global DiagGMM model.
    fubmfile (str) – A path to the full-covariance UBM model.
    num_gselect (`int`, optional) – Number of Gaussians to keep per frame.
    num_iters (`int`, optional) – Number of training iterations.
    min_gaussian_weight (`float`, optional) – Kaldi MleDiagGmmOptions: minimum Gaussian weight before the Gaussian is removed.
Returns:	A path to the full covariance UBM model.
Return type:	str

bob.kaldi.ubm_train(feats, ubmname, num_threads=4, num_frames=500000, min_gaussian_weight=0.0001, num_gauss=2048, num_gauss_init=0, num_gselect=30, num_iters_init=20, num_iters=4, remove_low_count_gaussians=True)

Trains a diagonal-covariance UBM; implements Kaldi egs/sre10/v1/train_diag_ubm.sh.

Parameters:
    feats (numpy.ndarray) – A 2D numpy ndarray object containing MFCCs.
    ubmname (str) – A path to the UBM model.
    num_threads (`int`, optional) – Number of threads used for statistics accumulation.
    num_frames (`int`, optional) – Number of feature vectors to store in memory and train on (randomly chosen from the input features).
    min_gaussian_weight (`float`, optional) – Kaldi MleDiagGmmOptions: minimum Gaussian weight before the Gaussian is removed.
    num_gauss (`int`, optional) – Number of Gaussians in the model.
    num_gauss_init (`int`, optional) – Number of Gaussians in the model initially (if nonzero and less than num_gauss, mixture splitting is performed).
    num_gselect (`int`, optional) – Number of Gaussians to keep per frame.
    num_iters_init (`int`, optional) – Number of training iterations for the initialization of the single diagonal GMM.
    num_iters (`int`, optional) – Number of training iterations.
    remove_low_count_gaussians (`bool`, optional) – Kaldi MleDiagGmmOptions: if true, remove Gaussians that fall below the floors.
Returns:	A text formatted trained Kaldi global DiagGMM model.
Return type:	str