bob.med.tb.utils.measure

Functions

base_measures(tp, fp, tn, fn)

Calculates measures from true/false positive and negative counts

bayesian_measures(tp, fp, tn, fn, lambda_, ...)

Calculates mean and mode from true/false positive and negative counts with credible regions

beta_credible_region(k, l, lambda_, coverage)

Returns the mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.

get_centered_maxf1(f1_scores, thresholds)

Return the centered max F1 score threshold when multiple thresholds give the same max F1 score

tricky_division(n, d)

Divides n by d.

Classes

SmoothedValue([window_size])

Track a series of values and provide access to smoothed values over a window or the global series average.

class bob.med.tb.utils.measure.SmoothedValue(window_size=20)[source]

Bases: object

Track a series of values and provide access to smoothed values over a window or the global series average.

update(value)[source]
property median
property avg
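A minimal sketch of how such a class could behave (the windowed median and global series average follow the description above; the internal representation shown here is an assumption, not the library's actual code):

```python
import statistics
from collections import deque


class SmoothedValue:
    """Tracks a series of values; exposes a windowed median and a global
    series average."""

    def __init__(self, window_size=20):
        self.deque = deque(maxlen=window_size)  # rolling window of values
        self.total = 0.0  # running sum over the *whole* series
        self.count = 0

    def update(self, value):
        self.deque.append(value)
        self.total += value
        self.count += 1

    @property
    def median(self):
        # smoothed value: median over the current window only
        return statistics.median(self.deque)

    @property
    def avg(self):
        # global series average, independent of the window
        return self.total / self.count
```

With `window_size=3`, after updates 1, 2, 3, 4 the window holds [2, 3, 4], so `median` is 3 while `avg` is 2.5.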
bob.med.tb.utils.measure.tricky_division(n, d)[source]

Divides n by d. Returns 0.0 in case of a division by zero
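The documented behaviour amounts to a guarded division; a hypothetical equivalent:

```python
def tricky_division(n, d):
    """Divides n by d, returning 0.0 instead of raising on division by zero."""
    try:
        return n / d
    except ZeroDivisionError:
        return 0.0
```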

bob.med.tb.utils.measure.base_measures(tp, fp, tn, fn)[source]

Calculates measures from true/false positive and negative counts

This function can return standard machine learning measures from true and false positive counts of positives and negatives. For a thorough look into these and alternate names for the returned values, please check Wikipedia’s entry on Precision and Recall.

Parameters
  • tp (int) – True positive count, AKA “hit”

  • fp (int) – False positive count, AKA “false alarm”, or “Type I error”

  • tn (int) – True negative count, AKA “correct rejection”

  • fn (int) – False Negative count, AKA “miss”, or “Type II error”

Returns

  • precision (float) – P, AKA positive predictive value (PPV). It corresponds arithmetically to tp/(tp+fp). In the case tp+fp == 0, this function returns zero for precision.

  • recall (float) – R, AKA sensitivity, hit rate, or true positive rate (TPR). It corresponds arithmetically to tp/(tp+fn). In the special case where tp+fn == 0, this function returns zero for recall.

  • specificity (float) – S, AKA selectivity or true negative rate (TNR). It corresponds arithmetically to tn/(tn+fp). In the special case where tn+fp == 0, this function returns zero for specificity.

  • accuracy (float) – A, see Accuracy: the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.

  • jaccard (float) – J, see Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). In the special case where tp+fp+fn == 0, this function returns zero for the Jaccard index. The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

  • f1_score (float) – F1, see F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). In the special case where P+R == (2*tp+fp+fn) == 0, this function returns zero for the F1 score. The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

bob.med.tb.utils.measure.beta_credible_region(k, l, lambda_, coverage)[source]

Returns the mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.

This implementation is based on [GOUTTE-2005]. It assumes k successes and l failures (n = k+l total trials) are issued from a series of Bernoulli trials (the likelihood is binomial). The posterior is derived using Bayes' theorem with a Beta prior. As there is no reason to favour high vs. low precision, we use a symmetric Beta prior (α = β = λ):

P(p|k,n) = 1/B(k+λ, n−k+λ) · p^(k+λ−1) · (1−p)^(n−k+λ−1)

The mode for this posterior (also the maximum a posteriori) is:

mode(p) = (k+λ−1) / (n+2λ−2)

Concretely, the prior may be flat (all rates are equally likely, λ=1), or we may use Jeffrey's prior (λ=0.5), which is invariant under re-parameterisation. Jeffrey's prior indicates that rates close to zero or one are more likely.

The mode above is well defined as long as k+λ > 1 and n−k+λ > 1, which is usually the case for a reasonably well-tuned system, with more than a few samples for analysis. At the limit of system performance, k may be 0, which will make the mode become zero.

For our purposes, it may be more suitable to represent n=k+l, with k, the number of successes and l, the number of failures in the binomial experiment, and find this more suitable representation:

P(p|k,l) = 1/B(k+λ, l+λ) · p^(k+λ−1) · (1−p)^(l+λ−1)

mode(p) = (k+λ−1) / (k+l+2λ−2)

This can be mapped to most rates calculated in the context of binary classification this way:

  • Precision or Positive-Predictive Value (PPV): p = TP/(TP+FP), so k=TP, l=FP

  • Recall, Sensitivity, or True Positive Rate: r = TP/(TP+FN), so k=TP, l=FN

  • Specificity or True Negative Rate: s = TN/(TN+FP), so k=TN, l=FP

  • F1-score: f1 = 2TP/(2TP+FP+FN), so k=2TP, l=FP+FN

  • Accuracy: acc = (TP+TN)/(TP+TN+FP+FN), so k=TP+TN, l=FP+FN

  • Jaccard: j = TP/(TP+FP+FN), so k=TP, l=FP+FN

Contrary to frequentist approaches, in which one can only say that if the test were repeated an infinite number of times, and one constructed a confidence interval each time, then X% of the confidence intervals would contain the true rate, here we can say that given our observed data, there is a X% probability that the true value of k/n falls within the provided interval.

Note

For a disambiguation with Confidence Interval, read https://en.wikipedia.org/wiki/Credible_interval.

Parameters
  • k (int) – Number of successes observed on the experiment

  • l (int) – Number of failures observed on the experiment

  • lambda_ (float, Optional) – The parameterisation of the Beta prior to consider. Use λ=1 for a flat prior. Use λ=0.5 for Jeffrey's prior (the default).

  • coverage (float, Optional) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns

  • mean (float) – The mean of the posterior distribution

  • mode (float) – The mode of the posterior distribution

  • lower, upper (float) – The lower and upper bounds of the credible region
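Assuming scipy.stats is available for the Beta quantile function, the documented values can be sketched as follows; the mode fallbacks for shape parameters ≤ 1 are an assumption based on the note above about k = 0:

```python
import scipy.stats


def beta_credible_region(k, l, lambda_=0.5, coverage=0.95):
    """Mean, mode and equal-tailed credible region of the Beta posterior
    Beta(k + lambda_, l + lambda_) described above."""
    a = k + lambda_  # posterior shape parameters
    b = l + lambda_
    mean = a / (a + b)
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)  # the documented mode formula
    elif a <= 1:
        mode = 0.0  # density peaks at 0 (e.g. k == 0 under Jeffrey's prior)
    else:
        mode = 1.0  # density peaks at 1
    tail = (1 - coverage) / 2  # equal-tailed: split the excess mass evenly
    lower = scipy.stats.beta.ppf(tail, a, b)
    upper = scipy.stats.beta.ppf(1 - tail, a, b)
    return mean, mode, lower, upper
```

For k=50, l=50 with a flat prior (λ=1), both mean and mode are exactly 0.5 and the 95% region is symmetric around it.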

bob.med.tb.utils.measure.bayesian_measures(tp, fp, tn, fn, lambda_, coverage)[source]

Calculates mean and mode from true/false positive and negative counts with credible regions

This function can return Bayesian estimates of standard machine learning measures from true and false positive counts of positives and negatives. For a thorough look into these and alternate names for the returned values, please check Wikipedia’s entry on Precision and Recall. See beta_credible_region() for details on the calculation of returned values.

Parameters
  • tp (int) – True positive count, AKA “hit”

  • fp (int) – False positive count, AKA “false alarm”, or “Type I error”

  • tn (int) – True negative count, AKA “correct rejection”

  • fn (int) – False Negative count, AKA “miss”, or “Type II error”

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use λ=1 for a flat prior. Use λ=0.5 for Jeffrey’s prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns

  • precision ((float, float, float, float)) – P, AKA positive predictive value (PPV), mean, mode and credible intervals (95% CI). It corresponds arithmetically to tp/(tp+fp).

  • recall ((float, float, float, float)) – R, AKA sensitivity, hit rate, or true positive rate (TPR), mean, mode and credible intervals (95% CI). It corresponds arithmetically to tp/(tp+fn).

  • specificity ((float, float, float, float)) – S, AKA selectivity or true negative rate (TNR), mean, mode and credible intervals (95% CI). It corresponds arithmetically to tn/(tn+fp).

  • accuracy ((float, float, float, float)) – A, mean, mode and credible intervals (95% CI); see Accuracy. It is the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.

  • jaccard ((float, float, float, float)) – J, mean, mode and credible intervals (95% CI). See Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

  • f1_score ((float, float, float, float)) – F1, mean, mode and credible intervals (95% CI). See F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
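Combining the k/l mapping listed under beta_credible_region() with the Beta posterior gives a sketch like the following (hypothetical; assumes scipy.stats is available):

```python
import scipy.stats


def _region(k, l, lambda_, coverage):
    # posterior is Beta(k + lambda_, l + lambda_); see beta_credible_region()
    a, b = k + lambda_, l + lambda_
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if (a > 1 and b > 1) else 0.0
    tail = (1 - coverage) / 2
    return (mean, mode,
            scipy.stats.beta.ppf(tail, a, b),
            scipy.stats.beta.ppf(1 - tail, a, b))


def bayesian_measures(tp, fp, tn, fn, lambda_, coverage):
    """One (mean, mode, lower, upper) tuple per measure, in the documented
    order: precision, recall, specificity, accuracy, jaccard, f1_score."""
    return (
        _region(tp, fp, lambda_, coverage),            # precision: k=TP, l=FP
        _region(tp, fn, lambda_, coverage),            # recall: k=TP, l=FN
        _region(tn, fp, lambda_, coverage),            # specificity: k=TN, l=FP
        _region(tp + tn, fp + fn, lambda_, coverage),  # accuracy
        _region(tp, fp + fn, lambda_, coverage),       # jaccard
        _region(2 * tp, fp + fn, lambda_, coverage),   # f1_score
    )
```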

bob.med.tb.utils.measure.get_centered_maxf1(f1_scores, thresholds)[source]

Return the centered max F1 score threshold when multiple thresholds give the same max F1 score

Parameters
  • f1_scores – the F1 scores computed for each threshold

  • thresholds – the candidate thresholds, one per F1 score
Returns

  • max F1 score (float)

  • threshold (float)
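“Centered” is not specified further here; one plausible reading, sketched below under that assumption (not necessarily the library's exact rule), is to pick the middle index among all thresholds tied at the maximum F1 score:

```python
import numpy


def get_centered_maxf1(f1_scores, thresholds):
    """Returns (max F1 score, centered threshold); assumes both inputs are
    1D arrays of equal length."""
    maxf1 = numpy.max(f1_scores)
    (candidates,) = numpy.where(f1_scores == maxf1)  # indices tied at the max
    mid = candidates[len(candidates) // 2]  # middle of the tied run
    return maxf1, thresholds[mid]
```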