bob.med.tb.utils.measure

Functions

base_measures(tp, fp, tn, fn)

Calculates measures from true/false positive and negative counts

bayesian_measures(tp, fp, tn, fn, lambda_, ...)

Calculates mean and mode from true/false positive and negative counts with credible regions

beta_credible_region(k, l, lambda_, coverage)

Returns the mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.

get_centered_maxf1(f1_scores, thresholds)

Return the centered max F1 score threshold when multiple thresholds give the same max F1 score

tricky_division(n, d)

Divides n by d.

Classes

SmoothedValue([window_size])

Track a series of values and provide access to smoothed values over a window or the global series average.

class bob.med.tb.utils.measure.SmoothedValue(window_size=20)[source]

Bases: object

Track a series of values and provide access to smoothed values over a window or the global series average.

update(value)[source]
property median
property avg
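A minimal sketch of how such a class could behave (the windowed median and global series average follow the description above; the internal representation shown here is an assumption, not the library's actual code):

```python
import statistics
from collections import deque


class SmoothedValue:
    """Tracks a series of values; exposes a windowed median and a global
    series average."""

    def __init__(self, window_size=20):
        self.deque = deque(maxlen=window_size)  # rolling window of values
        self.total = 0.0  # running sum over the *whole* series
        self.count = 0

    def update(self, value):
        self.deque.append(value)
        self.total += value
        self.count += 1

    @property
    def median(self):
        # smoothed value: median over the current window only
        return statistics.median(self.deque)

    @property
    def avg(self):
        # global series average, independent of the window
        return self.total / self.count
```

With `window_size=3`, after updates 1, 2, 3, 4 the window holds [2, 3, 4], so `median` is 3 while `avg` is 2.5.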
bob.med.tb.utils.measure.tricky_division(n, d)[source]

Divides n by d. Returns 0.0 in case of a division by zero
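The documented behaviour amounts to a guarded division; a hypothetical equivalent:

```python
def tricky_division(n, d):
    """Divides n by d, returning 0.0 instead of raising on division by zero."""
    try:
        return n / d
    except ZeroDivisionError:
        return 0.0
```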

bob.med.tb.utils.measure.base_measures(tp, fp, tn, fn)[source]

Calculates measures from true/false positive and negative counts

This function can return standard machine learning measures from true and false positive counts of positives and negatives. For a thorough look into these and alternate names for the returned values, please check Wikipedia’s entry on Precision and Recall.

Parameters
  • tp (int) – True positive count, AKA “hit”

  • fp (int) – False positive count, AKA “false alarm”, or “Type I error”

  • tn (int) – True negative count, AKA “correct rejection”

  • fn (int) – False Negative count, AKA “miss”, or “Type II error”

Returns

  • precision (float) – P, AKA positive predictive value (PPV). It corresponds arithmetically to tp/(tp+fp). In the case tp+fp == 0, this function returns zero for precision.

  • recall (float) – R, AKA sensitivity, hit rate, or true positive rate (TPR). It corresponds arithmetically to tp/(tp+fn). In the special case where tp+fn == 0, this function returns zero for recall.

  • specificity (float) – S, AKA selectivity or true negative rate (TNR). It corresponds arithmetically to tn/(tn+fp). In the special case where tn+fp == 0, this function returns zero for specificity.

  • accuracy (float) – A, see Accuracy: the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.

  • jaccard (float) – J, see Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). In the special case where tp+fp+fn == 0, this function returns zero for the Jaccard index. The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

  • f1_score (float) – F1, see F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). In the special case where P+R == (2*tp+fp+fn) == 0, this function returns zero for the F1 score. The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

bob.med.tb.utils.measure.beta_credible_region(k, l, lambda_, coverage)[source]

Returns the mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.

This implementation is based on [GOUTTE-2005]. It assumes k successes and l failures (n = k+l total trials) are issued from a series of Bernoulli trials (the likelihood is binomial). The posterior is derived using Bayes' theorem with a Beta prior. As there is no reason to favour high vs. low precision, we use a symmetric Beta prior (α = β = λ):

P(p|k,n) = 1/B(k+λ, n−k+λ) · p^(k+λ−1) · (1−p)^(n−k+λ−1)

The mode for this posterior (also the maximum a posteriori) is:

mode(p) = (k+λ−1) / (n+2λ−2)

Concretely, the prior may be flat (all rates are equally likely, λ=1), or we may use Jeffrey's prior (λ=0.5), which is invariant under re-parameterisation. Jeffrey's prior indicates that rates close to zero or one are more likely.

The mode above is well defined as long as k+λ > 1 and n−k+λ > 1, which is usually the case for a reasonably well-tuned system, with more than a few samples for analysis. At the limit of system performance, k may be 0, which will make the mode become zero.

For our purposes, it may be more suitable to represent n=k+l, with k, the number of successes and l, the number of failures in the binomial experiment, and find this more suitable representation:

P(p|k,l) = 1/B(k+λ, l+λ) · p^(k+λ−1) · (1−p)^(l+λ−1)

mode(p) = (k+λ−1) / (k+l+2λ−2)

This can be mapped to most rates calculated in the context of binary classification this way:

  • Precision or Positive-Predictive Value (PPV): p = TP/(TP+FP), so k=TP, l=FP

  • Recall, Sensitivity, or True Positive Rate: r = TP/(TP+FN), so k=TP, l=FN

  • Specificity or True Negative Rate: s = TN/(TN+FP), so k=TN, l=FP

  • F1-score: f1 = 2TP/(2TP+FP+FN), so k=2TP, l=FP+FN

  • Accuracy: acc = (TP+TN)/(TP+TN+FP+FN), so k=TP+TN, l=FP+FN

  • Jaccard: j = TP/(TP+FP+FN), so k=TP, l=FP+FN

Contrary to frequentist approaches, in which one can only say that if the test were repeated an infinite number of times, and one constructed a confidence interval each time, then X% of the confidence intervals would contain the true rate, here we can say that given our observed data, there is a X% probability that the true value of k/n falls within the provided interval.

Note

For a disambiguation with Confidence Interval, read https://en.wikipedia.org/wiki/Credible_interval.

Parameters
  • k (int) – Number of successes observed on the experiment

  • l (int) – Number of failures observed on the experiment

  • lambda_ (float, Optional) – The parameterisation of the Beta prior to consider. Use λ=1 for a flat prior. Use λ=0.5 for Jeffrey's prior (the default).

  • coverage (float, Optional) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns

  • mean (float) – The mean of the posterior distribution

  • mode (float) – The mode of the posterior distribution

  • lower, upper (float) – The lower and upper bounds of the credible region
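Assuming scipy.stats is available for the Beta quantile function, the documented values can be sketched as follows; the mode fallbacks for shape parameters ≤ 1 are an assumption based on the note above about k = 0:

```python
import scipy.stats


def beta_credible_region(k, l, lambda_=0.5, coverage=0.95):
    """Mean, mode and equal-tailed credible region of the Beta posterior
    Beta(k + lambda_, l + lambda_) described above."""
    a = k + lambda_  # posterior shape parameters
    b = l + lambda_
    mean = a / (a + b)
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)  # the documented mode formula
    elif a <= 1:
        mode = 0.0  # density peaks at 0 (e.g. k == 0 under Jeffrey's prior)
    else:
        mode = 1.0  # density peaks at 1
    tail = (1 - coverage) / 2  # equal-tailed: split the excess mass evenly
    lower = scipy.stats.beta.ppf(tail, a, b)
    upper = scipy.stats.beta.ppf(1 - tail, a, b)
    return mean, mode, lower, upper
```

For k=50, l=50 with a flat prior (λ=1), both mean and mode are exactly 0.5 and the 95% region is symmetric around it.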

bob.med.tb.utils.measure.bayesian_measures(tp, fp, tn, fn, lambda_, coverage)[source]

Calculates mean and mode from true/false positive and negative counts with credible regions

This function can return Bayesian estimates of standard machine learning measures from true and false positive counts of positives and negatives. For a thorough look into these and alternate names for the returned values, please check Wikipedia’s entry on Precision and Recall. See beta_credible_region() for details on the calculation of returned values.

Parameters
  • tp (int) – True positive count, AKA “hit”

  • fp (int) – False positive count, AKA “false alarm”, or “Type I error”

  • tn (int) – True negative count, AKA “correct rejection”

  • fn (int) – False Negative count, AKA “miss”, or “Type II error”

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use λ=1 for a flat prior. Use λ=0.5 for Jeffrey’s prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns

  • precision ((float, float, float, float)) – P, AKA positive predictive value (PPV), mean, mode and credible intervals (95% CI). It corresponds arithmetically to tp/(tp+fp).

  • recall ((float, float, float, float)) – R, AKA sensitivity, hit rate, or true positive rate (TPR), mean, mode and credible intervals (95% CI). It corresponds arithmetically to tp/(tp+fn).

  • specificity ((float, float, float, float)) – S, AKA selectivity or true negative rate (TNR), mean, mode and credible intervals (95% CI). It corresponds arithmetically to tn/(tn+fp).

  • accuracy ((float, float, float, float)) – A, mean, mode and credible intervals (95% CI); see Accuracy. It is the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.

  • jaccard ((float, float, float, float)) – J, mean, mode and credible intervals (95% CI). See Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

  • f1_score ((float, float, float, float)) – F1, mean, mode and credible intervals (95% CI). See F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
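Combining the k/l mapping listed under beta_credible_region() with the Beta posterior gives a sketch like the following (hypothetical; assumes scipy.stats is available):

```python
import scipy.stats


def _region(k, l, lambda_, coverage):
    # posterior is Beta(k + lambda_, l + lambda_); see beta_credible_region()
    a, b = k + lambda_, l + lambda_
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if (a > 1 and b > 1) else 0.0
    tail = (1 - coverage) / 2
    return (mean, mode,
            scipy.stats.beta.ppf(tail, a, b),
            scipy.stats.beta.ppf(1 - tail, a, b))


def bayesian_measures(tp, fp, tn, fn, lambda_, coverage):
    """One (mean, mode, lower, upper) tuple per measure, in the documented
    order: precision, recall, specificity, accuracy, jaccard, f1_score."""
    return (
        _region(tp, fp, lambda_, coverage),            # precision: k=TP, l=FP
        _region(tp, fn, lambda_, coverage),            # recall: k=TP, l=FN
        _region(tn, fp, lambda_, coverage),            # specificity: k=TN, l=FP
        _region(tp + tn, fp + fn, lambda_, coverage),  # accuracy
        _region(tp, fp + fn, lambda_, coverage),       # jaccard
        _region(2 * tp, fp + fn, lambda_, coverage),   # f1_score
    )
```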

bob.med.tb.utils.measure.get_centered_maxf1(f1_scores, thresholds)[source]

Return the centered max F1 score threshold when multiple thresholds give the same max F1 score

Parameters
  • f1_scores – the F1 scores computed for each threshold

  • thresholds – the candidate thresholds, one per F1 score
Returns

  • max F1 score (float)

  • threshold (float)
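“Centered” is not specified further here; one plausible reading, sketched below under that assumption (not necessarily the library's exact rule), is to pick the middle index among all thresholds tied at the maximum F1 score:

```python
import numpy


def get_centered_maxf1(f1_scores, thresholds):
    """Returns (max F1 score, centered threshold); assumes both inputs are
    1D arrays of equal length."""
    maxf1 = numpy.max(f1_scores)
    (candidates,) = numpy.where(f1_scores == maxf1)  # indices tied at the max
    mid = candidates[len(candidates) // 2]  # middle of the tied run
    return maxf1, thresholds[mid]
```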