bob.med.tb.utils.measure¶
Functions

- base_measures: Calculates measures from true/false positive and negative counts
- bayesian_measures: Calculates mean and mode from true/false positive and negative counts, with credible regions
- beta_credible_region: Returns the mean, mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials
- get_centered_maxf1: Returns the centered max F1 score threshold when multiple thresholds give the same max F1 score
- tricky_division: Divides n by d, returning 0.0 in case of a division by zero

Classes

- SmoothedValue: Tracks a series of values and provides access to smoothed values over a window or the global series average
- class bob.med.tb.utils.measure.SmoothedValue(window_size=20)[source]¶
Bases:
object
Track a series of values and provide access to smoothed values over a window or the global series average.
- property median¶
- property avg¶
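As a rough illustration of the behaviour described above, here is a hypothetical re-implementation of such a tracker. The `update` method, the internal `deque` storage and the `global_avg` property are assumptions not shown in this documentation page:

```python
from collections import deque
import statistics


class SmoothedValue:
    """Track a series of values; expose window-smoothed and global stats.

    Illustrative sketch only -- the real class lives in
    bob.med.tb.utils.measure and may differ in detail.
    """

    def __init__(self, window_size=20):
        self.deque = deque(maxlen=window_size)  # keeps only the last `window_size` values
        self.total = 0.0  # running sum over the whole series
        self.count = 0  # number of values ever recorded

    def update(self, value):
        self.deque.append(value)
        self.total += value
        self.count += 1

    @property
    def median(self):
        # median over the smoothing window only
        return statistics.median(self.deque)

    @property
    def avg(self):
        # mean over the smoothing window only
        return statistics.fmean(self.deque)

    @property
    def global_avg(self):
        # mean over every value ever recorded (the "global series average")
        return self.total / self.count
```

With a window of 3, feeding the values 1, 2, 3, 10 leaves only the last three values in the window, so the windowed statistics and the global average diverge.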
- bob.med.tb.utils.measure.tricky_division(n, d)[source]¶
Divides n by d. Returns 0.0 in case of a division by zero.
- bob.med.tb.utils.measure.base_measures(tp, fp, tn, fn)[source]¶
Calculates measures from true/false positive and negative counts
This function can return standard machine learning measures from counts of true/false positives and negatives. For a thorough look into these and alternate names for the returned values, please check Wikipedia’s entry on Precision and Recall.
- Parameters
tp (int) – True positive count, AKA “hit”
fp (int) – False positive count, AKA “false alarm”, or “Type I error”
tn (int) – True negative count, AKA “correct rejection”
fn (int) – False Negative count, AKA “miss”, or “Type II error”
- Returns
precision (float) – P, AKA positive predictive value (PPV). It corresponds arithmetically to tp/(tp+fp). In the case tp+fp == 0, this function returns zero for precision.
recall (float) – R, AKA sensitivity, hit rate, or true positive rate (TPR). It corresponds arithmetically to tp/(tp+fn). In the special case where tp+fn == 0, this function returns zero for recall.
specificity (float) – S, AKA selectivity or true negative rate (TNR). It corresponds arithmetically to tn/(tn+fp). In the special case where tn+fp == 0, this function returns zero for specificity.
accuracy (float) – A, see Accuracy. It is the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.
jaccard (float) – J, see Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). In the special case where tp+fp+fn == 0, this function returns zero for the Jaccard index. The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
f1_score (float) – F1, see F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). In the special case where P+R == (2*tp+fp+fn) == 0, this function returns zero for the F1 score. The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
- bob.med.tb.utils.measure.beta_credible_region(k, l, lambda_, coverage)[source]¶
Returns the mean, mode, upper and lower bounds of the equal-tailed credible region of a probability estimate following Bernoulli trials.
This implementation is based on [GOUTTE-2005]. It assumes k successes and l failures (n = k + l total trials) are issued from a series of Bernoulli trials (the likelihood is binomial). The posterior is derived using Bayes’ Theorem with a beta prior. As there is no reason to favour high vs. low precision, we use a symmetric Beta prior (α = β = λ), so that:

P(p|k,l) ∝ p^(k+λ-1) (1-p)^(l+λ-1)

The mode of this posterior (also the maximum a posteriori) is:

mode(p) = (k+λ-1) / (k+l+2λ-2)

Concretely, the prior may be flat (all rates are equally likely, λ = 1), or we may use Jeffreys’ prior (λ = 0.5), which is invariant through re-parameterisation. Jeffreys’ prior indicates that rates close to zero or one are more likely.
The mode above works if k+λ > 1 and l+λ > 1, which is usually the case for a reasonably well-tuned system with more than a few samples for analysis. In the limit of the system performance, k may be 0, which will make the mode become zero.
For our purposes, it may be more suitable to represent n = k + l, with k the number of successes and l the number of failures in the binomial experiment, and write the posterior as a normalised Beta distribution:

P(p|k,l) = p^(k+λ-1) (1-p)^(l+λ-1) / B(k+λ, l+λ)

This can be mapped to most rates calculated in the context of binary classification this way:
Precision or Positive-Predictive Value (PPV): p = TP/(TP+FP), so k=TP, l=FP
Recall, Sensitivity, or True Positive Rate: r = TP/(TP+FN), so k=TP, l=FN
Specificity or True Negative Rate: s = TN/(TN+FP), so k=TN, l=FP
F1-score: f1 = 2TP/(2TP+FP+FN), so k=2TP, l=FP+FN
Accuracy: acc = (TP+TN)/(TP+TN+FP+FN), so k=TP+TN, l=FP+FN
Jaccard: j = TP/(TP+FP+FN), so k=TP, l=FP+FN
Contrary to frequentist approaches, in which one can only say that if the test were repeated an infinite number of times, and one constructed a confidence interval each time, then X% of the confidence intervals would contain the true rate, here we can say that given our observed data, there is an X% probability that the true value of p falls within the provided interval.

Note
For a disambiguation with Confidence Interval, read https://en.wikipedia.org/wiki/Credible_interval.
- Parameters
k (int) – Number of successes observed in the experiment
l (int) – Number of failures observed in the experiment
lambda_ (float, Optional) – The parameterisation of the Beta prior to consider. Use λ = 1 for a flat prior. Use λ = 0.5 for Jeffreys’ prior (the default).
coverage (float, Optional) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.
- Returns
mean (float) – The mean of the posterior distribution
mode (float) – The mode of the posterior distribution
lower, upper (float) – The lower and upper bounds of the credible region
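Under the assumptions above, the posterior is Beta(k + λ, l + λ), so its summaries can be sketched with scipy. This is an illustrative re-implementation, not the library code; in particular the boundary handling for the mode is a simplification:

```python
from scipy.stats import beta


def beta_credible_region(k, l, lambda_=0.5, coverage=0.95):
    """Sketch: summaries of the Beta(k + lambda_, l + lambda_) posterior.

    The equal-tailed region leaves (1 - coverage)/2 probability mass in
    each tail of the posterior density.
    """
    a = k + lambda_  # posterior alpha
    b = l + lambda_  # posterior beta
    mean = a / (a + b)
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)
    else:
        # degenerate cases (few or no observations): the density peaks at
        # a boundary; this simplification just picks the nearer endpoint
        mode = 0.0 if a <= b else 1.0
    tail = (1.0 - coverage) / 2.0
    lower = beta.ppf(tail, a, b)  # inverse CDF at the left tail
    upper = beta.ppf(1.0 - tail, a, b)  # inverse CDF at the right tail
    return mean, mode, lower, upper
```

For k=10 successes and l=5 failures with Jeffreys’ prior, the posterior is Beta(10.5, 5.5), giving mean 10.5/16 and mode 9.5/14, with the interval covering 95% of the posterior mass.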
- bob.med.tb.utils.measure.bayesian_measures(tp, fp, tn, fn, lambda_, coverage)[source]¶
Calculates mean and mode from true/false positive and negative counts, with credible regions
This function can return Bayesian estimates of standard machine learning measures from counts of true/false positives and negatives. For a thorough look into these and alternate names for the returned values, please check Wikipedia’s entry on Precision and Recall. See beta_credible_region() for details on the calculation of returned values.
- Parameters
tp (int) – True positive count, AKA “hit”
fp (int) – False positive count, AKA “false alarm”, or “Type I error”
tn (int) – True negative count, AKA “correct rejection”
fn (int) – False Negative count, AKA “miss”, or “Type II error”
lambda_ (float) – The parameterisation of the Beta prior to consider. Use λ = 1 for a flat prior. Use λ = 0.5 for Jeffreys’ prior.
coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.
- Returns
precision ((float, float, float, float)) – P, AKA positive predictive value (PPV); mean, mode and credible intervals (95% CI). It corresponds arithmetically to tp/(tp+fp).
recall ((float, float, float, float)) – R, AKA sensitivity, hit rate, or true positive rate (TPR); mean, mode and credible intervals (95% CI). It corresponds arithmetically to tp/(tp+fn).
specificity ((float, float, float, float)) – S, AKA selectivity or true negative rate (TNR); mean, mode and credible intervals (95% CI). It corresponds arithmetically to tn/(tn+fp).
accuracy ((float, float, float, float)) – A; mean, mode and credible intervals (95% CI). See Accuracy. It is the proportion of correct predictions (both true positives and true negatives) among the total number of pixels examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). This measure includes both true negatives and true positives in the numerator, which makes it sensitive to data or regions without annotations.
jaccard ((float, float, float, float)) – J; mean, mode and credible intervals (95% CI). See Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
f1_score ((float, float, float, float)) – F1; mean, mode and credible intervals (95% CI). See F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.
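The mapping from confusion-matrix counts to (k, l) success/failure pairs documented in beta_credible_region() can be sketched as follows. For brevity this hypothetical version reports only the posterior means; the real function returns full (mean, mode, lower, upper) tuples per measure:

```python
def bayesian_measures_sketch(tp, fp, tn, fn, lambda_=0.5):
    """Sketch: map counts to (successes, failures) pairs and summarise
    each Beta posterior by its mean only (illustrative simplification)."""

    def posterior_mean(k, l):
        # mean of the Beta(k + lambda_, l + lambda_) posterior
        a, b = k + lambda_, l + lambda_
        return a / (a + b)

    # (k, l) mapping as documented for beta_credible_region()
    pairs = {
        "precision": (tp, fp),
        "recall": (tp, fn),
        "specificity": (tn, fp),
        "accuracy": (tp + tn, fp + fn),
        "jaccard": (tp, fp + fn),
        "f1_score": (2 * tp, fp + fn),
    }
    return {name: posterior_mean(k, l) for name, (k, l) in pairs.items()}
```

Note how, unlike the frequentist ratios, the Bayesian means never divide by zero: the prior mass keeps every denominator strictly positive.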
- bob.med.tb.utils.measure.get_centered_maxf1(f1_scores, thresholds)[source]¶
Returns the centered max F1 score threshold when multiple thresholds give the same max F1 score.
- Parameters
f1_scores (numpy.ndarray) – 1D array of F1 scores
thresholds (numpy.ndarray) – 1D array of thresholds
- Returns
max F1 score (float)
threshold (float)
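One plausible reading of “centered” is to pick the threshold at the middle index among all thresholds tying at the maximum F1 score. The sketch below follows that reading and is an assumption, not the library’s exact code:

```python
import numpy


def get_centered_maxf1(f1_scores, thresholds):
    """Sketch: return the max F1 score and a centred tie-breaking threshold.

    When several thresholds reach the same maximum F1, the threshold at
    the (rounded) centre of the tying indices is returned.
    """
    maxf1 = f1_scores.max()
    tied = numpy.where(f1_scores == maxf1)[0]  # indices reaching the maximum
    centre = int(round(tied.mean()))  # centred index among the ties
    return maxf1, thresholds[centre]
```

For instance, if thresholds 0.3, 0.5 and 0.7 all reach the maximum F1, the centred choice is 0.5 rather than the first or last of the ties.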