Supported Datasets
This page lists the datasets currently supported by this package, along with their notable properties. Each dataset name is linked to the location where the raw data can be downloaded. The list of images in each split is available in the source code.
Tuberculosis datasets
The following datasets contain only the tuberculosis final diagnosis (0 or 1). In addition to the splits presented in the following table, 10 randomly generated folds for cross-validation are available for these datasets.
| Dataset | Reference | H x W | Samples | Training | Validation | Test |
|---|---|---|---|---|---|---|
|  |  | 4020 x 4892 | 138 | 88 | 22 | 28 |
|  |  | Varying | 662 | 422 | 107 | 133 |
|  |  | Varying | 155 | 83 | 20 | 52 |
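The cross-validation folds mentioned above ship with the package. As an illustration only, the following minimal sketch shows one way such random folds could be produced for a dataset of 138 samples. The helper names `make_folds` and `train_test_for_fold`, the seed, and the integer sample identifiers are assumptions made for this example and are not part of the package.

```python
import random


def make_folds(sample_ids, n_folds=10, seed=0):
    """Randomly partition sample_ids into n_folds disjoint folds (hypothetical helper)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled ids round-robin so fold sizes differ by at most one.
    return [ids[i::n_folds] for i in range(n_folds)]


def train_test_for_fold(folds, k):
    """Hold out fold k for testing and train on all remaining folds."""
    test = folds[k]
    train = [s for i, fold in enumerate(folds) if i != k for s in fold]
    return train, test


# 138 placeholder identifiers, matching the sample count in the first table row.
folds = make_folds(range(138))
train, test = train_test_for_fold(folds, 0)
```

A fixed seed keeps the folds reproducible between runs, which matters when results from different experiments are compared on the same folds.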
Tuberculosis + radiological findings dataset
The following dataset contains both the tuberculosis final diagnosis (0 or 1) and radiological findings.
| Dataset | Reference | H x W | Samples | Train | Test |
|---|---|---|---|---|---|
|  |  | Varying | 160’861 | 160’861 | 0 |
Radiological findings datasets
The following dataset contains only the radiological findings without any information about tuberculosis.
Note
The NIH CXR14 labels for the training and validation sets are the relabeled versions from the CheXNeXt study [CHEXNEXT-2018].
| Dataset | Reference | H x W | Samples | Training | Validation | Test |
|---|---|---|---|---|---|---|
|  |  | 1024 x 1024 | 109’041 | 98’637 | 6’350 | 4’054 |
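As a quick sanity check on the split sizes in the table above, the short sketch below sums the three splits and computes the share of each one. The function name `split_fractions` is a hypothetical helper written for this example, not something provided by the package.

```python
def split_fractions(training, validation, test):
    """Return the total sample count and the fraction of samples in each split."""
    total = training + validation + test
    return {
        "total": total,
        "training": training / total,
        "validation": validation / total,
        "test": test / total,
    }


# Split sizes taken from the 1024 x 1024 row of the table.
fractions = split_fractions(training=98637, validation=6350, test=4054)
```

The three splits add up to the 109’041 samples reported in the table, which corresponds to roughly 90%, 6% and 4% of the data for training, validation and test.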
HIV-Tuberculosis datasets
The following datasets contain only the tuberculosis final diagnosis (0 or 1) and come from HIV-infected patients. Ten randomly generated folds for cross-validation are available for these datasets.
To obtain access to the data, please contact the authors of these datasets.
| Dataset | Reference | H x W | Samples |
|---|---|---|---|
| TB POC |  | 2048 x 2500 | 407 |
| HIV TB |  | 2048 x 2500 | 243 |