Rethinking Computer-aided Tuberculosis Diagnosis

Yun Liu*1, Yu-Huan Wu*1, Yunfeng Ban2, Huifang Wang2, Ming-Ming Cheng1
1Nankai University 2InferVision
Abstract
As a serious infectious disease, tuberculosis (TB) is one of the major threats to human health worldwide, leading to millions of deaths every year. Although early diagnosis and treatment can greatly improve the chances of survival, early diagnosis remains a major challenge, especially in developing countries. Computer-aided tuberculosis diagnosis (CTD) is a promising choice for TB diagnosis due to the great successes of deep learning. However, the lack of training data has hampered the progress of CTD. To solve this problem, we establish a large-scale TB dataset, namely the Tuberculosis X-ray (TBX11K) dataset. This dataset contains 11,200 X-ray images with corresponding bounding box annotations for TB areas, while the largest existing public TB dataset has only 662 X-ray images with corresponding image-level annotations. The proposed dataset enables the training of sophisticated detectors for high-quality CTD. We reform existing object detectors to adapt them to simultaneous image classification and TB area detection. These reformed detectors are trained and evaluated on the proposed TBX11K dataset and serve as baselines for future research.
Paper
- Rethinking Computer-Aided Tuberculosis Diagnosis, Yun Liu*, Yu-Huan Wu*, Yunfeng Ban, Huifang Wang, and Ming-Ming Cheng, IEEE CVPR (oral), 2020. [PDF] [bib] [Dataset on Google Drive] [Dataset on Baidu Yunpan] [Online Challenge] [Video] [PPT]
- Revisiting Computer-Aided Tuberculosis Diagnosis, Yun Liu, Yu-Huan Wu, Shi-Chen Zhang, Li Liu, Min Wu, and Ming-Ming Cheng, IEEE TPAMI, 2023. [PDF] [Code]
The InferVision product based on these research results has been included in the UN’s Global Drug Facility (GDF) list. This is the first time an AI product has been included in the GDF.
Citation
We would highly appreciate it if you could cite our papers when using our dataset:
@article{liu2023revisiting,
  title={Revisiting Computer-Aided Tuberculosis Diagnosis},
  author={Liu, Yun and Wu, Yu-Huan and Zhang, Shi-Chen and Liu, Li and Wu, Min and Cheng, Ming-Ming},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

@inproceedings{liu2020rethinking,
  title={Rethinking computer-aided tuberculosis diagnosis},
  author={Liu, Yun and Wu, Yu-Huan and Ban, Yunfeng and Wang, Huifang and Cheng, Ming-Ming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2646--2655},
  year={2020}
}
Comparison with Other TB Datasets

The proposed TBX11K dataset is much larger, better annotated, and more realistic than existing TB datasets, enabling the training of deep CNNs. First, unlike previous datasets [1, 2] that contain only tens or hundreds of X-ray images, TBX11K has 11,200 images, about 17× as many as the largest existing dataset, i.e., the Shenzhen dataset [1], which makes it possible to train very deep CNNs. Second, unlike previous datasets, which only have image-level annotations, TBX11K annotates TB areas using bounding boxes, so that future CTD methods can not only recognize the manifestations of TB but also detect the TB areas to help radiologists reach a definitive diagnosis. Third, TBX11K includes four categories (healthy, active TB, latent TB, and unhealthy but non-TB) rather than the binary TB-or-not classification of previous datasets, so that future CTD systems can adapt to more complex real-world scenarios and provide people with more detailed disease analyses.
Dataset Splits

The proposed TBX11K dataset is split into training, validation, and testing sets. “Active & Latent TB” refers to X-rays that contain active and latent TB simultaneously. “Active TB” and “Latent TB” refer to X-rays that contain only active TB or only latent TB, respectively. “Uncertain TB” refers to TB X-rays whose TB types cannot be recognized under today’s medical conditions. Uncertain TB X-rays are all put into the test set. Please refer to the file “README.md” in the downloaded dataset for more details about the dataset splits.

This is the distribution of the areas of TB bounding boxes. The left and right values of each bin define its area range, and the height of each bin denotes the number of TB bounding boxes whose area falls within this range. Note that the original X-rays have a resolution of about 3000 × 3000. However, the original 3000 × 3000 images would require over 100GB of storage, which is too large to distribute. On the other hand, we found that a resolution of 512 × 512 is sufficient to train deep models for TB detection and classification. In addition, it is almost impossible to directly use the 3000 × 3000 X-ray images for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release only the X-rays at a resolution of 512 × 512. For a fair comparison, we recommend that all researchers use this resolution in their experiments.
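A histogram like the one described above could be computed from COCO-style boxes roughly as follows. This is only a minimal sketch: the function name `area_histogram`, the example boxes, and the bin edges are hypothetical and are not taken from the dataset or the paper.

```python
from bisect import bisect_right


def area_histogram(boxes, bin_edges):
    """Count boxes whose area lies in [bin_edges[i], bin_edges[i + 1]).

    `boxes` holds COCO-style [x, y, width, height] entries and
    `bin_edges` must be sorted in increasing order.
    """
    counts = [0] * (len(bin_edges) - 1)
    for _, _, width, height in boxes:
        area = width * height
        index = bisect_right(bin_edges, area) - 1
        # Areas outside the covered range are ignored.
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts


# Hypothetical boxes in 512 x 512 image coordinates.
boxes = [
    [10, 20, 30, 40],    # area 1200
    [0, 0, 100, 100],    # area 10000
    [5, 5, 200, 150],    # area 30000
    [1, 1, 20, 10],      # area 200
    [60, 60, 50, 50],    # area 2500
]
# Hypothetical bin edges; the bins used for the figure may differ.
bin_edges = [0, 1024, 4096, 16384, 65536]

counts = area_histogram(boxes, bin_edges)
print(counts)
```

Each entry of `counts` corresponds to the height of one bin, and the two neighbouring values in `bin_edges` give its left and right limits.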
Online Challenge
We only release the training and validation sets of the proposed TBX11K dataset. The test set is retained as an online challenge for simultaneous TB X-ray classification and TB area detection in a single system (e.g., a convolutional neural network). To participate in this challenge, you need to create an account on CodaLab and register for the TBX11K Tuberculosis Classification and Detection Challenge. Please refer to this webpage or our paper for the evaluation metrics. Then, open the “Participate” tab and read the submission guidelines carefully. Next, you can upload your submission. Once uploaded, your submissions will be evaluated automatically. We have added four well-known baseline methods to the leaderboard: Faster R-CNN (ResNet50) [3], FCOS (ResNet50) [4], RetinaNet (ResNet50) [5], and SSD (VGG16) [6]. Please refer to our paper for details about the reformation of these baselines.
For the evaluation of TB area detection, we directly adopt the MS-COCO API. For the evaluation of X-ray classification, we use functions from the Python package “sklearn”, for example:
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.metrics import roc_auc_score
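The sketch below shows one way these functions might be combined on a toy example. It is not the official challenge evaluation script: the label encoding (0 = healthy, 1 = sick but non-TB, 2 = TB), the toy predictions, and the choice of macro averaging are all assumptions made for illustration, and the challenge page remains the authoritative description of the metrics.

```python
from sklearn.metrics import (
    accuracy_score,
    multilabel_confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Assumed label encoding (hypothetical): 0 = healthy, 1 = sick but non-TB, 2 = TB.
TB_LABEL = 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
# Hypothetical predicted probability of the TB class for each X-ray.
tb_score = [0.1, 0.2, 0.3, 0.2, 0.9, 0.4]

# Overall accuracy and class-averaged (macro) precision and recall.
accuracy = accuracy_score(y_true, y_pred)
avg_precision = precision_score(y_true, y_pred, average="macro")
avg_recall = recall_score(y_true, y_pred, average="macro")

# One 2 x 2 matrix per class, laid out as [[tn, fp], [fn, tp]].
matrices = multilabel_confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
tn, fp, fn, tp = matrices[TB_LABEL].ravel()
tb_sensitivity = tp / (tp + fn)
tb_specificity = tn / (tn + fp)

# AUC of the TB class against all other classes.
is_tb = [1 if label == TB_LABEL else 0 for label in y_true]
tb_auc = roc_auc_score(is_tb, tb_score)

print(accuracy, avg_precision, avg_recall)
print(tb_sensitivity, tb_specificity, tb_auc)
```

Sensitivity and specificity are read from the per-class confusion matrix of the TB class, so they describe TB versus everything else rather than an average over classes.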
Terms of Use
This dataset belongs to the Media Computing Lab at Nankai University and is licensed under a Creative Commons Attribution 4.0 License.
References
[1] Jaeger, S., Candemir, S., Antani, S., Wáng, Y.X.J., Lu, P.X. and Thoma, G., 2014. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative imaging in medicine and surgery, 4(6), p.475.
[2] Chauhan, A., Chauhan, D. and Rout, C., 2014. Role of Gist and PHOG features in computer-aided diagnosis of tuberculosis without segmentation. PloS one, 9(11), p.e112980.
[3] Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).
[4] Tian, Z., Shen, C., Chen, H. and He, T., 2019. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE international conference on computer vision (pp. 9627-9636).
[5] Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).
[6] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y. and Berg, A.C., 2016, October. SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.
Frequently Asked Questions
1. The paper says that all X-rays have a resolution of about 3000 × 3000, but the X-rays in the downloaded dataset are 512 × 512. Why?
As explained above, the original 3000 × 3000 images would require over 100GB of storage, which is too large to distribute. On the other hand, we found that a resolution of 512 × 512 is sufficient to train deep models for TB detection and classification. In addition, it is almost impossible to directly use the 3000 × 3000 X-ray images for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release only the X-rays at a resolution of 512 × 512. For a fair comparison, we recommend that all researchers use this resolution in their experiments.
2. What is the format of the bounding box annotations?
The xml format annotations provide [xmin, ymin, xmax, ymax], while the json format annotations follow COCO, i.e., [x, y, width, height]. This can be seen in our code: code/make_json_anno.py
3. I could see that you have used the category_id = 3 for “PulmonaryTuberculosis”. However, I could see no images categorized with this ID. There are only category IDs 1 and 2. Could you please explain this?
The category “PulmonaryTuberculosis”, i.e., category_id = 3, denotes the unknown TB X-rays in our paper. Note that unknown TB X-rays are all in the test set, whose annotations are not released and are reserved for the online challenge. We only use unknown TB X-rays for the evaluation of category-agnostic TB area detection. Hence, when you build your model, please only use the categories with category_id = 1 and category_id = 2.
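The answers to questions 2 and 3 can be illustrated together with a small sketch. The helper names (`voc_to_coco`, `coco_to_voc`, `keep_trainable`) and the toy annotation records are hypothetical. The width and height are computed here as plain differences, which is an assumption; code/make_json_anno.py remains the authoritative reference for the exact convention.

```python
# Categories recommended for model building (see question 3).
TRAINABLE_CATEGORY_IDS = {1, 2}


def voc_to_coco(box):
    """Convert [xmin, ymin, xmax, ymax] to COCO-style [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_to_voc(box):
    """Convert COCO-style [x, y, width, height] to [xmin, ymin, xmax, ymax]."""
    x, y, width, height = box
    return [x, y, x + width, y + height]


def keep_trainable(annotations):
    """Keep only annotations whose category_id is 1 or 2."""
    return [a for a in annotations if a["category_id"] in TRAINABLE_CATEGORY_IDS]


# Hypothetical COCO-style records; real files contain more fields.
annotations = [
    {"image_id": 1, "category_id": 1, "bbox": [100, 120, 60, 80]},
    {"image_id": 2, "category_id": 3, "bbox": [10, 10, 20, 20]},
    {"image_id": 3, "category_id": 2, "bbox": [200, 50, 40, 30]},
]

train_annotations = keep_trainable(annotations)
corner_boxes = [coco_to_voc(a["bbox"]) for a in train_annotations]
print(corner_boxes)
```

Converting a box to corner format and back with these two helpers returns the original values, which is a quick sanity check when mixing the xml and json annotations.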