
Rethinking Computer-aided Tuberculosis Diagnosis

Yun Liu*1, Yu-Huan Wu*1, Yunfeng Ban2, Huifang Wang2, Ming-Ming Cheng1

1Nankai University        2InferVision

Abstract

As a serious infectious disease, tuberculosis (TB) is one of the major threats to human health worldwide, leading to millions of deaths every year. Although early diagnosis and treatment can greatly improve the chances of survival, early diagnosis remains a major challenge, especially in developing countries. Computer-aided tuberculosis diagnosis (CTD) is a promising choice for TB diagnosis due to the great successes of deep learning. However, the lack of training data has hampered the progress of CTD. To solve this problem, we establish a large-scale TB dataset, namely the Tuberculosis X-ray (TBX11K) dataset. This dataset contains 11,200 X-ray images with corresponding bounding box annotations for TB areas, while the largest existing public TB dataset has only 662 X-ray images with corresponding image-level annotations. The proposed dataset enables the training of sophisticated detectors for high-quality CTD. We reform existing object detectors to adapt them to simultaneous image classification and TB area detection. These reformed detectors are trained and evaluated on the proposed TBX11K dataset and serve as baselines for future research.

Paper

The InferVision product based on these research results has been included in the UN’s Global Drug Facility (GDF) list. This is the first time an AI product has been included in the GDF list.

Citation

We would greatly appreciate it if you cite our papers when using our dataset:

@article{liu2023revisiting,
  title={Revisiting Computer-Aided Tuberculosis Diagnosis},
  author={Liu, Yun and Wu, Yu-Huan and Zhang, Shi-Chen and Liu, Li and Wu, Min and Cheng, Ming-Ming},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

@inproceedings{liu2020rethinking,
  title={Rethinking computer-aided tuberculosis diagnosis},
  author={Liu, Yun and Wu, Yu-Huan and Ban, Yunfeng and Wang, Huifang and Cheng, Ming-Ming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2646--2655},
  year={2020}
}

Comparison with Other TB Datasets

The proposed TBX11K dataset is much larger, better annotated, and more realistic than existing TB datasets, enabling the training of deep CNNs. First, unlike previous datasets [1, 2] that contain only tens or hundreds of X-ray images, TBX11K has 11,200 images, about 17× as many as the largest existing dataset, i.e., the Shenzhen dataset [1], which makes it possible to train very deep CNNs. Second, instead of providing only image-level annotations as previous datasets do, TBX11K annotates TB areas with bounding boxes, so that future CTD methods can not only recognize the manifestations of TB but also detect the TB areas to help radiologists reach a definitive diagnosis. Third, TBX11K includes four categories (healthy, active TB, latent TB, and unhealthy but non-TB) rather than the binary TB-or-not classification of previous datasets, so that future CTD systems can adapt to more complex real-world scenarios and provide more detailed disease analyses.

Dataset Splits

The proposed TBX11K dataset is split into training, validation, and testing sets. “Active & Latent TB” refers to X-rays that contain active and latent TB simultaneously. “Active TB” and “Latent TB” refer to X-rays that contain only active TB or only latent TB, respectively. “Uncertain TB” refers to TB X-rays whose TB types cannot be recognized under today’s medical conditions. Uncertain TB X-rays are all put into the test set. Please refer to the file “README.md” in the downloaded dataset for more details about the dataset splits.

Distribution of the areas of TB bounding boxes: the left and right edges of each bin define its area range, and the height of each bin denotes the number of TB bounding boxes whose area falls within this range.

Note that the original X-rays have a resolution of about 3000 × 3000. However, the original 3000 × 3000 images would require over 100GB of storage, which is too large to distribute. We also found that a resolution of 512 × 512 is enough to train deep models for TB detection and classification. In addition, it is almost impossible to use the 3000 × 3000 X-ray images directly for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release the X-rays only at a resolution of 512 × 512. For a fair comparison, we recommend that all researchers use this resolution in their experiments.

Online Challenge

We only release the training and validation sets of the proposed TBX11K dataset. The test set is retained as an online challenge for simultaneous TB X-ray classification and TB area detection in a single system (e.g., a convolutional neural network). To participate in this challenge, you need to create an account on CodaLab and register for the TBX11K Tuberculosis Classification and Detection Challenge. Please refer to this webpage or our paper for the evaluation metrics. Then, open the “Participate” tab and read the submission guidelines carefully. Next, you can upload your submission. Once uploaded, your submission will be evaluated automatically. We have added four well-known baseline methods to the leaderboard: Faster R-CNN (ResNet50) [3], FCOS (ResNet50) [4], RetinaNet (ResNet50) [5], and SSD (VGG16) [6]. Please refer to our paper for details about the reformation of these baselines.

For the evaluation of TB area detection, we adopt the MS-COCO API directly. For the evaluation of X-ray classification, we use functions from the Python package “sklearn”, imported as follows:

from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.metrics import roc_auc_score
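As a rough illustration of how these functions might be combined, the sketch below computes accuracy, TB sensitivity and specificity, TB AUC, and macro-averaged precision and recall from per-class probabilities. The helper name `classification_metrics` and the integer label encoding (with `tb_label` marking the TB class) are assumptions made for this example; this is not the official evaluation script, and the challenge may encode labels differently.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    multilabel_confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def classification_metrics(y_true, y_prob, tb_label=2):
    """Image-level metrics from per-class probabilities.

    ASSUMPTION: labels are integers 0..C-1 and `tb_label` is the TB class.
    This encoding is hypothetical and chosen only for illustration.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = y_prob.argmax(axis=1)  # predicted class = highest probability
    labels = list(range(y_prob.shape[1]))

    # One-vs-rest 2x2 matrices, each laid out as [[tn, fp], [fn, tp]].
    cms = multilabel_confusion_matrix(y_true, y_pred, labels=labels)
    tn, fp = cms[tb_label][0]

    # Binary "TB vs. not TB" view of the labels and predictions.
    tb_true = (y_true == tb_label).astype(int)
    tb_pred = (y_pred == tb_label).astype(int)

    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "tb_sensitivity": recall_score(tb_true, tb_pred),
        "tb_specificity": float(tn) / float(tn + fp),
        "tb_auc": roc_auc_score(tb_true, y_prob[:, tb_label]),
        "macro_precision": precision_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
        "macro_recall": recall_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
    }
```

The function expects one row of class probabilities per X-ray, so `y_prob` has shape (number of images, number of classes), and `y_true` holds the matching integer labels.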

Terms of Use

This dataset belongs to the Media Computing Lab at Nankai University and is licensed under a Creative Commons Attribution 4.0 License.

References

[1] Jaeger, S., Candemir, S., Antani, S., Wáng, Y.X.J., Lu, P.X. and Thoma, G., 2014. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative imaging in medicine and surgery, 4(6), p.475.

[2] Chauhan, A., Chauhan, D. and Rout, C., 2014. Role of Gist and PHOG features in computer-aided diagnosis of tuberculosis without segmentation. PloS one, 9(11), p.e112980.

[3] Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).

[4] Tian, Z., Shen, C., Chen, H. and He, T., 2019. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE international conference on computer vision (pp. 9627-9636).

[5] Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).

[6] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y. and Berg, A.C., 2016, October. SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.

Frequently Asked Questions

1. The paper says that all X-rays have a resolution of about 3000 × 3000, but the X-rays in the dataset I downloaded are 512 × 512. Why?

As explained above, the original 3000 × 3000 images would require over 100GB of storage, which is too large to distribute. We also found that a resolution of 512 × 512 is enough to train deep models for TB detection and classification. In addition, it is almost impossible to use the 3000 × 3000 X-ray images directly for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release the X-rays only at a resolution of 512 × 512. For a fair comparison, we recommend that all researchers use this resolution in their experiments.

2. What is the format of the bounding box annotations?

For the XML annotations, we provide [xmin, ymin, xmax, ymax], while the JSON annotations follow the COCO format, i.e., [x, y, width, height], where (x, y) is the top-left corner of the box. This can be seen from our code: code/make_json_anno.py
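The relationship between the two box formats can be sketched as follows. The function names are hypothetical, and the sketch assumes that width and height are plain coordinate differences (xmax - xmin and ymax - ymin); the script code/make_json_anno.py is the authority on the exact pixel convention used in the released files.

```python
def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] (XML style) to COCO [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    # ASSUMPTION: no +1 pixel offset is applied to width or height.
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def xywh_to_xyxy(box):
    """Convert COCO [x, y, width, height] back to [xmin, ymin, xmax, ymax]."""
    x, y, width, height = box
    return [x, y, x + width, y + height]
```

For example, under this assumption the box [10, 20, 110, 70] in XML style corresponds to [10, 20, 100, 50] in COCO style.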

3. I can see that you use category_id = 3 for “PulmonaryTuberculosis”. However, I cannot find any images with this ID. There are only category IDs 1 and 2. Could you please explain this?

The category “PulmonaryTuberculosis”, i.e., category_id = 3, denotes the unknown TB X-rays in our paper. Note that the unknown TB X-rays are all in the test set, whose annotations are not released because they are reserved for the online challenge. We use unknown TB X-rays only for the evaluation of category-agnostic TB area detection. Hence, when you build your model, please use only the categories with category_id = 1 and category_id = 2.
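If a COCO-style annotation file has been loaded into a Python dictionary, restricting it to the two released categories might look like the sketch below. The function name `keep_released_categories` is hypothetical, and the sketch assumes the standard COCO layout with top-level "annotations" and "categories" lists; it is not taken from the dataset's own code.

```python
def keep_released_categories(coco, keep_ids=(1, 2)):
    """Return a copy of a COCO-style dict limited to the given category IDs.

    ASSUMPTION: `coco` follows the standard COCO layout, where each annotation
    has a "category_id" field and each category has an "id" field.
    """
    keep = set(keep_ids)
    filtered = dict(coco)  # shallow copy; all other keys are kept unchanged
    filtered["annotations"] = [
        ann for ann in coco["annotations"] if ann["category_id"] in keep
    ]
    filtered["categories"] = [
        cat for cat in coco["categories"] if cat["id"] in keep
    ]
    return filtered
```

The input dictionary is not modified, so the original annotations remain available if category-agnostic boxes are needed later.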


31 Comments
dskim

hi
I want to participate in the 7916, but an error message occurs after submitting the results to the codalab evaluation server.

the error message is ModuleNotFoundError: No module named ‘pycocotools’

Please add pycocotools to the competition Docker image.

Yun Liu

Thank you for your comment. The problem has been fixed, and the competition is ready now.

Amir Rajak

Hi again. Could you provide the raw DICOM files of this dataset? Our model, which performs very well on some unseen private datasets, has very low specificity on your dataset. If we had the raw DICOM files, we’d be able to replicate our exact preprocessing steps.

Amir Rajak

First of all, thanks for this amazing work. I am confused about the annotations. In the JSON file, I see that you have three categories of TB:

  1. ActiveTuberculosis
  2. ObsoletePulmonaryTuberculosis
  3. PulmonaryTuberculosis

But in the paper you report Active and Latent TB. Could you please clarify which of the three categories corresponds to Latent TB?

Amir Rajak

Thanks for a quick response. 🙂

mfarnas

Thank you for this great work. I understand that the original DICOM images cannot be delivered due to the limited storage space. However, there is still some valuable information that can be obtained from the DICOM metadata (e.g., gender, age, position). Can this information be shared in one or more files?

MM Cheng

Thanks for your valuable suggestion. We will prepare for it and let you know when it is released.

kj172

Have you compared Res2Net and ResNet34 in terms of segmentation performance? Which one is better?

Kafka

Y’all are doing incredible work. Thank you! I’d banged my head against a wall trying to contact hospitals in the middle of a pandemic for coming up with agreements. This is seriously awesome

Kafka

Future suggestion: It’d be very cool if y’all could expand a bit more on the potential research section. Are there any training methods etc that the team would like other teams to pick up on? We’d be very interested to know!

Hyunsuk Yoo

Thank you everyone for this great work, and also sharing the dataset publicly. As a doctor, I have some questions regarding how the GT labels were created.

Hyunsuk Yoo

(1) Are latent TB cases also biologically confirmed (for example by tuberculin testing or IFNg testing?)

Hyunsuk Yoo

Here are my questions:

(1) Are latent TB cases biologically confirmed? (by IFNg testing or tuberculin skin testing)
(2) If the cases are biologically positive for active TB but their CXRs have regions suspicious for latent TB only, are they labeled as latent TB or active TB?
(3) If the cases are biologically positive for active TB but their CXRs do not contain regions suspicious for active TB, how are they labeled?

Dr. Rajaraman

That was great work. I need clarity about the bounding box annotations you have released. For instance, for a given image, tb/tb0003.png, you have given the following annotation: [259.68731689453125 44.277679443359375 101.13803100585938 138.91192626953125]. What is the order of this bounding box? Is it [xmin, xmax, ymin, ymax] or [x, y, width, height]? I can’t find these details in the JSON or the PDF. Kindly clarify ASAP.

Dr. Rajaraman

Thanks a bunch for your response. I have one more question. I can see that you use category_id = 3 for ‘PulmonaryTuberculosis’. However, I cannot find any images with this ID. There are only category IDs 1 and 2. Could you please explain this?

jingjing.yin

Hello. The paper says that all X-rays have a resolution of about 3000*3000, but I downloaded the dataset from baiduyun and the images are about 512*512. How can I get the dataset at a resolution of about 3000*3000?

Yun Liu

The original 3000 * 3000 images would require over 100GB of storage, which is too large to distribute. We also found that a resolution of 512 * 512 is enough to train deep models for TB detection and classification. In addition, it is almost impossible to use the 3000 * 3000 X-ray images directly for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release the X-rays only at a resolution of 512 * 512. For a fair comparison (an evaluation server will be provided soon), we recommend that all researchers use this resolution in their experiments.

jingjing.yin

Thanks for explaining it to me, now I understand it. Thank you again.

Yun Liu

Hey, the online challenge for the test set has been started.

Davis Jin

Can you provide the url of the dataset TBX11K?

hoangnguyen

I am also interested in your dataset. When can you make it public?

Yun Liu

Hey, we have released the data, and we will keep updating this page and provide an evaluation server for the test set.
