
Rethinking Computer-aided Tuberculosis Diagnosis

Yun Liu*1, Yu-Huan Wu*1, Yunfeng Ban2, Huifang Wang2, Ming-Ming Cheng1

1Nankai University        2InferVision

Abstract

As a serious infectious disease, tuberculosis (TB) is one of the major threats to human health worldwide, leading to millions of deaths every year. Although early diagnosis and treatment can greatly improve the chances of survival, it remains a major challenge, especially in developing countries. Computer-aided tuberculosis diagnosis (CTD) is a promising choice for TB diagnosis due to the great successes of deep learning. However, when it comes to TB diagnosis, the lack of training data has hampered the progress of CTD. To solve this problem, we establish a large-scale TB dataset, namely the Tuberculosis X-ray (TBX11K) dataset. This dataset contains 11,200 X-ray images with corresponding bounding box annotations for TB areas, while the largest existing public TB dataset has only 662 X-ray images with image-level annotations. The proposed dataset enables the training of sophisticated detectors for high-quality CTD. We reform existing object detectors to adapt them to simultaneous image classification and TB area detection. These reformed detectors are trained and evaluated on the proposed TBX11K dataset and serve as baselines for future research.

Paper

The InferVision product based on this research has been included in the UN's Global Drug Facility (GDF) list. This is the first time an AI product has been included in the GDF list.

Citation

We would highly appreciate it if you cite our papers when using our dataset:

@article{liu2023revisiting,
  title={Revisiting Computer-Aided Tuberculosis Diagnosis},
  author={Liu, Yun and Wu, Yu-Huan and Zhang, Shi-Chen and Liu, Li and Wu, Min and Cheng, Ming-Ming},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

@inproceedings{liu2020rethinking,
  title={Rethinking computer-aided tuberculosis diagnosis},
  author={Liu, Yun and Wu, Yu-Huan and Ban, Yunfeng and Wang, Huifang and Cheng, Ming-Ming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={2646--2655},
  year={2020}
}

Comparison with Other TB Datasets

The proposed TBX11K dataset is much larger, better annotated, and more realistic than existing TB datasets, enabling the training of deep CNNs. First, unlike previous datasets [1, 2] that only contain tens or hundreds of X-ray images, TBX11K has 11,200 images, about 17× more than the largest existing dataset, i.e., the Shenzhen dataset [1], making it possible to train very deep CNNs. Second, instead of only providing image-level annotations as previous datasets do, TBX11K annotates TB areas with bounding boxes, so future CTD methods can not only recognize the manifestations of TB but also detect TB areas to help radiologists make a definitive diagnosis. Third, TBX11K includes four categories, i.e., healthy, active TB, latent TB, and unhealthy but non-TB, rather than the binary TB-or-not classification of previous datasets, so future CTD systems can adapt to more complex real-world scenarios and provide people with more detailed disease analyses.

Dataset Splits

The proposed TBX11K dataset is split into training, validation, and testing sets. “Active & Latent TB” refers to X-rays that contain active and latent TB simultaneously. “Active TB” and “Latent TB” refer to X-rays that only contain active TB or latent TB, respectively. “Uncertain TB” refers to TB X-rays whose TB types cannot be recognized under today’s medical conditions. Uncertain TB X-rays are all put into the test set. Please refer to the file “README.md” in the downloaded dataset for more details about the dataset splits.

This is the distribution of the areas of TB bounding boxes. The left and right values of each bin define its corresponding area range, and the height of each bin denotes the number of TB bounding boxes with an area within this range. Note that the original X-rays have a resolution of about 3000 × 3000. However, the original 3000 × 3000 images would lead to a storage size of over 100GB, which is too large to deliver. On the other hand, we found that a resolution of 512 × 512 is enough to train deep models for TB detection and classification. In addition, it is almost impossible to directly use the 3000 × 3000 X-ray images for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release only the X-rays with a resolution of 512 × 512. For a fair comparison, we recommend that all researchers use this resolution in their experiments.
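
For reference, the area distribution described above can be roughly reproduced from the released COCO-style JSON annotations. The sketch below is ours rather than part of the official code, and the annotation file path is a placeholder for whichever JSON file ships with your copy of the dataset:

import json
import matplotlib.pyplot as plt

# Hypothetical path; replace with the annotation file in your download.
with open("annotations/json/TBX11K_train.json") as f:
    coco = json.load(f)

# COCO-style boxes are [x, y, width, height], so the area is width * height.
areas = [ann["bbox"][2] * ann["bbox"][3] for ann in coco["annotations"]]

plt.hist(areas, bins=20)
plt.xlabel("TB bounding box area (in squared pixels)")
plt.ylabel("Number of TB bounding boxes")
plt.savefig("tb_bbox_area_hist.png")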

Online Challenge

We only release the training and validation sets of the proposed TBX11K dataset. The test set is retained as an online challenge for simultaneous TB X-ray classification and TB area detection in a single system (e.g., a convolutional neural network). To participate in this challenge, you need to create an account on CodaLab and register for the TBX11K Tuberculosis Classification and Detection Challenge. Please refer to this webpage or our paper for the evaluation metrics. Then, open the “Participate” tab and read the submission guidelines carefully. Next, you can upload your submission. Once uploaded, your submission will be evaluated automatically. We have added four well-known baseline methods to the leaderboard: Faster R-CNN (ResNet50) [3], FCOS (ResNet50) [4], RetinaNet (ResNet50) [5], and SSD (VGG16) [6]. Please refer to our paper for details about the reformation of these baselines.

For the evaluation of TB area detection, we directly adopt the MS-COCO API. For the evaluation of X-ray classification, we use functions from the Python package “sklearn”, imported as follows:

from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.metrics import roc_auc_score
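
As a rough illustration of how these metrics could be computed locally, here is a minimal sketch combining the sklearn functions above with the MS-COCO API (pycocotools). The dummy labels, variable names, and file paths below are our own assumptions, not part of the official evaluation script:

import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             multilabel_confusion_matrix, roc_auc_score)
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# X-ray classification: dummy ground-truth labels, predictions, and class probabilities.
y_true = np.array([0, 1, 2, 2, 0])
y_pred = np.array([0, 1, 2, 1, 0])
y_score = np.random.rand(5, 3)
y_score /= y_score.sum(axis=1, keepdims=True)

print("accuracy :", accuracy_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred, average=None))
print("precision:", precision_score(y_true, y_pred, average=None))
print("confusion:\n", multilabel_confusion_matrix(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score, multi_class="ovr"))

# TB area detection, evaluated with the MS-COCO API.
coco_gt = COCO("annotations/json/TBX11K_val.json")  # hypothetical annotation path
coco_dt = coco_gt.loadRes("my_detections.json")     # your COCO-format detection results
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()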

Terms of Use

This dataset belongs to the Media Computing Lab at Nankai University and is licensed under a Creative Commons Attribution 4.0 License.

References

[1] Jaeger, S., Candemir, S., Antani, S., Wáng, Y.X.J., Lu, P.X. and Thoma, G., 2014. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative Imaging in Medicine and Surgery, 4(6), p.475.

[2] Chauhan, A., Chauhan, D. and Rout, C., 2014. Role of Gist and PHOG features in computer-aided diagnosis of tuberculosis without segmentation. PLoS ONE, 9(11), p.e112980.

[3] Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).

[4] Tian, Z., Shen, C., Chen, H. and He, T., 2019. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 9627-9636).

[5] Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2980-2988).

[6] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y. and Berg, A.C., 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21-37). Springer, Cham.

Frequently Asked Questions

1. The paper says that all X-rays have a resolution of about 3000 × 3000, but in the downloaded dataset the X-ray resolution is 512 × 512. Why?

As explained above, the original 3000 × 3000 images would lead to a storage size of over 100GB, which is too large to deliver. On the other hand, we found that a resolution of 512 × 512 is enough to train deep models for TB detection and classification. In addition, it is almost impossible to directly use the 3000 × 3000 X-ray images for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release only the X-rays with a resolution of 512 × 512. For a fair comparison, we recommend that all researchers use this resolution in their experiments.

2. What is the format of the bounding box annotations?

The XML-format annotations provide [xmin, ymin, xmax, ymax], while the JSON-format annotations follow COCO, i.e., [x, y, width, height]. This can be seen from our code: code/make_json_anno.py
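
If it helps, here is a small illustrative sketch (our own code, not part of the released scripts) for converting between the two box formats:

def xyxy_to_xywh(box):
    """XML style [xmin, ymin, xmax, ymax] -> COCO style [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def xywh_to_xyxy(box):
    """COCO style [x, y, width, height] -> XML style [xmin, ymin, xmax, ymax]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

# Example: a COCO-style box and its XML-style equivalent.
print(xywh_to_xyxy([259.7, 44.3, 101.1, 138.9]))  # -> [259.7, 44.3, 360.8, 183.2]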

3. I can see that you have used category_id = 3 for “PulmonaryTuberculosis”. However, no images are categorized with this ID; there are only category IDs 1 and 2. Could you please explain this?

The category “PulmonaryTuberculosis”, i.e., category_id = 3, indicates the uncertain TB X-rays in our paper. Note that uncertain TB X-rays are all in the test set, whose annotations are not released but reserved for the online challenge. We only use uncertain TB X-rays for the evaluation of category-agnostic TB area detection. Hence, when you build your model, please only use the categories with category_id = 1 and category_id = 2.
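
As a rough sketch of this advice (again our own code, with a placeholder annotation path), the released annotations can be filtered so that only category_id 1 and 2 are kept for training:

import json

# Hypothetical path; replace with the annotation file in your download.
with open("annotations/json/TBX11K_train.json") as f:
    coco = json.load(f)

train_cat_ids = {1, 2}  # the TB categories used for training (see the answer above)
coco["annotations"] = [ann for ann in coco["annotations"]
                       if ann["category_id"] in train_cat_ids]

with open("TBX11K_train_filtered.json", "w") as f:
    json.dump(coco, f)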

Comments
dskim

Hi, I want to participate in the online challenge, but an error message occurs after submitting the results to the CodaLab evaluation server.

The error message is: ModuleNotFoundError: No module named ‘pycocotools’

Please add pycocotools to the competition Docker image.

Yun Liu

Thank you for your comment. The problem has been fixed, and the competition is ready now.

Amir Rajak

Hi again. Could you provide the raw DICOM files of this dataset? Our model, which performs very well on some unseen private datasets, has very low specificity on your dataset. If we had the raw DICOM files, we would be able to replicate our exact preprocessing steps.

Amir Rajak

First of all, thanks for this amazing work. I have a question regarding the annotations. In the JSON file, I see that you have three categories of TB:

  1. ActiveTuberculosis
  2. ObsoletePulmonaryTuberculosis
  3. PulmonaryTuberculosis

But in the paper, you report active and latent TB. So could you please clarify which of the three categories corresponds to latent TB?

Amir Rajak

Thanks for a quick response. 🙂

mfarnas

Thank you for this great work. I understand that the original DICOM images cannot be delivered due to the limited storage space. However, there is still some valuable information that can be obtained from the DICOM metadata (e.g., gender, age, position, etc.). Could this information be shared in a file?

MM Cheng

Thanks for your valuable suggestion. We will prepare for it and let you know when it is released.

kj172

Have you compared the segmentation performance of Res2Net and ResNet34? Which one is better?

Kafka

Y’all are doing incredible work. Thank you! I’d been banging my head against a wall trying to contact hospitals in the middle of a pandemic to set up data agreements. This is seriously awesome.

Kafka

Future suggestion: It’d be very cool if y’all could expand a bit more on the potential research section. Are there any training methods etc that the team would like other teams to pick up on? We’d be very interested to know!

Hyunsuk Yoo

Thank you everyone for this great work, and also for sharing the dataset publicly. As a doctor, I have some questions regarding how the GT labels were created.

Hyunsuk Yoo

(1) Are latent TB cases also biologically confirmed (for example by tuberculin testing or IFNg testing)?

Hyunsuk Yoo

Here are my questions:

(1) Are latent TB cases biologically confirmed (by IFNg testing or tuberculin skin testing)?
(2) If a case is biologically positive for active TB but its CXR contains regions suspicious for latent TB only, is it labeled as latent TB or active TB?
(3) If a case is biologically positive for active TB but its CXR contains no regions suspicious for active TB, how is it labeled?

Dr. Rajaraman

That was great work. I need clarity about the bounding box annotations you have released. For instance, for a given image, tb/tb0003.png, you have given the following annotation: [259.68731689453125, 44.277679443359375, 101.13803100585938, 138.91192626953125]. What is the order of this bounding box? Is it [xmin, ymin, xmax, ymax] or [x, y, width, height]? I can’t find these details in the JSON or the PDF either. Kindly clarify.

Dr. Rajaraman

Thanks a bunch for your response. I have one more question. I can see that you have used category_id = 3 for ‘PulmonaryTuberculosis’. However, I can see no images categorized with this ID; there are only category IDs 1 and 2. Could you please explain this?

jingjing.yin

Hello, in the paper it says that all X-rays have a resolution of about 3000 × 3000, but I downloaded the dataset from Baidu Yun and the images have a resolution of about 512 × 512. How can I get the dataset in the resolution of about 3000 × 3000?

Yun Liu

The original 3000 × 3000 images would lead to a storage size of over 100GB, which is too large to deliver. On the other hand, we found that a resolution of 512 × 512 is enough to train deep models for TB detection and classification. In addition, it is almost impossible to directly use the 3000 × 3000 X-ray images for TB detection due to the limited receptive fields of existing CNNs. Therefore, we decided to release only the X-rays with a resolution of 512 × 512. For a fair comparison (an evaluation server will be provided these days), we recommend that all researchers use this resolution in their experiments.

jingjing.yin

Thanks for explaining it to me; now I understand. Thank you again.

Yun Liu

Hey, the online challenge for the test set has started.

Davis Jin

Can you provide the URL of the TBX11K dataset?

hoangnguyen

I am also interested in your dataset. When will you make it public?

Yun Liu

Hey, we have released the data, and we will keep updating this page and provide an evaluation server for the test set.
