
Res2Net: A New Multi-scale Backbone Architecture

Online Demo: http://mc.nankai.edu.cn/res2net-det

Shang-Hua Gao¹, Ming-Ming Cheng¹, Kai Zhao¹, Xin-Yu Zhang¹, Ming-Hsuan Yang², Philip Torr³

¹TKLNDST, CS, Nankai University      ²UC Merced      ³University of Oxford

Figure 1. We propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, BigLittleNet, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models.

1. Abstract

Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely used datasets, e.g., CIFAR-100 and ImageNet. Ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of Res2Net over state-of-the-art baseline methods.

Source code and pre-trained models: https://github.com/Res2Net

2. Paper

  1. Res2Net: A New Multi-scale Backbone Architecture, Shang-Hua Gao#, Ming-Ming Cheng#*, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, Philip Torr, IEEE TPAMI, 43(2):652-662, 2021. [pdf | code | project | PPT | bib | Chinese translation | LaTeX]

3. Applications

Res2Net has proven useful in almost all computer vision applications we have tried so far. If you find it useful in your applications and want to share it with others, please contact us to add a link on this project page.

News

  • 2020.10.20 The PaddlePaddle version of Res2Net achieves 85.13% top-1 acc. on ImageNet: PaddlePaddle Res2Net.
  • 2020.8.21 An online demo for detection and segmentation using Res2Net is released: http://mc.nankai.edu.cn/res2net-det
  • 2020.7.29 The training code of Res2Net on ImageNet is released: https://github.com/Res2Net/Res2Net-ImageNet-Training (non-commercial use only)
  • 2020.6.1 Res2Net is now in the official model zoo of the new deep learning framework Jittor.
  • 2020.5.21 Res2Net is now one of the basic backbones in the MMDetection v2 framework: https://github.com/open-mmlab/mmdetection. Using MMDetection v2 with Res2Net achieves better performance at less computational cost.
  • 2020.5.11 Res2Net achieves about a 2% performance gain on panoptic segmentation based on detectron2, with no tricks.
  • 2020.3.14 The Res2Net backbone allows the latest interactive segmentation method to significantly reduce the number of required user interactions compared with the best reported results.
  • 2020.2.24 Our Res2Net_v1b achieves better detection performance on the popular mmdetection platform, outperforming the previous best results achieved with the HRNet backbone while consuming only about 50% of the parameters and computation!
  • 2020.2.21 Pretrained models of Res2Net_v1b are released, with more than 2% improvement in ImageNet top-1 acc. over the TPAMI version of Res2Net!

3.1 Classification

The Res2Net module can replace the bottleneck block of existing CNNs with no other modification, as sketched below.
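To make the hierarchical connections concrete, here is a minimal PyTorch sketch of a Res2Net bottleneck written from the paper's description; the class name and the simplifications (no stride, grouping, or channel expansion) are ours, so refer to the source code linked below for the exact implementation.

```python
import torch
import torch.nn as nn

class Res2NetBottleneck(nn.Module):
    """Simplified Res2Net bottleneck: the 3x3 conv of a standard bottleneck
    is split into `scale` groups of `width` channels, with hierarchical
    residual-like connections between the groups."""

    def __init__(self, channels, width=26, scale=4):
        super().__init__()
        mid = width * scale
        self.width, self.scale = width, scale
        self.conv1 = nn.Conv2d(channels, mid, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        # one 3x3 conv per split, except the first split (identity mapping)
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1, bias=False)
            for _ in range(scale - 1))
        self.bns = nn.ModuleList(nn.BatchNorm2d(width) for _ in range(scale - 1))
        self.conv3 = nn.Conv2d(mid, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        xs = torch.split(out, self.width, dim=1)
        ys = [xs[0]]  # y1 = x1: the first split passes through unchanged
        for i in range(1, self.scale):
            # y_i = K_i(x_i + y_{i-1}); the first conv sees x_2 alone
            inp = xs[i] if i == 1 else xs[i] + ys[-1]
            ys.append(self.relu(self.bns[i - 1](self.convs[i - 1](inp))))
        out = self.bn3(self.conv3(torch.cat(ys, dim=1)))
        return self.relu(out + x)  # standard residual connection

block = Res2NetBottleneck(channels=256, width=26, scale=4)
y = block(torch.randn(1, 256, 56, 56))  # -> torch.Size([1, 256, 56, 56])
```

Because each split sees the outputs of all preceding 3x3 convs, the equivalent receptive field within one block grows with the scale dimension, which is the multi-scale effect the paper describes.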

We have integrated the Res2Net module into many state-of-the-art backbone networks: ResNet, ResNeXt, DLA, SE-Net, and Big-Little Net. Source code for these backbone models is available at https://github.com/gasvn/Res2Net .

Model              | #Params | GFLOPs | top-1 err. | top-5 err. | Link
Res2Net-50-48w-2s  | 25.29M  | 4.2    | 22.68      | 6.47       | OneDrive
Res2Net-50-26w-4s  | 25.70M  | 4.2    | 22.01      | 6.15       | OneDrive
Res2Net-50-14w-8s  | 25.06M  | 4.2    | 21.86      | 6.14       | OneDrive
Res2Net-50-26w-6s  | 37.05M  | 6.3    | 21.42      | 5.87       | OneDrive
Res2Net-50-26w-8s  | 48.40M  | 8.3    | 20.80      | 5.63       | OneDrive
Res2Net-101-26w-4s | 45.21M  | 8.1    | 20.81      | 5.57       | OneDrive
Res2NeXt-50        | 24.67M  | 4.2    | 21.76      | 6.09       | OneDrive
Res2Net-DLA-60     | 21.15M  | 4.2    | 21.53      | 5.80       | OneDrive
Res2NeXt-DLA-60    | 17.33M  | 3.6    | 21.55      | 5.86       | OneDrive
Res2Net-v1b-50     | 25.72M  | 4.5    | 19.73      | 4.96       | Link
Res2Net-v1b-101    | 45.23M  | 8.3    | 18.77      | 4.64       | Link

Download links on Baidu Disk are also available (password: vbix).
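To use one of the pretrained models above, the workflow below is a minimal sketch. It assumes res2net.py from the repository linked above is importable and that the constructor name follows the width/scale naming scheme in the table; check the repository README for the exact entry points.

```python
import torch

# Assumption: res2net.py from https://github.com/gasvn/Res2Net is on the
# path, and the constructor name mirrors the table's width/scale scheme.
from res2net import res2net50_26w_4s

model = res2net50_26w_4s(pretrained=True)  # downloads ImageNet weights
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # standard ImageNet input
print(logits.shape)  # torch.Size([1, 1000])
```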

3.2 Pose estimation

The pose estimation task requires localizing person keypoints under challenging, uncontrolled conditions. It involves simultaneously detecting people and localizing their keypoints.

We use Simple Baselines as the baseline method for Pose Estimation. Source code is available at https://github.com/gasvn/Res2Net-Pose-Estimation .

Results on COCO val2017, using a person detector with 56.4 human AP on COCO val2017.

Arch             | Input size | AP    | AP.5  | AP.75 | AP (M) | AP (L)
pose_resnet_50   | 256×192    | 0.704 | 0.886 | 0.783 | 0.671  | 0.772
pose_res2net_50  | 256×192    | 0.737 | 0.925 | 0.814 | 0.708  | 0.782
pose_resnet_101  | 256×192    | 0.714 | 0.893 | 0.793 | 0.681  | 0.781
pose_res2net_101 | 256×192    | 0.744 | 0.926 | 0.826 | 0.720  | 0.785

3.3 Instance segmentation

We use Mask R-CNN, implemented on top of maskrcnn-benchmark, as the baseline method for instance segmentation and object detection. Source code is available at https://github.com/gasvn/Res2Net-maskrcnn .

Instance segmentation is the combination of object detection and semantic segmentation. It requires not only the correct detection of objects with various sizes in an image but also the precise segmentation of each object.
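As a usage reference, here is a hedged inference sketch built on maskrcnn-benchmark's demo utilities; the config file name and checkpoint path are hypothetical placeholders, so substitute the actual files shipped with the Res2Net fork.

```python
import cv2

from maskrcnn_benchmark.config import cfg
from predictor import COCODemo  # lives in maskrcnn-benchmark's demo/ folder

# Both paths below are hypothetical placeholders; use the Res2Net fork's
# actual config and checkpoint files.
cfg.merge_from_file("configs/e2e_mask_rcnn_res2net_50_FPN_1x.yaml")
cfg.merge_from_list(["MODEL.WEIGHT", "res2net50_mask_rcnn.pth"])

coco_demo = COCODemo(cfg, min_image_size=800, confidence_threshold=0.7)
image = cv2.imread("demo.jpg")
result = coco_demo.run_on_opencv_image(image)  # image with boxes/masks drawn
cv2.imwrite("demo_out.jpg", result)
```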

Performance on Instance segmentation:

Backbone    | Setting | AP   | AP50 | AP75 | APs  | APm  | APl
ResNet-50   | 64w     | 33.9 | 55.2 | 36.0 | 14.8 | 36.0 | 50.9
ResNet-50   | 48w×2s  | 34.2 | 55.6 | 36.3 | 14.9 | 36.8 | 50.9
Res2Net-50  | 26w×4s  | 35.6 | 57.6 | 37.6 | 15.7 | 37.9 | 53.7
Res2Net-50  | 18w×6s  | 35.7 | 57.5 | 38.1 | 15.4 | 38.1 | 53.7
Res2Net-50  | 14w×8s  | 35.3 | 57.0 | 37.5 | 15.6 | 37.5 | 53.4
ResNet-101  | 64w     | 35.5 | 57.0 | 37.9 | 16.0 | 38.2 | 52.9
Res2Net-101 | 26w×4s  | 37.1 | 59.4 | 39.4 | 16.6 | 40.0 | 55.6

3.4 Object detection

As in Section 3.3, we use Mask R-CNN (maskrcnn-benchmark) as the baseline method. Source code is available at https://github.com/gasvn/Res2Net-maskrcnn .

Performance on Object detection:

Backbone    | Setting | AP   | AP50 | AP75 | APs  | APm  | APl
ResNet-50   | 64w     | 37.5 | 58.4 | 40.3 | 20.6 | 40.1 | 49.7
ResNet-50   | 48w×2s  | 38.0 | 58.9 | 41.3 | 20.5 | 41.0 | 49.9
Res2Net-50  | 26w×4s  | 39.6 | 60.9 | 43.1 | 22.0 | 42.3 | 52.8
Res2Net-50  | 18w×6s  | 39.9 | 60.9 | 43.3 | 21.8 | 42.8 | 53.7
Res2Net-50  | 14w×8s  | 39.1 | 60.2 | 42.1 | 21.7 | 41.7 | 52.8
ResNet-101  | 64w     | 39.6 | 60.6 | 43.2 | 22.0 | 43.2 | 52.4
Res2Net-101 | 26w×4s  | 41.8 | 62.6 | 45.6 | 23.4 | 45.5 | 55.6

3.5 Salient object detection

Precisely locating the salient object regions in an image requires understanding both large-scale context information, to determine object saliency, and small-scale features, to localize object boundaries accurately.

We use PoolNet (CVPR 2019) as the baseline method for salient object detection. Source code is available at https://github.com/gasvn/Res2Net-PoolNet .

Results on salient object detection datasets, without joint training with edge supervision. Models are trained on DUTS-TR.

Backbone   | ECSSD         | PASCAL-S      | DUT-O         | HKU-IS        | SOD           | DUTS-TE
           | MaxF & MAE    | MaxF & MAE    | MaxF & MAE    | MaxF & MAE    | MaxF & MAE    | MaxF & MAE
vgg        | 0.936 & 0.047 | 0.857 & 0.078 | 0.817 & 0.058 | 0.928 & 0.035 | 0.859 & 0.115 | 0.876 & 0.043
resnet50   | 0.940 & 0.042 | 0.863 & 0.075 | 0.830 & 0.055 | 0.934 & 0.032 | 0.867 & 0.100 | 0.886 & 0.040
res2net50  | 0.947 & 0.036 | 0.871 & 0.070 | 0.837 & 0.052 | 0.936 & 0.031 | 0.885 & 0.096 | 0.892 & 0.037

3.6 Semantic segmentation

Semantic segmentation results of DeepLab v3+ using ResNet/Res2Net as the backbone model.
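No segmentation code ships on this page, but pairing a Res2Net backbone with a DeepLab head is straightforward. The sketch below uses torchvision's DeepLab v3 head (torchvision does not include v3+) and the res2net import from Section 3.1; the wrapper class and the layer names (which mirror torchvision's ResNet) are our assumptions, not the paper's setup.

```python
import torch
from torch import nn
from torchvision.models._utils import IntermediateLayerGetter
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

from res2net import res2net50_26w_4s  # assumed importable, see Section 3.1

backbone = res2net50_26w_4s(pretrained=True)
# Res2Net keeps ResNet's layer naming, so extract the stride-32 feature map.
# (Simplification: no dilated convs, so output stride is 32, not DeepLab's 8/16.)
body = IntermediateLayerGetter(backbone, return_layers={'layer4': 'out'})
head = DeepLabHead(in_channels=2048, num_classes=21)  # 21 = PASCAL VOC classes

class Res2NetDeepLabV3(nn.Module):  # hypothetical wrapper, ours
    def __init__(self, body, head):
        super().__init__()
        self.body, self.head = body, head

    def forward(self, x):
        out = self.head(self.body(x)['out'])
        # upsample logits back to the input resolution
        return nn.functional.interpolate(
            out, size=x.shape[-2:], mode='bilinear', align_corners=False)

model = Res2NetDeepLabV3(body, head)
seg = model(torch.randn(1, 3, 513, 513))  # -> torch.Size([1, 21, 513, 513])
```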

3.7 Detection benchmark (mmdetection)

Backbone        | #Params | GFLOPs | box AP
R-101-FPN       | 60.52M  | 283.14 | 39.4
X-101-64x4d-FPN | 99.25M  | 440.36 | 41.3
HRNetV2p-W48    | 83.36M  | 459.66 | 41.5
Res2Net-101     | 61.18M  | 293.68 | 42.3
Comparison of Faster R-CNN based detection. The Res2Net-based method achieves better results with significantly less computation and a smaller memory footprint. See more results for Mask R-CNN, Cascade R-CNN, Cascade Mask R-CNN, and Hybrid Task Cascade in the mmdetection benchmark.
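Since Res2Net is an official backbone in MMDetection v2 (see the 2020.5.21 news item), switching a detector onto it is a config-level change. The snippet below is a sketch in MMDetection's Python config style; the base config path and pretrained checkpoint key are our best-guess placeholders, so check configs/res2net in the mmdetection repository for the official files.

```python
# MMDetection v2-style config: swap Faster R-CNN onto the Res2Net-101
# backbone. The _base_ path and checkpoint key are best-guess placeholders.
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_2x_coco.py'
model = dict(
    pretrained='open-mmlab://res2net101_v1d_26w_4s',  # placeholder key
    backbone=dict(
        type='Res2Net',
        depth=101,
        scales=4,        # number of feature splits per bottleneck
        base_width=26))  # channel width of each split
```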

3.8 Vectorized road extraction

Vectorized road extraction from Tan et al. in CVPR 2020.

3.9 Interactive image segmentation

Interactive image segmentation from Lin et al. in CVPR 2020. To reach a given accuracy, the new method requires nearly half as many user interactions as the previous strongest method!

3.10 Tumor segmentation on CT scans (from Sun et al. 2019)

Tumor segmentation on CT scans. From: Sun et al. 2019.

3.11 Person Re-ID (from Cao et al.)

Cao et al. use Res2Net to significantly boost the performance of Re-ID applications.

3.12 Single-stage object detection (from Chen et al.)

Chen et al. use Res2Net for single-stage object detection on CPU-only devices.

3.13 Depth prediction (from Weida Yang)

Weida Yang uses Res2Net to obtain impressive depth prediction results.

3.14 Semantic image to photo-realistic image translation

SemanticGAN from Liu et al. 2020.

3.15 Res2NetPlus for solar panel detector

A solar panel detector for satellite imagery, developed by Less Wright, who found Res2Net50 to have both greater accuracy (+5%) and steadier training. See also his blog or a Chinese translation of it.

3.16 Speaker Verification

Zhou et al. (IEEE SLT 2021) found that ResNeXt and Res2Net can significantly outperform the conventional ResNet model; the Res2Net model achieved superior performance, reducing the EER by 18.5% relative. Experiments on two other internal test sets with mismatched conditions further confirmed the generalization of the ResNeXt and Res2Net architectures to noisy environments and segment-length variations.


3.17 Protein Structure Prediction

Su et al. (Advanced Science 2021) found that Res2Net achieves superior performance for protein structure prediction.
