Res2Net: A New Multi-scale Backbone Architecture
Online demo: http://mc.nankai.edu.cn/res2net-det
Shanghua Gao 1, Ming-Ming Cheng 1, Kai Zhao 1, Xin-Yu Zhang 1, Ming-Hsuan Yang 2, Philip Torr 3
1 TKLNDST, CS, Nankai University  2 UC Merced  3 University of Oxford

1. Abstract
Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layerwise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g. ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g. CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e. object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods.
Source Code and pre-trained model: https://github.com/Res2Net
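To make the architecture concrete, below is a minimal, self-contained PyTorch sketch of the hierarchical split-and-connect pattern inside a single Res2Net bottleneck, simplified to the stride-1, identity-shortcut case. This is an illustration, not the authors' reference implementation (see the repository above for that):

```python
import torch
import torch.nn as nn

class Res2NetBottleneck(nn.Module):
    """Sketch of a Res2Net bottleneck: split the 1x1-conv output into
    `scale` groups of `width` channels and connect them hierarchically."""

    def __init__(self, channels, width=26, scale=4):
        super().__init__()
        self.scale = scale
        self.width = width
        self.conv1 = nn.Conv2d(channels, width * scale, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(width * scale)
        # One 3x3 conv per split except the first, which passes through
        # unchanged (feature reuse at the smallest receptive field).
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1, bias=False)
             for _ in range(scale - 1)])
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(width) for _ in range(scale - 1)])
        self.conv3 = nn.Conv2d(width * scale, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        xs = torch.split(out, self.width, dim=1)  # `scale` groups of `width` channels
        ys = [xs[0]]  # y1 = x1: no conv on the first split
        for i in range(1, self.scale):
            # Hierarchical residual-like connection: each split receives the
            # previous split's output before its own 3x3 conv, so later
            # splits see progressively larger receptive fields.
            inp = xs[i] if i == 1 else xs[i] + ys[-1]
            ys.append(self.relu(self.bns[i - 1](self.convs[i - 1](inp))))
        out = self.bn3(self.conv3(torch.cat(ys, dim=1)))
        return self.relu(out + x)  # standard residual shortcut
```

For example, `Res2NetBottleneck(256, width=26, scale=4)` maps an `(N, 256, H, W)` tensor to the same shape while mixing information across the four width-26 splits at increasing receptive-field sizes.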
2. Paper
- Res2Net: A New Multi-scale Backbone Architecture, Shang-Hua Gao#, Ming-Ming Cheng#*, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, Philip Torr, IEEE TPAMI, 43(2):652-662, 2021. [pdf | code | project | PPT | bib | Chinese translation | LaTeX]
3. Applications
Res2Net has proven useful in almost all computer vision applications we have tried so far. If you find it useful in your application and want to share it with others, please contact us to add a link on this project page.
News:
- 2020.10.20 The PaddlePaddle version of Res2Net achieves 85.13% top-1 accuracy on ImageNet: PaddlePaddle Res2Net.
- 2020.8.21 An online demo for detection and segmentation using Res2Net is released: http://mc.nankai.edu.cn/res2net-det
- 2020.7.29 The training code of Res2Net on ImageNet is released at https://github.com/Res2Net/Res2Net-ImageNet-Training (non-commercial use only).
- 2020.6.1 Res2Net is now in the official model zoo of the new deep learning framework Jittor.
- 2020.5.21 Res2Net is now one of the basic backbones in the MMDetection v2 framework: https://github.com/open-mmlab/mmdetection. Using MMDetection v2 with Res2Net achieves better performance at a lower computational cost.
- 2020.5.11 Res2Net achieves about a 2% performance gain on panoptic segmentation based on detectron2, without bells and whistles.
- 2020.3.14 The Res2Net backbone allows the latest interactive segmentation method to significantly reduce the number of required user interactions compared with the best reported results.
- 2020.2.24 Our Res2Net_v1b achieves better detection performance on the popular mmdetection platform, outperforming the previous best results achieved with the HRNet backbone while consuming only about 50% of the parameters and computation!
- 2020.2.21 Pretrained models of Res2Net_v1b improve ImageNet top-1 accuracy by more than 2% compared with the TPAMI version of Res2Net!
3.1 Classification
The Res2Net module can replace the bottleneck block in existing backbones with no other modification.
We have integrated the Res2Net module into several state-of-the-art backbone networks: ResNet, ResNeXt, DLA, SE-Net, and Big-Little Net. Source code for these backbone models is available at https://github.com/gasvn/Res2Net . In the table below, a name such as Res2Net-50-26w-4s denotes Res2Net-50 with base width w = 26 and scale s = 4; a brief usage sketch follows the table.
Model | #Params | GFLOPs | Top-1 error (%) | Top-5 error (%) | Link |
---|---|---|---|---|---|
Res2Net-50-48w-2s | 25.29M | 4.2 | 22.68 | 6.47 | OneDrive |
Res2Net-50-26w-4s | 25.70M | 4.2 | 22.01 | 6.15 | OneDrive |
Res2Net-50-14w-8s | 25.06M | 4.2 | 21.86 | 6.14 | OneDrive |
Res2Net-50-26w-6s | 37.05M | 6.3 | 21.42 | 5.87 | OneDrive |
Res2Net-50-26w-8s | 48.40M | 8.3 | 20.80 | 5.63 | OneDrive |
Res2Net-101-26w-4s | 45.21M | 8.1 | 20.81 | 5.57 | OneDrive |
Res2NeXt-50 | 24.67M | 4.2 | 21.76 | 6.09 | OneDrive |
Res2Net-DLA-60 | 21.15M | 4.2 | 21.53 | 5.80 | OneDrive |
Res2NeXt-DLA-60 | 17.33M | 3.6 | 21.55 | 5.86 | OneDrive |
Res2Net-v1b-50 | 25.72M | 4.5 | 19.73 | 4.96 | Link |
Res2Net-v1b-101 | 45.23M | 8.3 | 18.77 | 4.64 | Link |
The download link from Baidu Disk is now available. (Baidu Disk password: vbix)
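As a quick-start illustration, the sketch below assumes the `res2net.py` module from https://github.com/gasvn/Res2Net is importable and exposes a `res2net50_26w_4s` constructor (the name mirrors the Res2Net-50-26w-4s entry above; check the repository for the exact API):

```python
import torch
# Assumption: res2net.py from https://github.com/gasvn/Res2Net is on the
# Python path and provides a res2net50_26w_4s(pretrained=...) constructor.
from res2net import res2net50_26w_4s

model = res2net50_26w_4s(pretrained=True)  # loads ImageNet-pretrained weights
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # standard ImageNet input size
print(logits.shape)  # expected: torch.Size([1, 1000])
```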
3.2 Pose estimation

We use Simple Baselines as the baseline method for pose estimation. Source code is available at https://github.com/gasvn/Res2Net-Pose-Estimation .
Results on COCO val2017, using a person detector with 56.4 human AP on COCO val2017.
Arch | Input size | AP | AP .5 | AP .75 | AP (M) | AP (L) |
---|---|---|---|---|---|---|
pose_resnet_50 | 256×192 | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 |
pose_res2net_50 | 256×192 | 0.737 | 0.925 | 0.814 | 0.708 | 0.782 |
pose_resnet_101 | 256×192 | 0.714 | 0.893 | 0.793 | 0.681 | 0.781 |
pose_res2net_101 | 256×192 | 0.744 | 0.926 | 0.826 | 0.720 | 0.785 |
3.3 Instance segmentation
We use Mask R-CNN, built on maskrcnn-benchmark, as the baseline method for instance segmentation and object detection. Source code is available at https://github.com/gasvn/Res2Net-maskrcnn .

Performance on instance segmentation:
Backbone | Setting | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|---|
ResNet-50 | 64w | 33.9 | 55.2 | 36.0 | 14.8 | 36.0 | 50.9 |
ResNet-50 | 48w×2s | 34.2 | 55.6 | 36.3 | 14.9 | 36.8 | 50.9 |
Res2Net-50 | 26w×4s | 35.6 | 57.6 | 37.6 | 15.7 | 37.9 | 53.7 |
Res2Net-50 | 18w×6s | 35.7 | 57.5 | 38.1 | 15.4 | 38.1 | 53.7 |
Res2Net-50 | 14w×8s | 35.3 | 57.0 | 37.5 | 15.6 | 37.5 | 53.4 |
ResNet-101 | 64w | 35.5 | 57.0 | 37.9 | 16.0 | 38.2 | 52.9 |
Res2Net-101 | 26w×4s | 37.1 | 59.4 | 39.4 | 16.6 | 40.0 | 55.6 |
3.4 Object detection
As in Sec. 3.3, we use Mask R-CNN built on maskrcnn-benchmark as the baseline. Source code is available at https://github.com/gasvn/Res2Net-maskrcnn .
Performance on object detection:
Backbone | Setting | AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|---|---|
ResNet-50 | 64w | 37.5 | 58.4 | 40.3 | 20.6 | 40.1 | 49.7 |
ResNet-50 | 48w×2s | 38.0 | 58.9 | 41.3 | 20.5 | 41.0 | 49.9 |
Res2Net-50 | 26w×4s | 39.6 | 60.9 | 43.1 | 22.0 | 42.3 | 52.8 |
Res2Net-50 | 18w×6s | 39.9 | 60.9 | 43.3 | 21.8 | 42.8 | 53.7 |
Res2Net-50 | 14w×8s | 39.1 | 60.2 | 42.1 | 21.7 | 41.7 | 52.8 |
ResNet-101 | 64w | 39.6 | 60.6 | 43.2 | 22.0 | 43.2 | 52.4 |
Res2Net-101 | 26w×4s | 41.8 | 62.6 | 45.6 | 23.4 | 45.5 | 55.6 |
3.5 Salient object detection

We use PoolNet (CVPR 2019) as the baseline method for salient object detection. Source code is available at https://github.com/gasvn/Res2Net-PoolNet .
Results on salient object detection datasets, without joint training with edge supervision. Models are trained on DUTS-TR.
Backbone | ECSSD | PASCAL-S | DUT-O | HKU-IS | SOD | DUTS-TE |
---|---|---|---|---|---|---|
– | MaxF & MAE | MaxF & MAE | MaxF & MAE | MaxF & MAE | MaxF & MAE | MaxF & MAE |
VGG | 0.936 & 0.047 | 0.857 & 0.078 | 0.817 & 0.058 | 0.928 & 0.035 | 0.859 & 0.115 | 0.876 & 0.043 |
ResNet-50 | 0.940 & 0.042 | 0.863 & 0.075 | 0.830 & 0.055 | 0.934 & 0.032 | 0.867 & 0.100 | 0.886 & 0.040 |
Res2Net-50 | 0.947 & 0.036 | 0.871 & 0.070 | 0.837 & 0.052 | 0.936 & 0.031 | 0.885 & 0.096 | 0.892 & 0.037 |
3.6 Semantic segmentation

3.7 Detection benchmark (mmdetection)
Comparison of backbones on the mmdetection object detection benchmark:
Backbone | Params. | GFLOPs | box AP |
---|---|---|---|
R-101-FPN | 60.52M | 283.14 | 39.4 |
X-101-64x4d-FPN | 99.25M | 440.36 | 41.3 |
HRNetV2p-W48 | 83.36M | 459.66 | 41.5 |
Res2Net-101 | 61.18M | 293.68 | 42.3 |
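For readers who want to reproduce this, switching an MMDetection v2 config to a Res2Net-101 backbone might look like the sketch below. The field names (`type='Res2Net'`, `scales`, `base_width`), the base config path, and the checkpoint identifier reflect our reading of MMDetection v2's conventions and should be verified against the mmdetection repository:

```python
# Hedged sketch of an MMDetection v2 config with a Res2Net-101 backbone.
# The base config path and the open-mmlab checkpoint name are assumptions;
# see https://github.com/open-mmlab/mmdetection for the official configs.
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'
model = dict(
    backbone=dict(
        type='Res2Net',    # Res2Net backbone registered in MMDetection v2
        depth=101,
        scales=4,          # the "s" in 26w-4s
        base_width=26,     # the "w" in 26w-4s
        pretrained='open-mmlab://res2net101_v1d_26w_4s'))
```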
3.8 Vectorized road extraction

3.9 Interactive image segmentation


3.10 Tumor segmentation on CT scans (from Sun et al. 2019)

3.11 Person Re-ID (from Cao et al.)

3.12 Single-stage object detection (from Chen et al.)

3.13 Depth prediction (from Weida Yang)

3.14 Semantic image to photo-realistic image translation

3.15 Res2NetPlus for solar panel detector
A solar panel detector for satellite imagery, developed by Less Wright, who found Res2Net-50 to offer both greater accuracy (+5%) and steadier training. See also his blog or a Chinese translation of it.
3.16 Speaker Verification
Zhou et al. (IEEE SLT 2021) found that ResNeXt and Res2Net significantly outperform the conventional ResNet model for speaker verification. The Res2Net model achieved superior performance, reducing the EER by 18.5% relative. Experiments on two further internal test sets with mismatched conditions confirmed that the ResNeXt and Res2Net architectures generalize well to noisy environments and segment-length variations.

3.17 Protein Structure Prediction
