Improving Convolutional Networks with Self-Calibrated Convolutions
Jiang-Jiang Liu1*, Qibin Hou2*, Ming-Ming Cheng1, Changhu Wang3, Jiashi Feng2
1CS, Nankai University 2NUS 3ByteDance AI Lab
1. Abstract
Recent advances in CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider improving the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present a novel self-calibrated convolution that explicitly expands the fields-of-view of each convolutional layer through internal communications and hence enriches the output features. In particular, unlike standard convolutions that fuse spatial and channel-wise information using small kernels (e.g., 3 × 3), our self-calibrated convolution adaptively builds long-range spatial and inter-channel dependencies around each spatial location through a novel self-calibration operation. Thus, it can help CNNs generate more discriminative representations by explicitly incorporating richer information. Our self-calibrated convolution design is simple and generic, and can be easily applied to augment standard convolutional layers without introducing extra parameters and complexity. Extensive experiments demonstrate that when applying our self-calibrated convolutions to different backbones, the baseline models can be significantly improved in a variety of vision tasks, including image recognition, object detection, instance segmentation, and keypoint detection, with no need to change network architectures. We hope this work could provide future research with a promising way of designing novel convolutional feature transformations for improving convolutional networks.
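For concreteness, below is a minimal PyTorch sketch of the self-calibration branch described above (the other half of the input channels goes through a plain convolution, not shown). The pooling ratio and the layer names k2/k3/k4 follow the paper's notation, but this is an illustrative re-implementation, not the repository code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCConv(nn.Module):
    """Sketch of the self-calibration branch of SC-Conv.

    Half of the input channels pass through this branch; the other
    half go through a standard convolution before concatenation.
    """
    def __init__(self, channels, pooling_r=4):
        super().__init__()
        # k2: convolution in the down-sampled (small-scale) space,
        # which enlarges the field-of-view of each spatial location.
        self.k2 = nn.Sequential(
            nn.AvgPool2d(kernel_size=pooling_r, stride=pooling_r),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # k3: convolution in the original-scale space.
        self.k3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # k4: final convolution applied to the calibrated features.
        self.k4 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        identity = x
        # Upsample the small-scale response to the input resolution and
        # gate it against the identity to form calibration weights.
        cal = torch.sigmoid(identity + F.interpolate(self.k2(x), identity.shape[2:]))
        out = self.k3(x) * cal  # calibrate the original-scale features
        return self.k4(out)
```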
2. Paper
Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. (*Equal contribution) [pdf|project|bib|code]
@inproceedings{liu2020scnet,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Jiang-Jiang Liu and Qibin Hou and Ming-Ming Cheng and Changhu Wang and Jiashi Feng},
booktitle={IEEE CVPR},
year={2020},
}
3. Applications
Update:
- 2020.5.15
- The pretrained model of SCNet50_v1d is released, with a more than 2% improvement in ImageNet top-1 accuracy (80.47 vs. 77.81) over the original SCNet-50!
- SCNet50_v1d achieves performance comparable to our original SCNet-101 on other applications such as object detection and instance segmentation.
- Due to limited GPU resources, the pretrained model of SCNet101_v1d, together with results on more applications, will be released later.
3.1 Classification
The SC-Conv module serves as a drop-in replacement for the standard convolutions in the ResNet bottleneck block, with no other modification. Source code is available at https://github.com/MCG-NKU/SCNet. A hedged usage sketch and the ImageNet results follow.
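As a quick sanity check on the classification models, the sketch below assumes that scnet.py from the repository is importable and exposes a scnet50 factory (names are taken from the repo's layout; verify against the current code).

```python
# Hypothetical usage, assuming scnet.py from https://github.com/MCG-NKU/SCNet
# is on the Python path; the factory name scnet50 follows the repo's naming.
import torch
from scnet import scnet50

model = scnet50(pretrained=True)  # loads the released ImageNet checkpoint
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # standard ImageNet-sized input
print(logits.shape)  # expected: torch.Size([1, 1000])
```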
Model | #Params | MAdds | FLOPs | Top-1 err. (%) | Top-5 err. (%) | Link 1 | Link 2 |
---|---|---|---|---|---|---|---|
SCNet-50 | 25.56M | 4.0G | 7.9G | 22.19 | 6.08 | GoogleDrive | BaiduYun pwd: 95p5 |
SCNet-50_v1d | 25.58M | 4.7G | 9.4G | 19.53 | 4.68 | GoogleDrive | BaiduYun pwd: hmmt |
SCNet-101 | 44.57M | 7.2G | 14.4G | 21.06 | 5.75 | GoogleDrive | BaiduYun pwd: 38oh |
3.2 Object detection
We use the Faster R-CNN architecture with a feature pyramid network (FPN) as the baseline and run all experiments with the widely used mmdetection framework.
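The backbone swap needs no changes to the detection head. Below is a hedged, mmdetection-2.x-style config fragment; the 'SCNet' backbone type, the _base_ file name, and the checkpoint path are illustrative assumptions, not the repo's exact config files.

```python
# Hypothetical mmdetection (2.x-style) config: keep the Faster R-CNN + FPN
# recipe and only swap the backbone for SCNet. The type name 'SCNet', the
# _base_ file, and the checkpoint path are illustrative.
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'

model = dict(
    backbone=dict(
        type='SCNet',  # assumes an SCNet backbone registered with mmdet
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint='checkpoints/scnet50.pth'),
    )
)
```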
Backbone | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
ResNet-50 | 37.6 | 59.4 | 40.4 | 21.9 | 41.2 | 48.4 |
SCNet-50 | 40.8 | 62.7 | 44.5 | 24.4 | 44.8 | 53.1 |
SCNet-50_v1d | 41.8 | 62.9 | 45.5 | 24.8 | 45.3 | 54.8 |
ResNet-101 | 39.9 | 61.2 | 43.5 | 23.5 | 43.9 | 51.7 |
SCNet-101 | 42.0 | 63.7 | 45.5 | 24.4 | 46.3 | 54.6 |
3.3 Instance segmentation
We use the Mask R-CNN architecture with a feature pyramid network (FPN) as the baseline, again running all experiments with the mmdetection framework; the same backbone swap sketched above applies.
Backbone | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
ResNet-50 | 35.0 | 56.5 | 37.4 | 18.3 | 38.2 | 48.3 |
SCNet-50 | 37.2 | 59.9 | 39.5 | 17.8 | 40.3 | 54.2 |
SCNet-50_v1d | 38.5 | 60.6 | 41.3 | 20.8 | 42.0 | 52.6 |
ResNet-101 | 36.7 | 58.6 | 39.3 | 19.3 | 40.3 | 50.9 |
SCNet-101 | 38.4 | 61.0 | 41.0 | 18.2 | 41.6 | 56.6 |
3.4 Human keypoint detection
We use Simple Baselines as the baseline method for human keypoint detection. In the test phase, person boxes come from a Faster R-CNN detector with a detection AP of 56.4 for the 'person' category on the COCO val2017 set.
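Simple Baselines is a top-down pipeline: detect persons first, then run the pose network on each crop and read keypoints off the predicted heatmaps. Below is a self-contained, illustrative sketch of that test-phase flow; estimate_keypoints and its box/coordinate handling are hypothetical simplifications (the real pipeline uses affine crops and heatmap post-processing).

```python
import torch
import torch.nn.functional as F

def estimate_keypoints(image, person_boxes, pose_net, input_size=(256, 192)):
    """Top-down keypoint inference sketch (illustrative, not the repo code).

    image:        float tensor of shape (3, H, W)
    person_boxes: iterable of integer (x1, y1, x2, y2) from a person detector
    pose_net:     network mapping (1, 3, h, w) -> (1, K, h', w') heatmaps
    """
    all_keypoints = []
    for (x1, y1, x2, y2) in person_boxes:
        # Crop the person patch and resize it to the pose network's input size.
        patch = image[:, y1:y2, x1:x2].unsqueeze(0)
        patch = F.interpolate(patch, size=input_size,
                              mode='bilinear', align_corners=False)
        heatmaps = pose_net(patch)[0]            # (K, h', w'), one map per keypoint
        K, hh, ww = heatmaps.shape
        # Take the per-heatmap argmax and map it back to image coordinates.
        idx = heatmaps.flatten(1).argmax(dim=1)  # (K,)
        ys, xs = idx // ww, idx % ww
        xs = x1 + xs.float() / ww * (x2 - x1)
        ys = y1 + ys.float() / hh * (y2 - y1)
        all_keypoints.append(torch.stack([xs, ys], dim=1))  # (K, 2)
    return all_keypoints
```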
Backbone | Input size | AP | AP50 | AP75 | APM | APL |
---|---|---|---|---|---|---|
ResNet-50 | 256×192 | 70.6 | 88.9 | 78.2 | 67.2 | 77.4 |
SCNet-50 | 256×192 | 72.1 | 89.4 | 79.8 | 69.0 | 78.7 |
ResNet-50 | 384×288 | 71.9 | 89.2 | 78.6 | 67.7 | 79.6 |
SCNet-50 | 384×288 | 74.4 | 89.7 | 81.4 | 70.7 | 81.7 |
ResNet-101 | 256×192 | 71.6 | 88.9 | 79.3 | 68.5 | 78.2 |
SCNet-101 | 256×192 | 72.6 | 89.4 | 80.4 | 69.4 | 79.4 |
ResNet-101 | 384×288 | 73.9 | 89.6 | 80.5 | 70.3 | 81.1 |
SCNet-101 | 384×288 | 74.8 | 89.6 | 81.8 | 71.2 | 81.9 |