Research

Improving Convolutional Networks with Self-calibrated Convolutions

Jiang-Jiang Liu1*, Qibin Hou2*, Ming-Ming Cheng1, Changhu Wang3, Jiashi Feng2

1CS, Nankai University      2NUS      3ByteDance AI Lab


Figure 1. Schematic illustration of the proposed self-calibrated convolutions. As can be seen, in self-calibrated convolutions, the original filters are separated into four portions, each of which is in charge of a different functionality. This makes self-calibrated convolutions quite different from traditional convolutions or grouped convolutions that are performed in a homogeneous way.

1. Abstract

Recent advances in CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider improving the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present a novel self-calibrated convolution that explicitly expands the fields-of-view of each convolutional layer through internal communications and hence enriches the output features. In particular, unlike the standard convolutions that fuse spatial and channel-wise information using small kernels (e.g., 3 × 3), our self-calibrated convolution adaptively builds long-range spatial and inter-channel dependencies around each spatial location through a novel self-calibration operation. Thus, it can help CNNs generate more discriminative representations by explicitly incorporating richer information. Our self-calibrated convolution design is simple and generic, and can be easily applied to augment standard convolutional layers without introducing extra parameters and complexity. Extensive experiments demonstrate that when applying our self-calibrated convolutions to different backbones, the baseline models can be significantly improved in a variety of vision tasks, including image recognition, object detection, instance segmentation, and keypoint detection, with no need to change network architectures. We hope this work could provide future research with a promising way of designing novel convolutional feature transformations for improving convolutional networks.
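To make the design concrete, here is a minimal PyTorch sketch of the self-calibration operation and the heterogeneous channel split from Figure 1, written from the description above. It uses the paper's default pooling ratio r = 4; exact details such as normalization placement and strides may differ from the authors' released implementation (linked in Section 3.1), and the class names here are ours.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SCConv(nn.Module):
    # Self-calibration branch: features transformed in a down-sampled
    # (small-scale) space calibrate the original-scale features.
    def __init__(self, channels, pooling_r=4):
        super().__init__()
        # k2: 3x3 conv applied after r x r average pooling (small-scale space)
        self.k2 = nn.Sequential(
            nn.AvgPool2d(kernel_size=pooling_r, stride=pooling_r),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # k3: 3x3 conv on the original-scale features
        self.k3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # k4: final 3x3 conv after calibration
        self.k4 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        identity = x
        # Up-sample the small-scale response, add the identity, and squash
        # to (0, 1) to obtain per-location calibration weights.
        calib = torch.sigmoid(identity + F.interpolate(self.k2(x), identity.shape[2:]))
        out = self.k3(x) * calib  # element-wise calibration
        return self.k4(out)


class SCSplit(nn.Module):
    # The heterogeneous split of Figure 1: half the channels pass through a
    # plain 3x3 conv (k1), the other half through SCConv; the two outputs
    # are concatenated. The 1x1 convs of the ResNet bottleneck are omitted.
    def __init__(self, channels, pooling_r=4):
        super().__init__()
        half = channels // 2
        self.k1 = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half),
        )
        self.scconv = SCConv(half, pooling_r)

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)
        return torch.cat([self.scconv(x1), self.k1(x2)], dim=1)

Note that with the channel split, the four 3 × 3 convolutions (k1 to k4) on C/2 channels together cost 4 × 9(C/2)² = 9C² parameters, exactly the cost of one 3 × 3 convolution on C channels, which is why the design adds no extra parameters.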

2. Paper

Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. (*Equal contribution) [pdf|project|bib|code]

@inproceedings{liu2020scnet,
 title={Improving Convolutional Networks with Self-Calibrated Convolutions},
 author={Jiang-Jiang Liu and Qibin Hou and Ming-Ming Cheng and Changhu Wang and Jiashi Feng},
 booktitle={IEEE CVPR},
 year={2020},
}

3. Applications

Update:

  • 2020.5.15
    • The pretrained model of SCNet-50_v1d is released, bringing more than a 2% improvement in ImageNet top-1 accuracy (80.47 vs. 77.81) over the original SCNet-50!
    • On other applications such as object detection and instance segmentation, SCNet-50_v1d achieves performance comparable to our original SCNet-101.
    • Because of limited GPU resources, the pretrained model of SCNet-101_v1d, together with results on more applications, will be released later.

3.1 Classification

The SC-Conv module can be dropped into the ResNet bottleneck block as a direct replacement for the standard convolutions, with no other modifications. Source code is available at https://github.com/MCG-NKU/SCNet .
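Once the repository is cloned, loading a released model takes only a few lines. A minimal sketch, assuming the scnet50 constructor exposed by scnet.py in the repository (check the repo README for the exact API):

import torch
from scnet import scnet50  # scnet.py from https://github.com/MCG-NKU/SCNet

# Build SCNet-50 and load the released ImageNet weights.
model = scnet50(pretrained=True)
model.eval()

# Dummy forward pass on a normalized 224x224 image batch.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.argmax(dim=1))  # predicted ImageNet class index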

Model        | #Params | MAdds | FLOPs | Top-1 err. | Top-5 err. | Link 1      | Link 2
SCNet-50     | 25.56M  | 4.0G  | 7.9G  | 22.19      | 6.08       | GoogleDrive | BaiduYun (pwd: 95p5)
SCNet-50_v1d | 25.58M  | 4.7G  | 9.4G  | 19.53      | 4.68       | GoogleDrive | BaiduYun (pwd: hmmt)
SCNet-101    | 25.70M  | 7.2G  | 14.4G | 21.06      | 5.75       | GoogleDrive | BaiduYun (pwd: 38oh)

Table 1. Performance of image classification on the ImageNet dataset (top-1/top-5 error, %).

3.2 Object detection

We use the Faster R-CNN architecture with a feature pyramid network (FPN) as the baseline and adopt the widely used mmdetection framework to run all our experiments.
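Since SC-Conv leaves the overall network topology untouched, swapping the backbone in mmdetection is a config-level change. The fragment below is a hypothetical sketch: the backbone type 'SCNet' assumes SCNet has been registered with mmdetection's backbone registry (it is not built in), and the checkpoint path is a placeholder.

# Hypothetical mmdetection config fragment; the RPN and RoI heads follow
# the standard faster_rcnn_r50_fpn baseline config and are omitted here.
model = dict(
    type='FasterRCNN',
    pretrained='path/to/scnet50_imagenet.pth',  # ImageNet weights from Table 1
    backbone=dict(
        type='SCNet',  # assumes a custom-registered mmdetection backbone
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5))

The same backbone swap applies to the Mask R-CNN experiments in Section 3.3.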

Backbone     | AP   | AP50 | AP75 | APS  | APM  | APL
ResNet-50    | 37.6 | 59.4 | 40.4 | 21.9 | 41.2 | 48.4
SCNet-50     | 40.8 | 62.7 | 44.5 | 24.4 | 44.8 | 53.1
SCNet-50_v1d | 41.8 | 62.9 | 45.5 | 24.8 | 45.3 | 54.8
ResNet-101   | 39.9 | 61.2 | 43.5 | 23.5 | 43.9 | 51.7
SCNet-101    | 42.0 | 63.7 | 45.5 | 24.4 | 46.3 | 54.6

Table 2. Performance of object detection on the COCO dataset.

3.3 Instance segmentation

We use the Mask R-CNN architecture with a feature pyramid network (FPN) as the baseline and adopt the widely used mmdetection framework to run all our experiments.

Backbone     | AP   | AP50 | AP75 | APS  | APM  | APL
ResNet-50    | 35.0 | 56.5 | 37.4 | 18.3 | 38.2 | 48.3
SCNet-50     | 37.2 | 59.9 | 39.5 | 17.8 | 40.3 | 54.2
SCNet-50_v1d | 38.5 | 60.6 | 41.3 | 20.8 | 42.0 | 52.6
ResNet-101   | 36.7 | 58.6 | 39.3 | 19.3 | 40.3 | 50.9
SCNet-101    | 38.4 | 61.0 | 41.0 | 18.2 | 41.6 | 56.6

Table 3. Performance of instance segmentation on the COCO dataset.

3.4 Human keypoint detection

We use Simple Baselines as the baseline method for human keypoint detection. In the test phase, person boxes are produced by a Faster R-CNN detector that attains a detection AP of 56.4 for the 'person' category on the COCO val2017 set.
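For reference, the test-phase pipeline is the usual top-down one: run the person detector, crop each box to the input scale, run the pose network, and read joints off the predicted heatmaps. Below is a minimal, hypothetical decoding sketch; the function name and box format are ours, and it ignores the aspect-ratio-preserving affine crop used by Simple Baselines in practice.

import torch

def decode_heatmaps(heatmaps, boxes):
    # heatmaps: (N, K, H, W) pose-network output, one person per box
    # boxes:    (N, 4) detector boxes (x0, y0, x1, y1) in image coordinates
    n, k, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, k, -1)
    idx = flat.argmax(dim=-1)  # heatmap peak per joint
    ys = torch.div(idx, w, rounding_mode='floor').float() / h  # normalized row
    xs = (idx % w).float() / w                                 # normalized col
    x0, y0, x1, y1 = boxes.unbind(dim=-1)
    # Map normalized heatmap coordinates back into each person box.
    px = x0[:, None] + xs * (x1 - x0)[:, None]
    py = y0[:, None] + ys * (y1 - y0)[:, None]
    return torch.stack([px, py], dim=-1)  # (N, K, 2) keypoint locations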

Backbone   | Input size | AP   | AP50 | AP75 | AP (M) | AP (L)
ResNet-50  | 256×192    | 70.6 | 88.9 | 78.2 | 67.2   | 77.4
SCNet-50   | 256×192    | 72.1 | 89.4 | 79.8 | 69.0   | 78.7
ResNet-50  | 384×288    | 71.9 | 89.2 | 78.6 | 67.7   | 79.6
SCNet-50   | 384×288    | 74.4 | 89.7 | 81.4 | 70.7   | 81.7
ResNet-101 | 256×192    | 71.6 | 88.9 | 79.3 | 68.5   | 78.2
SCNet-101  | 256×192    | 72.6 | 89.4 | 80.4 | 69.4   | 79.4
ResNet-101 | 384×288    | 73.9 | 89.6 | 80.5 | 70.3   | 81.1
SCNet-101  | 384×288    | 74.8 | 89.6 | 81.8 | 71.2   | 81.9

Table 4. Performance of human keypoint detection on the COCO dataset.