Depth-aware Neural Style Transfer


Neural style transfer has recently received significant attention and demonstrated impressive results. An efficient solution proposed by Johnson et al. trains feed-forward convolutional neural networks by defining and optimizing perceptual loss functions. Such methods are typically based on high-level features extracted from pre-trained neural networks, where the loss function contains two components: a style loss and a content loss. However, these pre-trained networks were originally designed for object recognition, so their high-level features tend to focus on the primary object and neglect other details. As a result, when an input image contains multiple objects, potentially at different depths, the stylized output is often unsatisfactory: the image layout is destroyed, and the boundaries between foreground and background, as well as between different objects, become obscured. We observe that the depth map effectively reflects the spatial distribution of an image, and that preserving the depth map of the content image during stylization helps produce an output that retains its semantic content. In this paper, we introduce a novel approach for neural style transfer that integrates depth preservation as an additional loss, preserving the overall image layout while performing style transfer.
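As a concrete illustration of the added term, the depth preservation loss can be sketched as a mean-squared error between the depth maps of the content image and the stylized output. This is a minimal sketch, not the paper's exact formulation: in the actual system the depth maps would come from a depth estimation network, whereas here they are plain arrays so the loss itself can be examined in isolation.

```python
import numpy as np

def depth_loss(depth_content, depth_stylized):
    """Mean-squared error between two depth maps (H x W arrays).

    In the paper's setting, both maps would be produced by a depth
    estimation network applied to the content image and the stylized
    output, respectively; here they are ordinary arrays.
    """
    depth_content = np.asarray(depth_content, dtype=np.float64)
    depth_stylized = np.asarray(depth_stylized, dtype=np.float64)
    return float(np.mean((depth_content - depth_stylized) ** 2))

# Identical depth maps incur no penalty; diverging maps are penalized,
# which is what pushes the transfer toward preserving the scene layout.
d = np.linspace(0.0, 1.0, 16).reshape(4, 4)
assert depth_loss(d, d) == 0.0
assert depth_loss(d, d + 0.5) > 0.0
```

Intuitively, an output whose estimated depth matches that of the content image keeps foreground and background separated, which is exactly the failure mode of style- and content-only losses described above.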


  • Depth-aware Neural Style Transfer,  XC Liu, MM Cheng, YK Lai, PL Rosin, NPAR, 2017. [pdf] [official version] [bib]


Image style transfer results. (a) style image, (b) content image, (c) result from Stanford Vision Lab [24], and (d) our depth-aware style transfer result. When stylizing an image with rich relative-depth and spatial-distance information, our result better preserves the original layout and relative depth relationships than [24].

System Overview

System Overview. We train an image transformation network to transform the input images. We use a loss network pre-trained for object recognition to define the style and content losses, and an additional depth estimation network to define the depth loss. In the training stage, for a specific style, we obtain the corresponding style transfer model by optimizing the total loss.
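The total objective described above can be sketched as a weighted sum of the three terms. The Gram-matrix style term follows the common perceptual-loss formulation of Johnson et al.; the weights `alpha`, `beta`, `gamma` and the use of single feature maps are illustrative assumptions, since the paper combines activations from multiple layers of the loss network.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map, the usual style statistic."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def total_loss(content_feat, output_feat, style_feat,
               depth_content, depth_output,
               alpha=1.0, beta=5.0, gamma=1.0):
    """Weighted sum of content, style, and depth losses.

    All inputs are plain numpy arrays standing in for network
    activations and depth maps; alpha/beta/gamma are illustrative
    hyperparameters, not the values used in the paper.
    """
    content = np.mean((content_feat - output_feat) ** 2)
    style = np.mean((gram_matrix(style_feat) - gram_matrix(output_feat)) ** 2)
    depth = np.mean((depth_content - depth_output) ** 2)
    return float(alpha * content + beta * style + gamma * depth)
```

During training, this scalar would be backpropagated through the image transformation network only; the loss network and the depth estimation network stay frozen, serving purely as fixed feature and depth extractors.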