Pixel level image understanding
VALSE 2018 Workshop, Friday, April 20, morning (8:00 – 12:00), Conference Room 8, Xinghai Convention and Exhibition Center
Workshop organizers
Talk schedule
- Ming-Ming Cheng (程明明): Learning Pixel Accurate Image Semantics from Web
- Si Liu (刘偲): Image Understanding and Editing
- Yunchao Wei (魏云超): Towards Weakly- and Semi-Supervised Object Localization and Semantic Segmentation
- Xinggang Wang (王兴刚): Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing
- Chao Dong (董超): Semantic image super-resolution and Reinforcement learning based image restoration
Related papers
- Deeply supervised salient object detection with short connections, Q Hou, MM Cheng, X Hu, A Borji, Z Tu, P Torr, IEEE CVPR, 2017. (demonstrated at the Huawei Mate 10 and Honor V10 product launch events)
- Surveillance Video Parsing with Single Frame Supervision, S Liu, C Wang, R Qian, H Yu, R Bao, Y Sun, IEEE CVPR, 2017. (the first real-time algorithm for human parsing in surveillance video)
- Richer Convolutional Features for Edge Detection, Y Liu, MM Cheng, X Hu, K Wang, X Bai, IEEE CVPR, 2017. (the first real-time algorithm to outperform human annotators on BSDS500, the most popular edge detection dataset)
- Global Contrast based Salient Region Detection. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip H. S. Torr, Shi-Min Hu. IEEE TPAMI, 2015. (2000+ non-self citations)
- Image Super-Resolution Using Deep Convolutional Networks, C Dong, C Loy, K He, X Tang, IEEE TPAMI, 2016. (600+ non-self citations)
- Pyramid Scene Parsing Network, H Zhao, J Shi, X Qi, X Wang, J Jia. IEEE CVPR, 2017. (winner of the ImageNet scene understanding challenge; 130+ non-self citations)
- STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation. TPAMI 2017
- Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach. CVPR 2017
- Transferable Semi-supervised Semantic Segmentation. AAAI 2018
- Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation. CVPR 2018
- Adversarial Complementary Learning for Weakly Supervised Object Localization. CVPR 2018
- Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing. CVPR 2018.
Talk abstracts and speaker bios
Learning Pixel Accurate Image Semantics from Web
Abstract: Understanding pixel-level image semantics is the foundation of many important computer vision and computer graphics applications. Although related research has developed rapidly in recent years, existing state-of-the-art solutions depend heavily on large amounts of pixel-accurate image annotation. In contrast, humans can autonomously learn to perform high-precision semantic recognition and object extraction without difficulty through online search. Inspired by this phenomenon, we start with category-independent semantic feature extraction techniques such as salient object detection, image segmentation, and edge extraction. Next, we introduce how to use these category-independent image semantic features to reduce the reliance on accurate annotation in the semantic learning process, and then implement an image semantic understanding technique that does not require any explicit manual annotation.
Speaker: Ming-Ming Cheng is a professor at Nankai University. He received his Ph.D. degree from Tsinghua University in 2012, then spent two years as a research fellow working with Prof. Philip Torr in Oxford. Dr. Cheng’s research primarily focuses on algorithmic issues in image understanding and processing, including salient object detection, semantic segmentation, low-level vision techniques, image manipulation, etc. He has published over 30 papers in leading journals and conferences, such as IEEE TPAMI, ACM TOG, ACM SIGGRAPH, IEEE CVPR, and IEEE ICCV. He has designed a series of popular methods and novel systems, which have received 8000+ citations (including 2000+ citations to his first-author paper on salient object detection). His work has been reported by several well-known international media outlets, such as BBC, UK Telegraph, Der Spiegel, and Huffington Post.
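Image Understanding and Editing
Abstract: In recent years, deep-learning-based image and video analysis has achieved great success. Compared with traditional object classification and recognition, pixel-level semantic understanding of images, also known as semantic segmentation, provides richer pixel-level information and has therefore become a new research hotspot. Taking three typical instances of semantic segmentation (scene parsing, face parsing, and human parsing) as the entry point, this talk focuses on our work addressing two challenges in semantic segmentation. 1: Reducing manual annotation effort: in many practical scenarios, images are large and label categories are numerous, so purely manual pixel-by-pixel annotation is very expensive and inefficient. We propose a series of unsupervised, semi-supervised, and weakly supervised semantic segmentation algorithms that greatly reduce the manual annotation cost without lowering accuracy. 2: Improving segmentation accuracy: by jointly considering contextual information, such as the co-occurrence and mutual exclusion of semantic labels and the complementarity of different information sources, we greatly improve segmentation accuracy. Finally, we will also show application results of semantic segmentation in areas such as smart cameras, video surveillance, smart homes, and e-commerce platform search.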
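Speaker: Si Liu (刘偲) is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences. Liu received a bachelor's degree from the university-level experimental class at Beijing Institute of Technology and a Ph.D. from the Institute of Automation, Chinese Academy of Sciences, and was previously a research assistant and postdoctoral fellow at the National University of Singapore. Liu's research areas include computer vision and multimedia analysis; taking the analysis of people in images and videos as an entry point, Liu has carried out related research and built a fairly complete body of work. Liu was selected for the 2017-2019 Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, is a researcher in the Microsoft Research Asia StarTrack program (铸星计划), and is a recipient of the CCF-Tencent Rhino-Bird research fund.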
Towards Weakly- and Semi-Supervised Object Localization and Semantic Segmentation
Abstract: Over the past few years, the great success of CNNs in object detection and image semantic segmentation has relied on a large number of human annotations. However, collecting annotations such as bounding boxes and segmentation masks is very costly. To reduce the financial cost and human effort required, in this talk Dr. Yunchao Wei will introduce his recent work, which uses weak information as supervision to address more challenging object localization and semantic segmentation tasks. In particular, he proposes several novel solutions that produce dense object localization maps using only image-level labels as supervision. The dense object localization maps successfully build the relationship between image-level labels and pixels, and effectively boost the accuracy of localization and segmentation tasks. His work has been published in top-tier journals and conferences (e.g. T-PAMI and CVPR) and achieves state-of-the-art performance.
Speaker: Yunchao Wei is currently a Postdoctoral Researcher in the Beckman Institute at the University of Illinois at Urbana-Champaign, working with Prof. Thomas Huang. He received his Ph.D. degree from Beijing Jiaotong University in 2016, advised by Prof. Yao Zhao. He received the Excellent Doctoral Dissertation Awards of the Chinese Institute of Electronics (CIE) and Beijing Jiaotong University in 2016, the winner prize in the object detection task (1a) of ILSVRC 2014, and runner-up prizes in all the video object detection tasks of ILSVRC 2017. His current research interest focuses on computer vision techniques for large-scale data analysis. Specifically, he has worked on weakly- and semi-supervised object recognition, multi-label image classification, video object detection, and multi-modal analysis.
Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing
Abstract: Recent state-of-the-art methods for weakly-supervised semantic segmentation first infer sparse, discriminative regions for each object class using deep classification networks, then train semantic segmentation networks using those discriminative regions as supervision. Inspired by the traditional image segmentation method of seeded region growing, we propose to train semantic segmentation networks starting from the discriminative regions and to progressively increase the pixel-level supervision using the idea of seeded region growing. The seeded region growing module is integrated into a deep segmentation network and benefits from deep features. Unlike conventional deep networks, which have fixed/static supervision, the proposed weakly-supervised network generates new labels for its input data using the contextual information within an image. The proposed method significantly outperforms weakly-supervised semantic segmentation methods that use static supervision and obtains state-of-the-art performance on both PASCAL VOC 2012 and COCO: 63.2% mIoU on the PASCAL VOC 2012 test set and 26.0% mIoU on the COCO dataset.
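The classical seeded region growing idea that the abstract builds on can be sketched in a few lines. This is a simplified illustration, not the network described in the talk: the function name `seeded_region_growing`, the single `threshold` parameter, the 8-neighborhood, and the `ignore` value of -1 are all assumptions made for the example. Starting from sparse seed labels, an unlabeled neighbor joins a region when its predicted probability for that region's class is high enough.

```python
from collections import deque

import numpy as np


def seeded_region_growing(probs, seeds, threshold=0.85, ignore=-1):
    """Grow sparse seed labels into neighboring pixels (illustrative sketch).

    probs: (C, H, W) float array of per-class probabilities (assumed input).
    seeds: (H, W) int array holding a class index at seed pixels and
           `ignore` everywhere else.
    An unlabeled 8-neighbor of a pixel with class c receives label c when
    probs[c] at that neighbor is at least `threshold`.
    """
    _, height, width = probs.shape
    labels = seeds.copy()
    # Start the breadth-first traversal from every seed pixel.
    queue = deque(
        (int(y), int(x)) for y, x in zip(*np.nonzero(seeds != ignore))
    )
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    while queue:
        y, x = queue.popleft()
        c = labels[y, x]
        for dy, dx in neighbors:
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and labels[ny, nx] == ignore \
                    and probs[c, ny, nx] >= threshold:
                labels[ny, nx] = c
                queue.append((ny, nx))
    return labels
```

Pixels whose probabilities never reach the threshold keep the `ignore` value, so they would contribute no supervision; in a training loop the grown labels would be recomputed as the predicted probabilities improve.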
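Speaker: Xinggang Wang (王兴刚) is a lecturer at the School of Electronic Information and Communications, Huazhong University of Science and Technology. His main research directions are computer vision and machine learning, especially object detection and deep learning. He received his bachelor's and Ph.D. degrees from Huazhong University of Science and Technology in 2009 and 2014, respectively. He has published more than 50 papers, including at top international conferences (ICML, NIPS, CVPR, ICCV, ECCV) and in journals (IEEE TIP, Information Sciences, Pattern Recognition, Neural Computation, etc.). His Google Scholar citation count exceeds 1000. He received the "Microsoft Fellow" (微软学者) award in 2012 and was selected for the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology in 2015.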
Reinforcement learning based image restoration and Semantic image super-resolution
Abstract: This talk introduces two of our recent works (published at CVPR 2018) on low-level vision problems. In the first paper, we investigate a novel approach to image restoration by reinforcement learning. Unlike existing studies that mostly train a single large network for a specialized task, we prepare a toolbox consisting of small-scale convolutional networks of different complexities, each specialized in a different task. Our method, RL-Restore, then learns a policy to select appropriate tools from the toolbox to progressively restore the quality of a corrupted image. Compared with conventional human-designed networks, RL-Restore can restore images corrupted with complex and unknown distortions in a more parameter-efficient manner using the dynamically formed toolchain. In the second paper, we address the problem of semantic super-resolution and show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate the features of a few intermediate layers in a single network, conditioned on semantic segmentation probability maps. This is made possible by a novel Spatial Feature Modulation (SFM) layer that generates affine transformation parameters for spatial-wise feature modulation. Our final results show that an SR network equipped with SFM can generate more realistic and visually pleasing textures than state-of-the-art methods.
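The spatial-wise affine modulation mentioned in the abstract can be illustrated with a small NumPy sketch. It is a minimal example under stated assumptions, not the layer from the paper: the helper names `conv1x1` and `spatial_feature_modulation`, the use of a single 1x1 convolution per parameter map, and the `params` dictionary layout are all hypothetical. The idea shown is that a scale map `gamma` and a shift map `beta` are predicted from the segmentation probability maps and applied to the features at each pixel.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight (C_out, C_in), bias (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def spatial_feature_modulation(features, seg_probs, params):
    """Modulate features with per-pixel affine parameters (illustrative sketch).

    features:  (C, H, W) intermediate feature maps of the SR network.
    seg_probs: (K, H, W) semantic segmentation probability maps.
    params:    dict with hypothetical 1x1-conv weights "w_gamma", "w_beta"
               of shape (C, K) and biases "b_gamma", "b_beta" of shape (C,).
    """
    # Scale and shift maps have the same shape as `features`, so every
    # pixel gets its own affine transformation.
    gamma = conv1x1(seg_probs, params["w_gamma"], params["b_gamma"])
    beta = conv1x1(seg_probs, params["w_beta"], params["b_beta"])
    return gamma * features + beta
```

Because `gamma` and `beta` vary across spatial positions, regions with different semantic classes are transformed differently within one network, which is what lets the conditioning steer texture generation per region.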
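Speaker: Chao Dong (董超) is currently a senior research manager at SenseTime Research and head of its image and video quality team, working on the research and productization of algorithms for image and video super-resolution, denoising, and enhancement. He received his bachelor's degree from Beijing Institute of Technology and his Ph.D. from the Chinese University of Hong Kong, advised by Prof. Xiaoou Tang and Prof. Chen Change Loy. In 2014 and 2015 he was the first to apply deep learning to image super-resolution and to image compression artifact reduction, respectively, and in 2018 he was the first to apply reinforcement learning to image restoration. His papers have received more than 1500 non-self citations, and the journal paper SRCNN was selected as the TPAMI "Most Popular Article" from March to August 2016. In 2016 he was nominated for the outstanding doctoral thesis award of the Chinese University of Hong Kong. In 2017 he led a team that took 2nd place in NTIRE 2017, the CVPR image super-resolution challenge. In the same year he formed the SenseTime image quality team, dedicated to deploying deep-learning-based image processing algorithms in real products. During 2017-2018, the team provided single-frame image super-resolution and portrait beautification algorithms to well-known Chinese phone manufacturers, and developed a multi-frame-input solution for overall 4K image quality in real photo-taking scenarios.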