Hi-Fi: Hierarchical Feature Integration for Skeleton Detection
Abstract:
In natural images, the scales (thickness) of object skeletons can vary dramatically among objects and object parts, which makes object skeleton detection a challenging problem. We present a new convolutional neural network (CNN) architecture built on a novel hierarchical feature integration mechanism, named Hi-Fi, to address the skeleton detection problem. The proposed CNN-based approach has a powerful multi-scale feature integration ability that intrinsically captures high-level semantics from deeper layers as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) has a strong ability to capture both rich object context and high-resolution details. Experimental results show that our method significantly outperforms state-of-the-art methods, with considerable performance improvements on several benchmarks, demonstrating its effectiveness in fusing features from very different scales.
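The general idea of integrating two feature levels can be illustrated with a generic sketch of one fusion step: the coarser, semantically richer map is upsampled to the finer map's resolution, the two are concatenated along the channel axis, and a 1x1 convolution (a per-pixel linear map) mixes them. This is a hypothetical NumPy illustration, not the Hi-Fi module itself; the function names, nearest-neighbour upsampling, integer resolution ratio, and ReLU are all assumptions, and it shows only the deep-to-shallow direction of the bidirectional guidance.

```python
import numpy as np

def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_adjacent_levels(shallow: np.ndarray, deep: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Fuse a high-resolution shallow map (C1, H, W) with a low-resolution deep map (C2, h, w).

    Hypothetical helper: `weight` has shape (C_out, C1 + C2) and acts as a 1x1 convolution.
    """
    factor = shallow.shape[1] // deep.shape[1]            # assumes H = factor * h, W = factor * w
    deep_up = upsample_nearest(deep, factor)              # deep context at the fine resolution
    stacked = np.concatenate([shallow, deep_up], axis=0)  # (C1 + C2, H, W)
    fused = np.einsum("oc,chw->ohw", weight, stacked)     # per-pixel linear mix of both levels
    return np.maximum(fused, 0.0)                         # ReLU

rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 8, 8))  # fine details, e.g. from an early layer
deep = rng.standard_normal((6, 4, 4))     # coarse semantics, e.g. from a later layer
weight = rng.standard_normal((5, 10))
print(fuse_adjacent_levels(shallow, deep, weight).shape)  # (5, 8, 8)
```

Because the 1x1 convolution sees both levels at every pixel, gradients from the fused output reach the shallow and the deep features alike.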
Architecture:
Performance Evaluation:
We evaluate the proposed method on 4 skeleton datasets using the F-measure and precision-recall (PR) curves. The datasets are SK-SMALL, SK-LARGE, SYM-PASCAL, and WH-SYMMAX.
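As a rough reference for the metrics, the sketch below sweeps thresholds over a predicted skeleton probability map, computes pixel-wise precision and recall at each threshold (the points of a PR curve), and reports the best F-measure F = 2PR / (P + R). It assumes exact pixel matching. The published skeleton benchmarks typically match predicted and ground-truth skeleton pixels within a small spatial tolerance, which these hypothetical helpers omit, so their numbers are not comparable to reported results.

```python
import numpy as np

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def pr_points(prob: np.ndarray, gt: np.ndarray, thresholds):
    """Pixel-wise (threshold, precision, recall) triples; no spatial tolerance."""
    gt = gt.astype(bool)
    points = []
    for t in thresholds:
        pred = prob >= t
        tp = np.logical_and(pred, gt).sum()
        precision = tp / pred.sum() if pred.sum() > 0 else 0.0
        recall = tp / gt.sum() if gt.sum() > 0 else 0.0
        points.append((t, float(precision), float(recall)))
    return points

def best_f_measure(prob, gt, thresholds):
    """Best F-measure over all thresholds, paired with the threshold that achieves it."""
    return max((f_measure(p, r), t) for t, p, r in pr_points(prob, gt, thresholds))

prob = np.array([[0.9, 0.2], [0.6, 0.1]])
gt = np.array([[1, 0], [1, 0]])
print(best_f_measure(prob, gt, [0.05, 0.5, 0.95]))  # (1.0, 0.5)
```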
Code, Data, and Pretrained Models:
The code will be available at https://github.com/zeakey/skeleton.
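Hello, Professor. Regarding the balancing factor β in the paper, isn't the description reversed?
Are you referring to the ICCV paper on the F-measure? Did you perhaps post this comment in the wrong place? It is best to be specific when asking a question.
Sorry, I mean this Hi-Fi paper.
In the loss computation, the coefficient β that balances the skeleton and non-skeleton losses is described in the paper as multiplying the skeleton loss by the proportion of skeleton pixels, which seems to be reversed.
The code is correct.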
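Thank you for your interest in our work. First, when asking a question, please point to the exact location (a section or equation number for the paper, a line number for the code); that makes communication much more efficient.
Now to your question: $\beta^m$ in Eq. 4 of the paper is correct. The loss coefficient of a class should be inversely proportional to the number of samples in that class. For example, when detecting skeleton points among a very large number of background points, the positive and negative samples are highly imbalanced, and the model can get stuck in a local optimum: it always predicts background, which always yields a small loss.
So we add coefficients to the loss and give the rare class a large one. This is easy to see: if the coefficient you assign to the rare class (skeleton points) equals its proportion, the model becomes even more likely to fall into that local optimum.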
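A minimal sketch of the weighting described in the reply, assuming the HED-style class-balanced binary cross-entropy: β is taken as the fraction of background pixels and multiplies the loss on the rare skeleton pixels, while 1 - β multiplies the loss on the background pixels. The function name and the single-map (one side output) setting are assumptions; this is not code from the repository.

```python
import numpy as np

def class_balanced_bce(prob: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Class-balanced binary cross-entropy for one predicted skeleton map (hypothetical helper).

    prob: predicted skeleton probabilities in [0, 1]; gt: binary ground truth.
    """
    gt = gt.astype(bool)
    beta = (~gt).sum() / gt.size              # fraction of background pixels (close to 1)
    prob = np.clip(prob, eps, 1 - eps)        # keep log() finite
    pos_loss = -np.log(prob[gt]).sum()        # loss on the rare skeleton pixels
    neg_loss = -np.log(1 - prob[~gt]).sum()   # loss on the common background pixels
    # The rare class gets the large weight beta, the common class the small weight 1 - beta.
    return float(beta * pos_loss + (1 - beta) * neg_loss)

# 100 pixels, only 2 of them skeleton; the model lazily predicts "background" everywhere.
gt = np.zeros(100)
gt[:2] = 1
prob = np.full(100, 0.01)
print(round(class_balanced_bce(prob, gt), 3))  # 9.046
```

Here the two missed skeleton pixels contribute about 9.03 of the total. If the weights were swapped (0.02 on the skeleton term, 0.98 on the background term), the same lazy prediction would cost only about 1.15.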
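Here is an even simpler example. Suppose you answer 100 true/false questions: 2 of them have the answer "no" and all the rest are "yes", and you get hit once for every wrong answer. You can simply answer "yes" with your eyes closed and receive very little punishment. The data itself is biased, and a model easily falls into that bias.
Now change the rules: wrongly answering "no" to a "yes" question gets you hit once, and wrongly answering "yes" to a "no" question gets you hit 10 times. Under these rules the model is much less likely to fall into the bias in the data.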
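The arithmetic of the analogy can be checked with a small sketch. It compares blind "yes" answering with a hypothetical careful answerer who marks 5 questions "no" (including both true "no"s) under the two rules; the function and the second answerer are illustrative assumptions.

```python
def total_penalty(answers, truth, cost_yes_as_no=1, cost_no_as_yes=1):
    """Count hits: each wrong answer costs according to which mistake it is."""
    penalty = 0
    for answer, correct in zip(answers, truth):
        if answer != correct:
            # A true "yes" answered "no" vs. a true "no" answered "yes".
            penalty += cost_yes_as_no if correct == "yes" else cost_no_as_yes
    return penalty

truth = ["no"] * 2 + ["yes"] * 98      # 2 "no" questions out of 100
blind = ["yes"] * 100                  # answer "yes" with eyes closed
careful = ["no"] * 5 + ["yes"] * 95    # flags 5 questions, catching both "no"s

for costs in [(1, 1), (1, 10)]:
    print(costs, total_penalty(blind, truth, *costs), total_penalty(careful, truth, *costs))
# (1, 1) 2 3
# (1, 10) 20 3
```

Under equal costs the blind strategy is cheaper (2 hits vs. 3); under the 1:10 rule it becomes far more expensive (20 hits vs. 3), so attending to the rare class pays off.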
I hope this answers your question.
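Thank you for the detailed explanation.
However, in "Hi-Fi: Hierarchical Feature Integration for Skeleton Detection", Section 3.3, the definition below Eq. 4 gives β as the proportion of non-background pixels (the minority). That means the minority skeleton pixels are multiplied by the smaller coefficient β, while the background pixels are multiplied by the larger coefficient (1 - β).
The code is indeed fine, but the description in the paper is reversed, if I understand correctly...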