A Roundup of Deep Learning Papers

Papers, Deep Learning 2019-01-18

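"Read ten thousand books, travel ten thousand miles." New ideas keep emerging in deep learning all the time. To get really good at it, I think both theory and coding skills need constant practice. We should not only go deep in one direction but also build a broad picture of CV as a whole. So reading papers is a must: from the ImageNet competition to object detection and image segmentation, there are many excellent papers. This post collects some of them and doubles as a record of my own learning. Keeping up with the latest ideas in state-of-the-art papers is the only way to keep pace with the field.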

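The deep learning explosion: the ImageNet challenge
Object detection
Deep learning tricks and improvements to CNN architectures
Face detection
Image segmentation
GANs (Generative Adversarial Networks)
Reinforcement learning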

The deep learning explosion: the ImageNet challenge

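The ImageNet challenge is about the most basic deep learning task: classification. From the very early LeNet, to GoogLeNet, to the more recent ShuffleNet, a whole series of excellent convolutional neural network architectures has appeared. These architectures are also widely used as backbones for extracting image features in more complex tasks such as object detection. The state-of-the-art CNN architectures are therefore the first thing we should study.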

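(LeNet) Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998. [PDF] The founding work of CNNs and a classic paper on handwritten character recognition.
(AlexNet) Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. 2012. [PDF] ILSVRC-2012 winner and a turning point in CNN history; the first time deep learning beat traditional machine learning methods such as SVMs on an image recognition task.
(VGG) Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014). [PDF] Stacks a large number of repeated convolutional layers; a major influence on later networks.
(GoogLeNet) Szegedy, Christian, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [PDF] Proposes the Inception module, which brings parallel branches into CNNs. Later networks such as ResNet also use multi-branch structures, so a CNN is no longer a single straight path from input to output.
(InceptionV2, InceptionV3) Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the Inception Architecture for Computer Vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826. [PDF] Improves the Inception module of the original GoogLeNet, building on techniques such as BN (Batch Normalization).
(ResNet) He, Kaiming, et al. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015). [PDF] Proposes the residual structure, which eases the training problems of very deep networks such as vanishing gradients and accuracy degradation. The ImageNet models in the paper go up to 152 layers.
(Xception) Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions[J]. arXiv preprint arXiv:1610.02357, 2016. [PDF]
(DenseNet) Huang G, Liu Z, Weinberger K Q, et al. Densely Connected Convolutional Networks[J]. 2016. [PDF] Takes the shortcut idea to its extreme.
(SENet) Squeeze-and-Excitation Networks. [PDF] Focuses on fusing information across channels (channel-wise) while adding only a tiny amount of computation.
(ShuffleNet) Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[J]. [PDF] Replaces the dense 1x1 convolutions with grouped 1x1 convolutions and adds a channel shuffle operation so that information still mixes across channel groups. This greatly reduces the computation, and the network mainly targets mobile devices with limited computing power.
(capsules) Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules[C]. [PDF]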
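
The channel shuffle mentioned in the ShuffleNet entry is easier to see in code than in words. Below is a minimal sketch in plain Python, assuming the channels are just a flat list of labels; the function name `channel_shuffle` and the toy input are my own illustration, not code from the paper. The idea is to view the channels as a groups x channels-per-group grid, transpose it, and flatten it again, so that each group in the next grouped convolution sees channels from every previous group.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape -> transpose -> flatten)."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Treat the flat list as a (groups x per_group) grid and read it column by column.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in three groups: [0, 1], [2, 3], [4, 5]
shuffled = channel_shuffle(list(range(6)), groups=3)
print(shuffled)  # prints [0, 2, 4, 1, 3, 5]
```

After the shuffle, splitting the result into three consecutive groups again gives [0, 2], [4, 1], [3, 5], so every new group mixes channels from different original groups.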

Object detection

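Another important deep learning task is object detection. Before 1990, typical object detection methods were based on geometric representations; after that, the field moved toward statistical classification (neural networks, SVM, AdaBoost, etc.).
In 2012, deep convolutional neural networks (DCNN) made a breakthrough in image classification, and that success soon carried over to object detection. Girshick proposed the milestone detection model Region-based CNN (R-CNN). Since then the field has developed rapidly, with many deep learning based methods such as YOLO and SSD.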

(R-CNN) Girshick, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. [PDF] A milestone detection framework and the origin of the R-CNN family. Later deep learning detectors all borrow from its ideas; a must-read paper.
(SPPNet) He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2014: 346-361. [PDF] Mainly fixes the slow, repeated feature extraction in R-CNN.
(Fast R-CNN) Girshick R. Fast r-cnn[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448. [PDF] The second version in the R-CNN family. Proposes RoI Pooling and improves on both R-CNN and SPPNet in speed as well as accuracy.
(Faster R-CNN) Ren S, He K, Girshick R, et al. Faster r-cnn: Towards real-time object detection with region proposal networks[C]//Advances in neural information processing systems. 2015: 91-99. [PDF] The peak of the R-CNN family. Proposes anchors and the RPN, both widely adopted by later networks.
(YOLO) Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788. [PDF] One of the representative one-stage detection frameworks; very fast, but less accurate than the R-CNN family.
(SSD) Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 21-37. [PDF] Another representative one-stage detection framework; raises speed without giving up accuracy.
(R-FCN) Dai J, Li Y, He K, Sun J. R-fcn: Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems. 2016: 379-387. [PDF]
(DSSD) Fu, C., Liu, W., Ranga, A., Tyagi, A., & Berg, A.C. (2017). DSSD : Deconvolutional Single Shot Detector. CoRR, abs/1701.06659. [PDF] Similar in spirit to FPN: uses deconvolution for feature fusion, improving SSD's accuracy on small and overlapping objects.
(FPN) T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S. Belongie, "Feature Pyramid Networks for Object Detection," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017. [PDF] Proposes a feature pyramid that merges the semantic information extracted by the deep layers of a CNN into the feature map of every level (in particular, the high-resolution low-level feature maps also receive high-level semantics). This strengthens the multi-scale representation and improves accuracy on small objects; it is useful in both image segmentation and object detection.
(RetinaNet) Lin, T., Goyal, P., Girshick, R.B., He, K., & Dollár, P. (2017). Focal Loss for Dense Object Detection. 2017 IEEE International Conference on Computer Vision (ICCV), 2999-3007. [PDF] Proposes Focal Loss to deal with the class imbalance caused by the overwhelming number of negative samples in object detection.
(TDM) Shrivastava, Abhinav, Rahul Sukthankar, Jitendra Malik and Abhinav Gupta. "Beyond Skip Connections: Top-Down Modulation for Object Detection." CoRR abs/1612.06851 (2016): n. pag. [PDF] Similar in spirit to FPN, except that the top-down modules are added one layer at a time.
(YOLO-v2) Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517-6525. [PDF] An improved version of YOLO.
(SIN) Liu, Y., Wang, R., Shan, S., & Chen, X. (2018). Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships. CVPR. [PDF] Brings an RNN into a CV task; a very novel network structure.
(STDN) Peng Zhou, Bingbing Ni, Cong Geng, Jianguo Hu, Yi Xu. Scale-Transferrable Object Detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 528-537. [PDF] Mainly addresses the slowdown in structures such as DSSD and FPN, where feature fusion introduces extra parameters. Uses DenseNet as the backbone so that features are fused during the forward pass, and proposes a parameter-free scale-transfer module, keeping accuracy while improving speed.
(RefineDet) Zhang, Shifeng, Longyin Wen, Xiao Bian, Zhen Lei and Stan Z. Li. "Single-Shot Refinement Neural Network for Object Detection." CVPR (2018). [PDF] Combines the advantages of one-stage and two-stage detectors.
(MegDet) Peng, Chao, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu and Jian Sun. "MegDet: A Large Mini-Batch Object Detector." CVPR (2018). [PDF] Focuses on the overly small batch sizes used when training object detectors. Proposes Cross-GPU Batch Normalization, reaches a batch size of 256 during training, and cuts training time on the COCO 2017 dataset to 4 hours.
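
To make the Focal Loss idea from the RetinaNet entry concrete, here is a minimal sketch for a single binary prediction, assuming the form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with gamma = 2 and alpha = 0.25 as the values the paper reports working best. The function name `focal_loss` and the `eps` clamp are my own additions for illustration, not code from the paper.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (object) or 0 (background)
    """
    p_t = p if y == 1 else 1.0 - p               # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)                # clamp to avoid log(0); illustrative safeguard
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = focal_loss(0.01, 0)   # background predicted confidently as background
hard_negative = focal_loss(0.90, 0)   # background mistaken for an object
print(easy_negative < hard_negative)  # prints True
```

With gamma = 0 the modulating factor disappears and the loss falls back to alpha-weighted cross-entropy, which makes a handy sanity check.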

Deep learning tricks and improvements to CNN architectures

(BatchNorm) Ioffe, Sergey and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." ICML (2015). [PDF] A powerful weapon during training: it speeds up convergence and keeps the training process numerically more stable.
(Bag of Tricks) Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li: Bag of Tricks for Image Classification with Convolutional Neural Networks. CoRR abs/1812.01187 (2018). [PDF] A systematic introduction to many tricks for training CNNs.
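
As a reminder of what BatchNorm actually computes, here is a minimal sketch of the training-time forward pass for a single feature, assuming a mini-batch given as a plain list of floats. The function name `batch_norm` and the default `eps` are my own choices for illustration; real frameworks also track running statistics for inference, which this sketch leaves out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased variance over the batch
    # gamma and beta are the learnable scale and shift parameters.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

The output has roughly zero mean and unit variance whatever the scale of the input, which is a large part of why training becomes more stable.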

Face detection

(SSH) Najibi, Mahyar et al. "SSH: Single Stage Headless Face Detector." 2017 IEEE International Conference on Computer Vision (ICCV) (2017): 4885-4894. [PDF]
($S^3$FD) Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., & Li, S.Z. (2017). S^3FD: Single Shot Scale-Invariant Face Detector. 2017 IEEE International Conference on Computer Vision (ICCV), 192-201. [PDF] Uses an anchor matching strategy to raise the recall of small faces and a max-out step to reduce their false positives.

Image segmentation

GANs (Generative Adversarial Networks)

Reinforcement learning

Reference blog: https://blog.csdn.net/qq_21190081/article/details/69564634