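“Read ten thousand books, travel ten thousand miles.” New ideas appear in deep learning all the time. To become an expert in this area, I think you have to keep working on both theory and coding. It is not enough to understand one direction in depth; we also need a broad view of the computer vision (CV) field as a whole. So reading papers is essential. From the ImageNet competition to object detection and image segmentation, there are a great many excellent papers. This blog post collects some outstanding deep learning papers and is also a record of my own learning. By continually studying the latest ideas in state-of-the-art papers, I hope to keep pace with the field~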

The Deep Learning Boom: The ImageNet Challenge

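  The ImageNet challenge addresses the most fundamental deep learning task: classification. From the early LeNet to GoogLeNet and, more recently, ShuffleNet, a large number of excellent convolutional neural network architectures have emerged. These architectures are also widely used as backbones in more complex tasks such as object detection, where they extract image features. State-of-the-art CNN architectures are therefore the first thing to study.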

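  • (LeNet) Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.[PDF] A foundational CNN paper and a classic work on handwritten digit recognition
  • (AlexNet) Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012 [PDF] Winner of ILSVRC-2012 and a turning point in CNN history: deep learning decisively outperformed traditional machine learning pipelines (e.g., hand-crafted features with SVMs) on large-scale image recognition
  • (VGG) Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).[PDF] Built from many repeated stacks of small 3×3 convolutional layers; this simple, deep design strongly influenced later networks
  • (GoogLeNet) Szegedy, Christian, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. [PDF] Introduced the Inception module, which runs several convolution and pooling branches in parallel and concatenates their outputs. It popularized parallel (multi-branch) structures in CNNs, an idea later networks such as ResNet also build on; a CNN no longer had to be a single straight chain of layers
  • (Inception-v2, Inception-v3) Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the Inception Architecture for Computer Vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.[PDF] Building on newer techniques such as BN (Batch Normalization), it refines the Inception module of the original GoogLeNet, for example by factorizing large convolutions into smaller ones
  • (ResNet) He, Kaiming, et al. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015).[PDF] Introduced residual connections (shortcuts that add a block's input to its output), which make very deep networks much easier to optimize and address the degradation problem, where accuracy drops as plain networks get deeper. The paper trained ImageNet models with up to 152 layers.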
  • (Xception) Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions[J]. arXiv preprint arXiv:1610.02357, 2016.[PDF]
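  • (DenseNet) Huang G, Liu Z, Weinberger K Q, et al. Densely Connected Convolutional Networks[J]. 2016. [PDF] Takes the shortcut idea to the extreme: within a dense block, each layer receives the concatenated feature maps of all preceding layers
  • (SENet) Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. [PDF] Focuses on fusing information across channels (channel-wise) by learning a weight for each channel, at only a small extra computational cost
  • (ShuffleNet) Zhang X, Zhou X, Lin M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. [PDF] Replaces dense 1×1 convolutions with cheaper pointwise group convolutions and adds a channel shuffle operation so that information still flows between channel groups. This greatly reduces computation and parameters; it mainly targets mobile devices with limited compute.
  • (Capsules) Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules[C]//Advances in Neural Information Processing Systems. 2017.[PDF]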
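The ShuffleNet channel shuffle is easy to misread, so here is a minimal pure-Python sketch of it on a list of channel indices. It assumes the channel count divides evenly by the number of groups; the helper names `channel_shuffle` and `split_into_groups` are our own, and a real implementation applies the same reshape, transpose, flatten steps to a feature tensor rather than to a list.

```python
def channel_shuffle(channels, groups):
    """Reorder channels as in ShuffleNet's channel shuffle (illustrative sketch).

    Conceptually: view the channels as a (groups, channels_per_group) grid,
    transpose it, and flatten it again.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by a positive number of groups")
    per_group = len(channels) // groups
    # channels[g * per_group + i] is the i-th channel of group g; reading the grid
    # column by column takes one channel from each group in turn.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def split_into_groups(channels, groups):
    """Split a channel list into equal consecutive groups, as a group convolution would."""
    per_group = len(channels) // groups
    return [channels[k * per_group:(k + 1) * per_group] for k in range(groups)]


# 9 channels in 3 groups: [0, 1, 2], [3, 4, 5], [6, 7, 8]
shuffled = channel_shuffle(list(range(9)), groups=3)
print(shuffled)                        # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(split_into_groups(shuffled, 3))  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

After the shuffle, each group seen by the next group convolution holds one channel from every original group, which is how information crosses group boundaries without a dense 1×1 convolution.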

Object Detection

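  Object detection is another important deep learning task. Before 1990, typical object detection methods were based on geometric representations; afterward, detection moved toward statistical classification methods (neural networks, SVMs, AdaBoost, and so on).
  In 2012, when deep convolutional neural networks (DCNNs) achieved a breakthrough in image classification, that success was carried over to object detection. Girshick et al. proposed the landmark Region-based CNN (R-CNN), after which the field advanced rapidly and many deep learning detectors followed, such as YOLO and SSD.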

  • (R-CNN) Girshick, Ross, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.[PDF] A landmark detection framework and the origin of the R-CNN family; later deep learning detectors all borrow from its ideas. A must-read.
  • (SPPNet) He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2014: 346-361.[PDF] Fixes R-CNN's main speed problem, namely extracting CNN features separately and repeatedly for every region proposal: SPPNet computes the convolutional feature map once per image and pools each region's features with spatial pyramid pooling
  • (Fast R-CNN) Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.[PDF] The second version of the R-CNN family. Introduced RoI pooling and improved on both R-CNN and SPPNet, raising both speed and accuracy
  • (Faster R-CNN) Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. 2015: 91-99.[PDF] The peak of the R-CNN family. Introduced the Region Proposal Network (RPN) built on anchor boxes, both of which were widely adopted by later networks.
  • (YOLO) Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.[PDF] One of the representative one-stage detection frameworks. Very fast, but less accurate than the R-CNN family
  • (SSD) Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 21-37.[PDF] The other representative one-stage framework. By predicting from feature maps at multiple scales, it improves speed without sacrificing accuracy
  • (R-FCN) Li Y, He K, Sun J. R-FCN: Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems. 2016: 379-387.[PDF]
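  • (DSSD) Fu, C., Liu, W., Ranga, A., Tyagi, A., & Berg, A.C. (2017). DSSD: Deconvolutional Single Shot Detector. CoRR, abs/1701.06659.[PDF] Similar in spirit to FPN: it adds deconvolution layers to fuse features, improving SSD's accuracy on small and overlapping objects
  • (FPN) T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S. Belongie, “Feature Pyramid Networks for Object Detection,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017 [PDF] Introduced the feature pyramid: a top-down pathway fuses the semantic information extracted by the deep layers of the CNN into the feature map at every level (in particular, the high-resolution shallow feature maps also gain high-level semantics). This strengthens multi-scale feature representation and improves accuracy on small objects; it is used in both image segmentation and object detection.
  • (RetinaNet) Lin, T., Goyal, P., Girshick, R.B., He, K., & Dollár, P. (2017). Focal Loss for Dense Object Detection. 2017 IEEE International Conference on Computer Vision (ICCV), 2999-3007.[PDF] Introduced Focal Loss to address class imbalance in object detection, where negative (background) samples vastly outnumber positives
  • (TDM) Shrivastava, Abhinav, Rahul Sukthankar, Jitendra Malik and Abhinav Gupta. “Beyond Skip Connections: Top-Down Modulation for Object Detection.” CoRR abs/1612.06851 (2016).[PDF] Similar in idea to FPN, but the method adds top-down modules progressively, one layer at a time
  • (YOLOv2) Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517-6525.[PDF] An improved version of YOLO
  • (SIN) Liu, Y., Wang, R., Shan, S., & Chen, X. (2018). Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships. CVPR.[PDF] Applies an RNN to computer vision: it models scene context and relationships between objects as a graph updated by recurrent units, a novel network structure
  • (STDN) Scale-Transferrable Object Detection. Peng Zhou, Bingbing Ni, Cong Geng, Jianguo Hu, Yi Xu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 528-537 [PDF] Targets the slowdown in DSSD, FPN, and similar designs caused by the extra parameters that feature fusion introduces. It uses DenseNet as the backbone so that features are already fused during the forward pass, and adds a parameter-free scale-transfer module, improving speed while preserving accuracy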
  • (RefineDet) Zhang, Shifeng, Longyin Wen, Xiao Bian, Zhen Lei and Stan Z. Li. “Single-Shot Refinement Neural Network for Object Detection.” 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).[PDF] Combines the advantages of one-stage and two-stage detectors
  • (MegDet) Peng, Chao, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu and Jian Sun. “MegDet: A Large Mini-Batch Object Detector.” CVPR (2018).[PDF] Tackles the problem of overly small batch sizes when training object detectors. It proposes Cross-GPU Batch Normalization, which allows a batch size of 256 during training and cuts training time on the COCO 2017 dataset to about 4 hours.
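As an illustration of the RetinaNet entry, here is a minimal sketch of the binary focal loss for a single prediction, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), next to plain cross-entropy. The defaults α = 0.25 and γ = 2 are the values the Focal Loss paper reports working well; the function names and the scalar (non-batched) form are our own simplifications.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction; p = predicted P(y = 1), y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma is close to 0 for well-classified (easy) examples,
    # so their contribution to the total loss is strongly down-weighted.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background anchor (predicted foreground prob 0.01) vs. a hard one (0.9).
for p in (0.01, 0.9):
    ce, fl = cross_entropy(p, 0), focal_loss(p, 0)
    print(f"p={p}: CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.6f}")
```

For the easy negative the focal term scales its loss by 0.75 × 0.01² = 7.5e-5, while the hard negative keeps 0.75 × 0.9² ≈ 0.61 of its cross-entropy, so the huge number of easy background anchors no longer dominates the total loss.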

Deep Learning Tricks and CNN Architecture Improvements

  • (BatchNorm) Ioffe, Sergey and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” ICML (2015).[PDF] A powerful training tool: by normalizing layer inputs over each mini-batch, it speeds up convergence and makes training more numerically stable
  • Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li: Bag of Tricks for Image Classification with Convolutional Neural Networks. CoRR abs/1812.01187 (2018) [PDF] A systematic overview of many tricks for training CNNs
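To make the BatchNorm entry concrete, here is a minimal pure-Python sketch of the training-time transform for a single channel: normalize the mini-batch activations by their mean and variance, then apply a learnable scale γ (`gamma`) and shift β (`beta`). It is a simplification under our own naming: a real layer works on tensors, learns `gamma` and `beta` by backpropagation, and keeps running averages of the statistics for use at inference time, all of which are omitted here.

```python
import math


def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization for one channel.

    x: activations of this channel across the mini-batch.
    Returns gamma * (x - mean) / sqrt(var + eps) + beta for every element.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # mini-batch (biased) variance
    inv_std = 1.0 / math.sqrt(var + eps)       # eps guards against division by zero
    return [gamma * (v - mean) * inv_std + beta for v in x]


batch = [2.0, 4.0, 6.0, 8.0]
y = batch_norm_train(batch)
print([round(v, 3) for v in y])  # roughly zero mean, unit variance
```

Because every layer's inputs are kept near zero mean and unit variance regardless of how earlier weights change, larger learning rates become usable, which is where much of the faster convergence comes from.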

Face Detection

  • (SSH) Najibi, Mahyar et al. “SSH: Single Stage Headless Face Detector.” 2017 IEEE International Conference on Computer Vision (ICCV) (2017): 4885-4894. [PDF]
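  • (S³FD) Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., & Li, S.Z. (2017). S³FD: Single Shot Scale-Invariant Face Detector. 2017 IEEE International Conference on Computer Vision (ICCV), 192-201. [PDF] Uses a scale-compensation anchor matching strategy to raise the recall of small faces, and max-out background labeling to reduce false positives on small faces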
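The S³FD entry mentions anchor matching. In anchor-based detectors an anchor is typically assigned to a ground-truth box when their IoU (intersection over union) reaches a threshold, and a small face tends to overlap few anchors strongly enough. The sketch below is a simplified illustration of that idea only, not the paper's full matching strategy: the boxes and threshold values are made up for the example, and the function names are our own.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_box, threshold):
    """Indices of anchors whose IoU with the ground-truth box reaches the threshold."""
    return [i for i, anchor in enumerate(anchors) if iou(anchor, gt_box) >= threshold]


face = (0, 0, 10, 10)                                       # a small face
anchors = [(0, 0, 16, 16), (2, 2, 12, 12), (5, 5, 21, 21)]  # hypothetical anchors
print([round(iou(a, face), 3) for a in anchors])            # [0.391, 0.471, 0.076]
print(match_anchors(anchors, face, threshold=0.5))          # []
print(match_anchors(anchors, face, threshold=0.35))         # [0, 1]
```

Lowering the threshold lets this small face match two anchors instead of none, which conveys the intuition behind compensating for scale during matching; the paper's actual strategy has further details not modeled here.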

Image Segmentation

GANs (Generative Adversarial Networks)

Reinforcement Learning

Reference blog: https://blog.csdn.net/qq_21190081/article/details/69564634