Common Problems in Computer Vision and Their Solutions in the Literature [continuously updated, additions welcome]

The solutions to each issue are listed in chronological order. There is a trend from explicit to implicit and from hand-designed to automatically learned, which you can follow when designing your own innovative approaches!

Occlusion
  • Hierarchical part model for visibility estimation [1][5]
  • Occlusion data or feature augmentation [2] (see Data and Feature Augmentation under Data Imbalance)
Scale [14]
  • Image pyramid: compute features from each level of the image pyramid [1]
    1. Computationally expensive; usually applied during the inference stage
  • Encoder-decoder: feature maps from multiple convolutional and deconvolutional layers
    1. pyramidal feature hierarchy [5] [6] [7] 
    2. feature pyramid [11]
  • Deeper with atrous convolution
  • Spatial pyramid pooling [14]
    1. The encoder-decoder, atrous convolution, and spatial pyramid pooling approaches can be combined; the combination is explored in [15]
  • Attention
    1. Detect and focus on a smaller region in each stage [2]
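The image pyramid item in the Scale list above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows, 2x2 average pooling as the downsampling step, and a hypothetical `mean_intensity` function standing in for a real per-level feature extractor; none of these choices come from the cited papers.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def build_image_pyramid(image, num_levels):
    """Return [original, 1/2 scale, 1/4 scale, ...], at most num_levels entries."""
    pyramid = [image]
    while len(pyramid) < num_levels:
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break  # too small to downsample further
        pyramid.append(downsample_2x(current))
    return pyramid


def mean_intensity(level):
    """Hypothetical stand-in for a real feature extractor applied per level."""
    pixels = [value for row in level for value in row]
    return sum(pixels) / len(pixels)


# Toy 8x8 image whose pixel value encodes its position.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_image_pyramid(image, num_levels=3)

# The same extractor runs once per level, which is where the cost comes from.
features_per_level = [mean_intensity(level) for level in pyramid]
```

Every extra level adds another full pass of the extractor, which is the computational cost noted in the list.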
Data Imbalance
  • Compile balanced datasets in the first place
    1. Collect samples approximately uniformly across classes
    2. Data and Feature Augmentation
      1. Dropout: randomly drop half of the neurons during training for better generalization [4]
      2. GAN
        1. Generating hard feature maps for occlusion and deformation in object detection task [2]
  • Conducting over-sampling of minority classes or under-sampling from the majority classes
    1. Weaknesses
      1. Changes the underlying data distribution and may result in suboptimal exploitation of the available data
      2. Increases computational effort and/or the risk of over-fitting when the same samples are visited repeatedly
        1. SMOTE and its derived variants offer ways to avoid over-fitting
  • Data Mining for Hard Examples
    1. Online hard example mining (OHEM) [3] for both intra-class data imbalance and positive-negative imbalance 
  • Cost-sensitive learning
    1. Focal loss [8]: down-weights the loss of easy examples so that hard examples contribute relatively more
    2. Loss Max-Pooling for Semantic Image Segmentation [16]: by maximizing over pixel weighting functions, the loss adaptively re-weights the contribution of each pixel, so pixels incurring higher losses during training are weighted more than pixels with lower losses
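To make the focal loss item above concrete, here is a minimal sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name, the scalar interface, and the clamping constant are assumptions made for illustration; gamma = 2 and alpha = 0.25 are the defaults reported in [8].

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class
    y: ground-truth label, 0 or 1
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # assumed clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confident and correct) versus a hard positive.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)

# With gamma = 0 the modulating factor vanishes and the loss reduces to
# alpha-weighted cross-entropy, which serves as the comparison baseline.
easy_ce = focal_loss(0.9, 1, gamma=0.0)
hard_ce = focal_loss(0.1, 1, gamma=0.0)

focal_ratio = hard / easy
ce_ratio = hard_ce / easy_ce
```

Comparing `focal_ratio` with `ce_ratio` shows that the hard example dominates the easy one far more strongly under the focal loss than under plain cross-entropy.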
Local & global information combination
  • Deep learning can learn some multi-scale information automatically[9]
  • Top-down semantic from FPN (focus on each scale) [11]
  • Multi-scale combination & selection from GBD-Net (focus on combination of scales) [12]
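The top-down semantic flow of FPN [11] can be sketched as follows. This is a simplified illustration: it assumes single-channel feature maps stored as nested lists, each level exactly half the size of the previous one, and it omits the 1x1 lateral convolutions and 3x3 smoothing convolutions of the actual network. All function names are hypothetical.

```python
def upsample_nearest_2x(feature_map):
    """Double height and width by repeating every value (nearest neighbour)."""
    upsampled = []
    for row in feature_map:
        wide_row = [value for value in row for _ in range(2)]
        upsampled.append(wide_row)
        upsampled.append(list(wide_row))
    return upsampled


def merge_top_down(coarse, lateral):
    """Add the upsampled coarse map to the finer lateral map element-wise."""
    upsampled = upsample_nearest_2x(coarse)
    return [
        [u + l for u, l in zip(up_row, lat_row)]
        for up_row, lat_row in zip(upsampled, lateral)
    ]


def top_down_pathway(bottom_up_maps):
    """bottom_up_maps is ordered fine -> coarse; returns merged maps, same order."""
    merged = [bottom_up_maps[-1]]  # the coarsest level starts the pathway
    for lateral in reversed(bottom_up_maps[:-1]):
        merged.append(merge_top_down(merged[-1], lateral))
    merged.reverse()
    return merged


# Toy bottom-up features: finer maps carry small values, coarser maps large ones.
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[10.0] * 2 for _ in range(2)]
c4 = [[100.0]]

p2, p3, p4 = top_down_pathway([c2, c3, c4])
```

After merging, each finer level also carries the values propagated down from every coarser level, which is the sense in which coarse semantic information reaches each scale.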
Utilization of Context
  • In traditional machine learning, context is mainly used to refine object scores [1] [10]
  • RNN [13]
Utilization of Object Part Information
  • DPM [1]
  • Deep learning with DPM [5]
  • Position sensitive ROI pooling [10]: construct a score map from different channels (results of part detectors) of feature map
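The position-sensitive ROI pooling idea in the list above can be sketched for a single class. The sketch assumes k x k score maps stored as nested lists (one map per part or bin), integer ROI coordinates with an exclusive end, average pooling inside each bin, and average voting across bins. The function names and the toy data are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def ps_roi_pool(score_maps, roi, k):
    """Pool a k x k grid of bins, each bin reading only its own score map.

    score_maps: list of k*k 2-D maps, map i*k + j belongs to bin (i, j)
    roi: (x0, y0, x1, y1) in feature-map coordinates, end exclusive
    Returns (bin_scores, vote), where vote averages the k*k bin scores.
    """
    x0, y0, x1, y1 = roi
    width, height = x1 - x0, y1 - y0
    bin_scores = []
    for i in range(k):       # bin row
        row_scores = []
        for j in range(k):   # bin column
            channel = score_maps[i * k + j]
            r_start = y0 + (i * height) // k
            r_end = y0 + ceil_div((i + 1) * height, k)
            c_start = x0 + (j * width) // k
            c_end = x0 + ceil_div((j + 1) * width, k)
            values = [
                channel[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ]
            row_scores.append(sum(values) / len(values))
        bin_scores.append(row_scores)
    vote = sum(sum(row) for row in bin_scores) / (k * k)
    return bin_scores, vote


def make_part_map(size, rows, cols):
    """Toy part detector output: 1.0 where the part is, 0.0 elsewhere."""
    return [
        [1.0 if r in rows and c in cols else 0.0 for c in range(size)]
        for r in range(size)
    ]


# A 2x2-part object occupying rows 0-3 and columns 0-3 of a 6x6 feature map.
k = 2
score_maps = [
    make_part_map(6, range(2 * i, 2 * i + 2), range(2 * j, 2 * j + 2))
    for i in range(k)
    for j in range(k)
]

aligned_bins, aligned_vote = ps_roi_pool(score_maps, (0, 0, 4, 4), k)
shifted_bins, shifted_vote = ps_roi_pool(score_maps, (2, 2, 6, 6), k)
```

The aligned ROI scores highly because every bin finds its own part in the expected position, while the shifted ROI does not, even though it overlaps regions where other parts respond.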
Metric learning
  • Mainly used in recognition
Computation Efficiency
  • Eliminate redundant layers
  • Spatially adaptive computation
References
  [1] Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE Transactions on Pattern Analysis and Machine Intelligence 32.9 (2010): 1627-1645.
  [2] Wang, Xiaolong, Abhinav Shrivastava, and Abhinav Gupta. "A-Fast-RCNN: Hard positive generation via adversary for object detection." arXiv preprint arXiv:1704.03414 (2017).
  [3] Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  [4] Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
  [5] Ouyang, Wanli, and Xiaogang Wang. "Joint deep learning for pedestrian detection." Proceedings of the IEEE International Conference on Computer Vision. 2013.
  [6] Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer, Cham, 2016.
  [7] Fu, Cheng-Yang, et al. "DSSD: Deconvolutional single shot detector." arXiv preprint arXiv:1701.06659 (2017).
  [8] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." IEEE International Conference on Computer Vision (ICCV). 2017.
  [9] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
  [10] Galleguillos, C., and S. Belongie. "Context based object categorization: A critical survey." In CVPR, 2010.
  [11] Lin, T. Y., P. Dollár, R. Girshick, et al. "Feature pyramid networks for object detection." arXiv preprint arXiv:1612.03144 (2016).
  [12] Zeng, X., W. Ouyang, B. Yang, et al. "Gated bi-directional CNN for object detection." European Conference on Computer Vision. Springer, Cham, 2016: 354-369.
  [13] Bell, S., C. Lawrence Zitnick, K. Bala, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2874-2883.
  [14] "Rethinking Atrous Convolution for Semantic Image Segmentation" (DeepLab v3)
  [15] "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" (DeepLab v3+)
  [16] "Loss Max-Pooling for Semantic Image Segmentation"