Paper: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Third-party code:

An ECCV 2018 paper on model acceleration and compression.

Paper walkthrough:

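In the abstract and introduction, the authors point out that current network architectures are designed around computation complexity measured in FLOPs, but FLOPs do not fully characterize a model. For example, in Fig. 1(c)(d), models with the same FLOPs run at different speeds. Using FLOPs as the sole metric of computation complexity is therefore insufficient and leads to suboptimal designs.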


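The authors identify two main reasons for this discrepancy between FLOPs and speed. First, several important factors that affect speed are not captured by FLOPs, in particular memory access cost (MAC) and the degree of parallelism. Second, depending on the platform, operations with the same FLOPs can have different running times.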
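To make the first point concrete, here is a minimal sketch that counts FLOPs and MAC for a 1×1 convolution. It assumes the simplified memory model the paper uses for this analysis (read the input feature map, write the output feature map, read the weights, with everything fitting in cache); the 56×56 feature map and the channel widths are hypothetical values chosen only for illustration.

```python
def conv1x1_cost(h, w, c1, c2):
    """Return (FLOPs, MAC) of a 1x1 convolution without bias.

    FLOPs counts multiply-adds: h * w * c1 * c2.
    MAC counts memory accesses under the simplified model:
    read the input map, write the output map, read the weights.
    """
    flops = h * w * c1 * c2
    mac = h * w * (c1 + c2) + c1 * c2
    return flops, mac


# Hypothetical 56x56 feature map; both layers have c1 * c2 = 16384.
balanced = conv1x1_cost(56, 56, 128, 128)
skewed = conv1x1_cost(56, 56, 32, 512)

print(balanced)  # (51380224, 819200)
print(skewed)    # (51380224, 1722368)
```

Both layers cost exactly the same FLOPs, yet the skewed one touches roughly twice as much memory, a difference that a FLOPs count alone cannot reveal.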

Practical Guidelines for Efficient Network Design:

At the start of the study, the authors first analyze the runtime breakdown of ShuffleNet V1 and MobileNet V2.

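Next, the authors run four experiments, each of which yields one practical guideline:

G1) Equal channel width minimizes memory access cost (MAC).
G2) Excessive group convolution increases MAC.
G3) Network fragmentation reduces the degree of parallelism.
G4) Element-wise operations are non-negligible.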
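The paper's second guideline (excessive group convolution increases MAC) can be checked with the same kind of arithmetic. For a 1×1 group convolution with g groups, the FLOPs are B = hwc1c2/g and MAC = hw(c1 + c2) + c1c2/g, which the paper rewrites as hwc1 + Bg/c1 + B/(hw): with the input shape and B fixed, MAC grows with g. The sketch below is a minimal illustration under that simplified memory model; the 28×28×256 input and the rule of widening the output to 256·g channels to hold FLOPs constant are hypothetical choices.

```python
def group_conv1x1_cost(h, w, c1, c2, g):
    """Return (FLOPs, MAC) of a 1x1 group convolution with g groups.

    Each output channel only sees c1 / g input channels, so both the
    FLOPs and the weight count shrink by a factor of g.
    """
    assert c1 % g == 0 and c2 % g == 0, "channels must divide evenly into groups"
    flops = h * w * c1 * c2 // g
    mac = h * w * (c1 + c2) + c1 * c2 // g
    return flops, mac


# Hypothetical setting: fixed 28x28x256 input and a fixed FLOPs budget.
# To keep FLOPs constant as g grows, the output width is widened to 256 * g.
h = w = 28
c1 = 256
results = {g: group_conv1x1_cost(h, w, c1, 256 * g, g) for g in (1, 2, 4, 8)}

for g, (flops, mac) in results.items():
    print(f"g={g}: FLOPs={flops}, MAC={mac}")
```

Every row reports the same FLOPs, while MAC rises steadily with the number of groups.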

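Take a 1×1 convolution on an h×w feature map with $c_1$ input channels and $c_2$ output channels. Its FLOPs are $B = hwc_1c_2$, and, assuming the cache is large enough to hold the feature maps and weights, its memory access cost is $\mathrm{MAC} = hw(c_1 + c_2) + c_1c_2$. Applying the AM-GM inequality $c_1 + c_2 \ge 2\sqrt{c_1c_2}$ gives:

$$\mathrm{MAC} \ge 2\sqrt{hwB} + \frac{B}{hw}$$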

Therefore, for a given FLOPs budget, MAC reaches its minimum when the numbers of input and output channels are equal.
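The G1 bound can be verified numerically. This sketch uses the paper's simplified MAC model for a 1×1 convolution, MAC = hw(c1 + c2) + c1c2, and compares it against the lower bound 2√(hwB) + B/(hw) for several channel pairs that share the same FLOPs B; the 56×56 feature map and the specific channel pairs are hypothetical.

```python
import math


def conv1x1_mac(h, w, c1, c2):
    """Memory access cost of a 1x1 conv: input map + output map + weights."""
    return h * w * (c1 + c2) + c1 * c2


def mac_lower_bound(h, w, flops):
    """AM-GM lower bound on MAC for a fixed FLOPs budget B = h*w*c1*c2."""
    return 2 * math.sqrt(h * w * flops) + flops / (h * w)


h = w = 56
# Hypothetical channel pairs; all share c1 * c2 = 16384, hence the same FLOPs.
pairs = [(128, 128), (64, 256), (32, 512), (16, 1024)]
flops = h * w * 128 * 128
bound = mac_lower_bound(h, w, flops)

for c1, c2 in pairs:
    mac = conv1x1_mac(h, w, c1, c2)
    print(f"c1:c2 = 1:{c2 // c1}  MAC = {mac}  (bound = {bound:.0f})")
```

Only the 1:1 pair reaches the bound; the further the ratio drifts from 1:1, the larger the MAC.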

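The experimental results confirm this: at the same FLOPs, the networks run fastest when the numbers of input and output channels are equal (ratio 1:1).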

To quantify how network fragmentation affects efficiency, the authors evaluate a series of network blocks with different degrees of fragmentation.


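Fragmentation reduces speed significantly on GPU, whereas on ARM the slowdown is relatively small: fragmented structures are unfriendly mainly to devices with strong parallel computing power, such as GPUs.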


ShuffleNet V2: an Efficient Architecture:

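The authors first revisit ShuffleNet V1 in light of these guidelines: its pointwise group convolutions and bottleneck structures increase MAC (G1 and G2), using too many groups violates G3, and the element-wise Add in the shortcut connection is undesirable (G4).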

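Next, the authors present the ShuffleNet V2 architecture. At the beginning of each unit, a channel split operation divides the c input channels into two branches with c − c′ and c′ channels; the paper uses c′ = c/2. As the paper puts it: "Following G3, one branch remains as identity. The other branch consists of three convolutions with the same input and output channels to satisfy G1."

The 1×1 convolutions are no longer group-wise, which corresponds to the second guideline (G2); the channel split itself already acts as a form of grouping. The channel shuffle is moved after the concat, which corresponds to the third guideline (G3), and the Add is replaced by a Concat, which corresponds to the fourth guideline (G4). When several of these units (Figure 3(c)) are stacked, the Concat, Channel Shuffle, and Channel Split of consecutive units can be merged into a single element-wise operation, again following G4.

For spatial downsampling, the unit is slightly modified, as illustrated in Figure 3(d): the channel split operator is removed, so the number of output channels is doubled.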
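The channel data flow of the basic and downsampling units can be traced with a small sketch. It works on lists of channel labels rather than real tensors, so it only tracks channels (not the spatial halving in the downsampling unit), and all function names (`channel_split`, `channel_shuffle`, `basic_unit`, `downsample_unit`, `tag`) are hypothetical. The convolution branch (1×1 conv, 3×3 depthwise conv, 1×1 conv) is replaced by a stand-in that keeps the channel count and only marks each channel as processed.

```python
def channel_split(channels):
    """Split the channel list into two equal halves (c' = c / 2)."""
    assert len(channels) % 2 == 0, "channel count must be even"
    half = len(channels) // 2
    return channels[:half], channels[half:]


def channel_shuffle(channels, groups):
    """Reshape to (groups, n // groups), transpose, and flatten."""
    assert len(channels) % groups == 0
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def basic_unit(channels, branch):
    """Basic unit: split, transform one half, concat, shuffle."""
    identity, rest = channel_split(channels)
    out = identity + branch(rest)          # Concat instead of Add
    return channel_shuffle(out, groups=2)  # shuffle after the Concat


def downsample_unit(channels, left_branch, right_branch):
    """Downsampling unit: no split, both branches see all c channels."""
    out = left_branch(channels) + right_branch(channels)  # 2c channels
    return channel_shuffle(out, groups=2)


# Hypothetical stand-in for a conv branch: keeps the channel count
# and only tags each channel as processed.
def tag(name):
    return lambda chs: [f"{name}({c})" for c in chs]


x = [f"x{i}" for i in range(8)]
y = basic_unit(x, tag("f"))
print(y)
# ['x0', 'f(x4)', 'x1', 'f(x5)', 'x2', 'f(x6)', 'x3', 'f(x7)']
print(len(downsample_unit(x, tag("l"), tag("r"))))  # 16
```

After the shuffle, each half that the next unit's channel split takes contains both identity channels and processed channels, which is how information flows between the two branches. Tensor-based implementations typically perform the same permutation with a reshape, transpose, and reshape.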
