<>SENet & SKNet

SENet (Squeeze-and-Excitation Networks) won the 2017 ImageNet classification challenge; the paper was published at CVPR 2018 and is currently the most-cited CVPR 2018 paper. SKNet (Selective Kernel Networks) is a CVPR 2019 paper. Since the two share common ideas, this article compares and analyzes them together.
Paper: Squeeze-and-Excitation Networks <https://arxiv.org/abs/1709.01507>,
Selective Kernel Networks <https://arxiv.org/abs/1903.06586?context=cs>
Github: Squeeze-and-Excitation Networks
<https://github.com/hujie-frank/SENet>, Selective Kernel Networks
<https://github.com/implus/SKNet>

<>1. Introduction

<>2. SE-Net Structure

The authors of SENet designed this module to achieve two goals:

(1) Explicitly model the relationships between channels.
(2) Perform "feature recalibration": channel-wise, strengthen useful information and suppress useless information.

<>2.1 Squeeze: Global Information Embedding

In order to tackle the issue of exploiting channel dependencies, we first
consider the signal to each channel in the output features. Each of the learned
filters operates with a local receptive field and consequently each unit of the
transformation output U is unable to exploit contextual information outside of
this region.

The squeeze step compresses the information in each channel's H×W spatial dimensions into a single number. An obvious way to do this is global average pooling (the Fsq(·) operation in the structure diagram).

Formally, a statistic z ∈ R^C is generated by shrinking U through its spatial
dimensions H × W, such that the c-th element of z is calculated by:

z_c = F_sq(u_c) = (1 / (H × W)) · Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)
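The squeeze step (global average pooling per channel) can be sketched in plain Python; the nested-list layout and example values are illustrative, not from the paper:

```python
# Minimal sketch of Squeeze: collapse each H x W channel into its mean.
# U is stored as a C x H x W nested list for illustration.

def squeeze(U):
    """Return z, where z[c] is the spatial mean of channel c of U."""
    z = []
    for channel in U:                        # channel: H x W grid
        total = sum(sum(row) for row in channel)
        h, w = len(channel), len(channel[0])
        z.append(total / (h * w))
    return z

# A 2-channel, 2x2 example: the channel means are 2.5 and 10.0.
U = [[[1.0, 2.0], [3.0, 4.0]],
     [[10.0, 10.0], [10.0, 10.0]]]
print(squeeze(U))  # [2.5, 10.0]
```

In a real framework this is a single adaptive average-pooling call; the loop form is only meant to make the formula above concrete.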

<>2.2 Excitation: Adaptive Recalibration

To make use of the information aggregated in the squeeze operation, we follow
it with a second operation which aims to fully capture channel-wise
dependencies.

To meet these criteria, we opt to employ a simple gating mechanism with a
sigmoid activation:

s = F_ex(z, W) = σ(g(z, W)) = σ(W2 δ(W1 z))

where δ refers to the ReLU function, W1 ∈ R^(C/r × C) and W2 ∈ R^(C × C/r).

(1) It adds more non-linearity, which better fits the complex correlations between channels.
(2) The bottleneck with reduction ratio r keeps the parameter count and computation as low as possible.

<>2.3 Scale

The final output of the block is obtained by rescaling the transformation
output U with the activations:

x̃_c = F_scale(u_c, s_c) = s_c · u_c

where X̃ = [x̃_1, x̃_2, …, x̃_C] and F_scale(u_c, s_c) refers to channel-wise
multiplication between the scalar s_c and the feature map u_c ∈ R^(H×W).
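The Scale step is just a per-channel broadcast multiply; a tiny sketch with an illustrative one-channel input:

```python
# Sketch of Scale: x~_c = s_c * u_c, i.e. every element of channel c
# is multiplied by that channel's gate scalar s[c].

def scale(U, s):
    return [[[s_c * x for x in row] for row in channel]
            for channel, s_c in zip(U, s)]

# One 2x2 channel gated by s = [0.5]: every value is halved.
U = [[[1.0, 2.0], [3.0, 4.0]]]
print(scale(U, [0.5]))  # [[[0.5, 1.0], [1.5, 2.0]]]
```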

<>3. Embedding the SE Module: Examples

<>4. SENet Experiments

<>5. Selective Kernel Networks

SKNet is likewise a lightweight module that can be embedded in a network. Its inspiration: when we look at objects of different sizes and distances, the receptive field sizes of neurons in the visual cortex adjust according to the stimulus. In a CNN, by contrast, the kernel size is generally fixed for a given task and model. Can we build a model in which the network adaptively adjusts its receptive field size according to the multiple scales of the input information?

<>Split

Split: For any given feature map X ∈ R^(H'×W'×C'), by default we first conduct
two transformations F̃: X → Ũ ∈ R^(H×W×C) and F̂: X → Û ∈ R^(H×W×C) with
kernel sizes 3 and 5, respectively. Note that both F̃ and F̂ are composed of
efficient grouped/depthwise convolutions, Batch Normalization and ReLU function
in sequence. For further efficiency, the conventional convolution with a 5×5
kernel is replaced with the dilated convolution with a 3×3 kernel and dilation
size 2.

Each transformation is thus a sequence of (grouped/depthwise convolutions, Batch Normalization, ReLU function). As shown in the structure diagram, X is convolved with a 3×3 kernel and a 5×5 kernel, producing the outputs Ũ and Û.
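The substitution of the 5×5 convolution is possible because dilation widens a kernel's coverage without adding weights: a k×k kernel with dilation d spans d·(k−1)+1 positions per axis. A one-line check:

```python
# Effective receptive field (per axis) of a k x k kernel with dilation d.
def effective_kernel(k, dilation):
    return dilation * (k - 1) + 1

print(effective_kernel(3, 2))  # 5: a dilated 3x3 covers the same area as a 5x5
print(effective_kernel(3, 1))  # 3: an ordinary 3x3 convolution
```

So the dilated 3×3 branch sees a 5×5 region while carrying only 9 weights per channel instead of 25.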

<>Fuse

<>Select

The Select operation corresponds to the Scale step in the SE module. The difference is that Select uses two attention weight vectors, a and b, to weight Ũ and Û channel-wise and then sums the two weighted branches; the weights are produced by a softmax across branches, so a_c + b_c = 1 for each channel.
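A minimal sketch of the Select step as per-channel softmax attention over the two branches. Note one simplification: in SKNet the softmax logits are computed from the fused descriptor by learned layers, whereas here they are passed in directly for illustration:

```python
import math

def softmax_pair(x, y):
    """Two-way softmax: returns (a, b) with a + b = 1."""
    ex, ey = math.exp(x), math.exp(y)
    return ex / (ex + ey), ey / (ex + ey)

def select(U3, U5, logits3, logits5):
    """Per channel c: (a_c, b_c) = softmax(logits), V_c = a_c*U3_c + b_c*U5_c."""
    V = []
    for c3, c5, l3, l5 in zip(U3, U5, logits3, logits5):
        a, b = softmax_pair(l3, l5)
        V.append([[a * p + b * q for p, q in zip(r3, r5)]
                  for r3, r5 in zip(c3, c5)])
    return V

# Equal logits -> a = b = 0.5, so the output is the average of the branches.
U3 = [[[2.0, 4.0]]]  # one 1x2 channel from the 3x3 branch
U5 = [[[4.0, 8.0]]]  # one 1x2 channel from the 5x5 branch
print(select(U3, U5, [0.0], [0.0]))  # [[[3.0, 6.0]]]
```

Because a_c + b_c = 1, the output is always a convex combination of the two receptive-field branches, which is what makes this a soft kernel selection rather than a hard switch.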

<>6. SKNet Structure Examples and Experimental Results

<>7. Summary

SENet and SKNet are both lightweight modules that can be embedded directly into a network. Although SKNet's experiments show it to be slightly better than SENet, using SKNet involves choosing the number and sizes of the kernels; the authors only report results for different parameter choices and pick the best one to compare against other models. Intuitively, though, SKNet adds a soft attention mechanism to the network, letting it gather information from different receptive fields, and this may lead to network structures that generalize better. After all, although the Inception architecture is ingenious and performs well, it still feels heavily hand-designed. If a network could adaptively adjust its own structure to capture information at different receptive fields, that might enable the next breakthrough past the current limits of CNN models.

For a more detailed introduction to the SE module, see this article: <http://www.sohu.com/a/161633191_465975>