Dilated convolution, also called atrous convolution, is known in Chinese as 空洞卷积 ("hole convolution") or 扩张卷积 ("expanded convolution").

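Dilated convolution originated in image segmentation. In a segmentation network, the input image passes through a CNN that extracts features, and pooling layers then reduce the spatial resolution while enlarging the receptive field. Because segmentation is a pixel-wise prediction task, the shrunken feature map has to be upsampled back to the original size, usually with deconv (transposed convolution). An FCN for segmentation therefore has two key steps: pooling enlarges the receptive field, and upsampling restores the image size. The problem is that although upsampling restores the size, much of the detail has already been lost in pooling. Is there a way to enlarge the receptive field without shrinking the image? That is the question dilated conv answers.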

Almost every explanation of dilated convolution uses the same figure from the original paper.


Dilated convolution has one important parameter, rate, which sets the size of the holes.
The concept and the operation can be understood from two perspectives.

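1) From the input's perspective, the holes amount to sampling the input image. The sampling frequency is set by rate. When rate is 1, the input is sampled without losing any information and the operation is a standard convolution. When rate > 1, for example 2, the input is sampled with rate-1 pixels skipped between consecutive samples, as in panel (b) of the figure: think of the red dots as the sampling points on the input, which are then convolved with the kernel. This effectively enlarges the receptive field.
2) From the kernel's perspective, the holes enlarge the kernel: rate-1 zeros are inserted between adjacent kernel elements, and the enlarged kernel is then convolved with the input. This enlarges the receptive field in the same way.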
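The two perspectives describe the same computation. Here is a minimal 1-D sketch in plain Python that checks this; the function names and the sample data are made up for illustration and do not come from any library.

```python
def dilate_kernel(kernel, rate):
    """Kernel view: insert rate-1 zeros between adjacent kernel values."""
    dilated = []
    for i, w in enumerate(kernel):
        dilated.append(w)
        if i < len(kernel) - 1:
            dilated.extend([0.0] * (rate - 1))
    return dilated


def correlate_valid(signal, kernel):
    """Ordinary sliding-window 'convolution' (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def dilated_by_sampling(signal, kernel, rate):
    """Input view: read input samples spaced `rate` apart, keep the kernel as is."""
    span = (len(kernel) - 1) * rate + 1  # input extent covered by one output
    return [sum(signal[i + j * rate] * kernel[j] for j in range(len(kernel)))
            for i in range(len(signal) - span + 1)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 2.0, 3.0]
rate = 2

kernel_view = correlate_valid(signal, dilate_kernel(kernel, rate))
input_view = dilated_by_sampling(signal, kernel, rate)

print(dilate_kernel(kernel, rate))  # [1.0, 0.0, 2.0, 0.0, 3.0]
print(kernel_view)                  # [22.0, 28.0, 34.0]
print(input_view == kernel_view)    # True
```

The kernel still has three weights, but each output now depends on an input span of five samples.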

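VGG showed that stacking small kernels in place of one large kernel reduces the number of parameters while reaching the same receptive field as the large kernel. However, with stacked small kernels the receptive field grows only linearly with depth, as $(kernelSize-1) \times layers + 1$, whereas with dilated convolution it can grow exponentially (for example, when the rate doubles at each layer).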
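To make the linear versus exponential growth concrete, the sketch below computes the receptive field of a stack of stride-1 convolutions. It assumes 3x3 kernels and, for the dilated stack, rates that double per layer (1, 2, 4, ...); that schedule is an assumption chosen for illustration, and `receptive_field` is a hypothetical helper.

```python
def receptive_field(kernel_size, dilation_rates):
    """Receptive field of stacked stride-1 convolutions, one layer per rate.

    Each layer adds (kernel_size - 1) * rate input pixels to the field.
    With every rate equal to 1 this reduces to (kernel_size - 1) * layers + 1.
    """
    field = 1
    for rate in dilation_rates:
        field += (kernel_size - 1) * rate
    return field


for layers in range(1, 6):
    standard = receptive_field(3, [1] * layers)
    dilated = receptive_field(3, [2 ** i for i in range(layers)])
    print(layers, standard, dilated)
# 1 3 3
# 2 5 7
# 3 7 15
# 4 9 31
# 5 11 63
```

Both stacks have the same number of parameters per layer; only the spacing of the kernel values differs.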

Standard convolution:


Dilated convolution:

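In fully convolutional networks (FCN), dilated convolution gives effective control over the density at which feature responses are computed. In dense prediction tasks such as semantic image segmentation, optical flow computation, or depth estimation, it can replace transposed convolution when combined with bilinear interpolation. Dilated convolution enlarges the kernel's receptive field without adding model parameters or computation. It works well wherever an image needs global context or speech and text need long-range sequence dependencies: it appears in image segmentation, in WaveNet for speech synthesis, and in ByteNet for machine translation.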

In an earlier post <https://blog.csdn.net/silence2015/article/details/78649734>, I briefly summarized the concept and usage of deconv (transposed convolution). Here I compare deconv with dilated conv.

deconv is mainly used to enlarge the image and is a form of upsampling. Dilated convolution does no upsampling; its purpose is to enlarge the receptive field, and it can leave the image size unchanged (with stride 1).

For a standard k*k kernel with stride s, there are three cases:

1) s > 1: the convolution also downsamples, so the output is smaller than the input.
2) s = 1: with SAME padding, the output has the same size as the input.
3) s < 1: fractionally strided convolution, which is equivalent to upsampling the input first and then convolving, so the output is larger than the input.
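The three stride cases can be sketched in 1-D with plain Python. This only illustrates how the output length changes; the helper names are hypothetical, and the fractionally strided case is modelled, as an assumption, by inserting zeros between input samples and padding k-1 zeros on each side before a stride-1 convolution.

```python
def conv1d_valid(signal, kernel, stride=1):
    """Sliding-window convolution without padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def fractionally_strided_conv1d(signal, kernel, upsample):
    """Stride 1/upsample: stretch the input with zeros, then convolve with stride 1."""
    stretched = []
    for i, x in enumerate(signal):
        stretched.append(x)
        if i < len(signal) - 1:
            stretched.extend([0.0] * (upsample - 1))
    pad = [0.0] * (len(kernel) - 1)
    return conv1d_valid(pad + stretched + pad, kernel, stride=1)


signal = [float(v) for v in range(1, 9)]  # length 8
kernel = [1.0, 1.0, 1.0]

downsampled = conv1d_valid(signal, kernel, stride=2)              # s > 1
same_size = conv1d_valid([0.0] + signal + [0.0], kernel)          # s = 1, SAME-style padding
upsampled = fractionally_strided_conv1d(signal, kernel, upsample=2)  # s = 1/2

print(len(downsampled), len(same_size), len(upsampled))  # 3 8 17
```

For the last case the length is (n - 1) * upsample + k, here (8 - 1) * 2 + 3 = 17.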

Dilated conv either skips pixels on the input before convolving, or pads the kernel with zeros before convolving, in order to enlarge the receptive field.

In TensorFlow, dilated convolution can be implemented in two ways: tf.nn.atrous_conv2d or tf.nn.conv2d.

tf.nn.atrous_conv2d takes five parameters: value, filters, rate, padding, and name. rate controls how much the kernel is padded: rate-1 zeros are inserted between adjacent kernel values, so the effective kernel has height $filterHeight + (filterHeight - 1) \times (rate - 1)$ and width $filterWidth + (filterWidth - 1) \times (rate - 1)$.
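As a quick check of the effective-size formula, the sketch below builds the zero-filled 2-D kernel explicitly in plain Python. `dilate_kernel_2d` is a hypothetical helper written for this post, not a TensorFlow function; TensorFlow does not need to materialize the padded kernel.

```python
def dilate_kernel_2d(kernel, rate):
    """Insert rate-1 zeros between adjacent values of a 2-D kernel (list of rows)."""
    height, width = len(kernel), len(kernel[0])
    eff_height = height + (height - 1) * (rate - 1)
    eff_width = width + (width - 1) * (rate - 1)
    dilated = [[0] * eff_width for _ in range(eff_height)]
    for r in range(height):
        for c in range(width):
            # Original values land on a grid with spacing `rate`.
            dilated[r * rate][c * rate] = kernel[r][c]
    return dilated


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

for row in dilate_kernel_2d(kernel, rate=2):
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

A 3x3 kernel with rate 2 covers a 5x5 window, matching 3 + (3 - 1) * (2 - 1) = 5.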

tf.nn.conv2d has a parameter called dilations that achieves the same dilated-convolution effect.

Update, 2018/4/2:


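In practice I found that the documentation of atrous_conv2d and conv2d does not clearly describe the output shape of a dilated convolution. After some searching, it turns out that the output shape depends on rate as well as on padding. The calculation goes as follows. Look at padding first. If padding is SAME, the output size follows the usual SAME rule, $out = \lceil in / stride \rceil$, whatever the rate. If padding is VALID, the usual VALID rule, $out = \lceil (in - filterSize + 1) / stride \rceil$, applies, except that filterSize must be recomputed from rate. In other words, the holes are added to the kernel: pad the kernel with zeros to get the new size filter_height = height + (height - 1) * (rate - 1), and likewise for the width. Feeding the new filter size into the VALID rule gives the final output size.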
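The calculation can be written as a small helper. This is a sketch under the assumption that SAME gives ceil(in / stride) and VALID gives ceil((in - filter + 1) / stride), the rules TensorFlow documents for ordinary convolution; `conv_output_size` is a hypothetical name, not a TensorFlow function.

```python
import math


def conv_output_size(in_size, filter_size, stride=1, rate=1, padding="VALID"):
    """Output size along one spatial dimension of a (possibly dilated) convolution."""
    if padding == "SAME":
        # SAME ignores both the kernel size and the rate.
        return math.ceil(in_size / stride)
    # VALID: first replace the kernel size with its zero-filled (effective) size.
    effective = filter_size + (filter_size - 1) * (rate - 1)
    return math.ceil((in_size - effective + 1) / stride)


# A 256x256 input with a 6x6 kernel and rate 2, as in the test script in this post:
print(conv_output_size(256, 6, rate=2, padding="SAME"))   # 256
print(conv_output_size(256, 6, rate=2, padding="VALID"))  # 246
```

The effective kernel size is 6 + 5 * 1 = 11, so VALID yields 256 - 11 + 1 = 246.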

Update, 2019.2.22:
The output-size calculation above is slightly incomplete. The standard formula also includes the padding size:
$out = \lfloor (in - filterSize + 2 \times padding) / stride \rfloor + 1$
In TensorFlow the calculation is written with ceil (round up), while MXNet uses floor (round down); in general, compute with floor.
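The floor form and the ceil form are less different than they look. The sketch below, with hypothetical helper names, checks that with no padding the floor formula gives the same result as a ceil-style VALID rule, ceil((in - filter + 1) / stride), which is assumed here to be the TensorFlow convention. A filter size of 11 is used because it is the effective size of a 6x6 kernel with rate 2.

```python
import math


def out_floor(in_size, filter_size, padding, stride):
    """floor((in - filter + 2 * padding) / stride) + 1"""
    return (in_size - filter_size + 2 * padding) // stride + 1


def out_ceil_valid(in_size, filter_size, stride):
    """ceil((in - filter + 1) / stride), no padding"""
    return math.ceil((in_size - filter_size + 1) / stride)


# With padding = 0 the two forms agree for every size and stride tried here.
agree = all(
    out_floor(n, 11, 0, s) == out_ceil_valid(n, 11, s)
    for n in range(11, 300)
    for s in range(1, 5)
)
print(agree)                      # True

print(out_floor(256, 11, 0, 1))   # 246, no padding
print(out_floor(256, 11, 5, 1))   # 256, 5 pixels of padding on each side
```

With stride 1 and an 11-wide effective kernel, 5 pixels of padding per side keep the output at the input size.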
Here is some actual code and its output:
```python
import tensorflow as tf
import numpy as np

input_img_np = np.random.random((1, 256, 256, 1)).astype(np.float32)
kernel = np.random.random((6,6,1,1)).astype(np.float32)

with tf.Session() as sess:
    concrete_input_op = tf.constant(input_img_np)
    concrete_output_op = tf.nn.convolution(concrete_input_op, kernel, padding='SAME',
                                           dilation_rate=np.array([2, 2]))
    concrete_output = sess.run(concrete_output_op)

    print('convolution + CONCRETE + SAME')
    print('concrete_input_op: ', concrete_input_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output:', concrete_output.shape)
    assert(concrete_input_op.get_shape() == concrete_output_op.get_shape())

    undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    undef_output_op = tf.nn.convolution(undef_input_op, kernel, padding='SAME',
                                        dilation_rate=np.array([2, 2]))
    undef_output = sess.run(undef_output_op, feed_dict={undef_input_op: input_img_np})

    print('convolution + UNDEF + SAME')
    print('undef_input_op: ', undef_input_op.get_shape())
    print('undef_output_op: ', undef_output_op.get_shape())
    print('undef_output:', undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(undef_input_op.get_shape() == undef_output_op.get_shape())

    valid_concrete_input_op = tf.constant(input_img_np)
    valid_concrete_output_op = tf.nn.convolution(valid_concrete_input_op, kernel, padding='VALID',
                                                 dilation_rate=np.array([2, 2]))
    valid_concrete_output = sess.run(valid_concrete_output_op)

    print('convolution + CONCRETE + VALID')
    print('valid_concrete_input_op: ', valid_concrete_input_op.get_shape())
    print('valid_concrete_output_op: ', valid_concrete_output_op.get_shape())
    print('valid_concrete_output:', valid_concrete_output.shape)

    valid_undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    valid_undef_output_op = tf.nn.convolution(valid_undef_input_op, kernel, padding='VALID',
                                              dilation_rate=np.array([2, 2]))
    valid_undef_output = sess.run(valid_undef_output_op, feed_dict={valid_undef_input_op: input_img_np})

    print('convolution + UNDEF + VALID')
    print('valid_undef_input_op: ', valid_undef_input_op.get_shape())
    print('valid_undef_output_op: ', valid_undef_output_op.get_shape())
    print('valid_undef_output:', valid_undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(undef_input_op.get_shape() == undef_output_op.get_shape())

    ############################################################################
    # Now atrous
    concrete_input_op = tf.constant(input_img_np)
    concrete_output_op = tf.nn.atrous_conv2d(concrete_input_op, kernel, padding='SAME', rate=2)
    concrete_output = sess.run(concrete_output_op)

    print('atrous_conv2d + CONCRETE + SAME')
    print('concrete_input_op: ', concrete_input_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output_op: ', concrete_output_op.get_shape())
    print('concrete_output:', concrete_output.shape)
    assert(concrete_input_op.get_shape() == concrete_output_op.get_shape())

    undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    undef_output_op = tf.nn.atrous_conv2d(undef_input_op, kernel, padding='SAME', rate=2)
    undef_output = sess.run(undef_output_op, feed_dict={undef_input_op: input_img_np})

    print('atrous_conv2d + UNDEF + SAME')
    print('undef_input_op: ', undef_input_op.get_shape())
    print('undef_output_op: ', undef_output_op.get_shape())
    print('undef_output:', undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(undef_input_op.get_shape() == undef_output_op.get_shape())

    valid_concrete_input_op = tf.constant(input_img_np)
    valid_concrete_output_op = tf.nn.atrous_conv2d(valid_concrete_input_op, kernel, padding='VALID', rate=2)
    valid_concrete_output = sess.run(valid_concrete_output_op)

    print('atrous_conv2d + CONCRETE + VALID')
    print('valid_concrete_input_op: ', valid_concrete_input_op.get_shape())
    print('valid_concrete_output_op: ', valid_concrete_output_op.get_shape())
    print('valid_concrete_output:', valid_concrete_output.shape)

    valid_undef_input_op = tf.placeholder(tf.float32, shape=(None, 256, 256, 1))
    valid_undef_output_op = tf.nn.atrous_conv2d(valid_undef_input_op, kernel, padding='VALID', rate=2)
    valid_undef_output = sess.run(valid_undef_output_op, feed_dict={valid_undef_input_op: input_img_np})

    print('atrous_conv2d + UNDEF + VALID')
    print('valid_undef_input_op: ', valid_undef_input_op.get_shape())
    print('valid_undef_output_op: ', valid_undef_output_op.get_shape())
    print('valid_undef_output:', valid_undef_output.shape)
    # This assert will correctly fail even though the shapes are ok because shapes are only partially known
    # assert(undef_input_op.get_shape() == undef_output_op.get_shape())
```
```
convolution + CONCRETE + SAME
('concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output:', (1, 256, 256, 1))
convolution + UNDEF + SAME
('undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('undef_output_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('undef_output:', (1, 256, 256, 1))
convolution + CONCRETE + VALID
('valid_concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('valid_concrete_output_op: ', TensorShape([Dimension(1), Dimension(246), Dimension(246), Dimension(1)]))
('valid_concrete_output:', (1, 246, 246, 1))
convolution + UNDEF + VALID
('valid_undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('valid_undef_output_op: ', TensorShape([Dimension(None), Dimension(246), Dimension(246), Dimension(1)]))
('valid_undef_output:', (1, 246, 246, 1))
atrous_conv2d + CONCRETE + SAME
('concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('concrete_output:', (1, 256, 256, 1))
atrous_conv2d + UNDEF + SAME
('undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('undef_output_op: ', TensorShape([Dimension(None), Dimension(None), Dimension(None), Dimension(1)]))
('undef_output:', (1, 256, 256, 1))
atrous_conv2d + CONCRETE + VALID
('valid_concrete_input_op: ', TensorShape([Dimension(1), Dimension(256), Dimension(256), Dimension(1)]))
('valid_concrete_output_op: ', TensorShape([Dimension(1), Dimension(246), Dimension(246), Dimension(1)]))
('valid_concrete_output:', (1, 246, 246, 1))
atrous_conv2d + UNDEF + VALID
('valid_undef_input_op: ', TensorShape([Dimension(None), Dimension(256), Dimension(256), Dimension(1)]))
('valid_undef_output_op: ', TensorShape([Dimension(None), Dimension(None), Dimension(None), Dimension(1)]))
('valid_undef_output:', (1, 246, 246, 1))
```
References:

TensorFlow official documentation
Zhihu: How should dilated convolution be understood?
<https://www.zhihu.com/question/54149221/answer/192025860>
<https://github.com/tensorflow/tensorflow/issues/4742>