Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Can anyone clearly explain the difference between 1D, 2D, and 3D convolutions in CNNs (deep learning), with examples?

In a CNN, 1D, 2D, or 3D refers to the direction(s) of convolution, rather than to the dimensionality of the input or the filter.
For a 1-channel input, a 2D convolution is equivalent to a 1D convolution if the kernel length equals the input length along one axis (leaving only 1 direction of conv).

I'd like to explain this with a picture from C3D.

In short, the direction of the convolution and the output shape are what matter.

↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

only 1 direction (the time axis) to compute the conv
input = [W], filter = [k], output = [W]
e.g. input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (ignoring edge effects; with zero padding the two end values would be 0.75)
the output shape is a 1D array
example) smoothing a graph

tf.nn.conv1d - Toy example

    # Written against the TensorFlow 1.x session API.
    import numpy as np
    import tensorflow as tf

    sess = tf.Session()

    ones_1d = np.ones(5)
    weight_1d = np.ones(3)
    strides_1d = 1

    in_1d = tf.constant(ones_1d, dtype=tf.float32)
    filter_1d = tf.constant(weight_1d, dtype=tf.float32)

    in_width = int(in_1d.shape[0])
    filter_width = int(filter_1d.shape[0])

    # conv1d expects input [batch, width, channels]
    # and kernel [width, in_channels, out_channels]
    input_1d = tf.reshape(in_1d, [1, in_width, 1])
    kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
    output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
    print(sess.run(output_1d))
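To make the single sliding direction explicit, here is a minimal pure-Python sketch of the same computation. The helper `conv1d_same` is a hypothetical name, not a TensorFlow function; it assumes stride 1, zero padding at both ends, and no kernel flip (cross-correlation, which is what deep learning libraries usually call convolution).

```python
def conv1d_same(signal, kernel):
    """Slide `kernel` along `signal` in one direction, zero-padding the ends
    so the output has the same length as the input (stride 1 assumed)."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    out = []
    for i in range(len(signal)):          # the only sliding direction
        total = 0.0
        for j in range(k):
            idx = i + j - pad_left
            if 0 <= idx < len(signal):    # positions outside the signal count as zero
                total += signal[idx] * kernel[j]
        out.append(total)
    return out


ones_out = conv1d_same([1.0] * 5, [1.0] * 3)
smooth_out = conv1d_same([1.0] * 5, [0.25, 0.5, 0.25])
print(ones_out)    # [2.0, 3.0, 3.0, 3.0, 2.0]
print(smooth_out)  # [0.75, 1.0, 1.0, 1.0, 0.75]
```

The first result uses the all-ones kernel from the toy example; the second uses the smoothing filter, whose end values drop to 0.75 because of the zero padding.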

↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

2 directions (x, y) to compute the conv
the output shape is a 2D matrix
input = [W, H], filter = [k, k], output = [W, H]
example) Sobel edge filter

tf.nn.conv2d - Toy example

    ones_2d = np.ones((5,5))
    weight_2d = np.ones((3,3))
    strides_2d = [1, 1, 1, 1]

    in_2d = tf.constant(ones_2d, dtype=tf.float32)
    filter_2d = tf.constant(weight_2d, dtype=tf.float32)

    in_width = int(in_2d.shape[0])
    in_height = int(in_2d.shape[1])

    filter_width = int(filter_2d.shape[0])
    filter_height = int(filter_2d.shape[1])

    # conv2d expects input [batch, height, width, channels]
    # and kernel [height, width, in_channels, out_channels]
    input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
    kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

    output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
    print(sess.run(output_2d))
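The two sliding directions can be written out as two outer loops. The following is a sketch only: `conv2d_same` is a hypothetical helper that assumes stride 1, zero padding, and a single channel.

```python
def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; the output keeps the input's shape."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):              # sliding direction 1
        for x in range(w):          # sliding direction 2
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - pad_top, x + dx - pad_left
                    if 0 <= yy < h and 0 <= xx < w:   # outside the image counts as zero
                        out[y][x] += image[yy][xx] * kernel[dy][dx]
    return out


ones_2d = [[1.0] * 5 for _ in range(5)]
weight_2d = [[1.0] * 3 for _ in range(3)]
out_2d = conv2d_same(ones_2d, weight_2d)
for row in out_2d:
    print(row)
```

With an all-ones input and kernel, each output value simply counts how many kernel cells overlap the image: 4 at the corners, 6 along the edges, and 9 in the interior.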

↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

3 directions (x, y, z) to compute the conv
the output shape is a 3D volume
input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It is what makes the output a volume
example) C3D
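Why d < L matters can be checked with the usual output-size formula. This sketch assumes stride 1 and no padding (TensorFlow's 'VALID' mode), where each axis shrinks to n - k + 1; `valid_output_shape` is a hypothetical helper name.

```python
def valid_output_shape(input_shape, kernel_shape):
    """Output size per axis for stride 1 and no padding: n - k + 1."""
    return tuple(n - k + 1 for n, k in zip(input_shape, kernel_shape))


volume_shape = valid_output_shape((5, 5, 5), (3, 3, 3))     # d < L
collapsed_shape = valid_output_shape((5, 5, 5), (3, 3, 5))  # d == L
print(volume_shape)     # (3, 3, 3)
print(collapsed_shape)  # (3, 3, 1)
```

When the filter depth equals the input depth, the filter has nowhere to slide along that axis, so the third output dimension collapses to 1 and the result is effectively a 2D matrix.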

tf.nn.conv3d - Toy example

    ones_3d = np.ones((5,5,5))
    weight_3d = np.ones((3,3,3))
    strides_3d = [1, 1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_3d = tf.constant(weight_3d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])
    in_depth = int(in_3d.shape[2])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])
    filter_depth = int(filter_3d.shape[2])

    # conv3d expects input [batch, depth, height, width, channels]
    # and kernel [depth, height, width, in_channels, out_channels]
    input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
    kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

    output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
    print(sess.run(output_3d))

↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑

Even though the input is 3D, e.g. 224x224x3 or 112x112x32,
the output shape is not a 3D volume but a 2D matrix,
because the filter depth = L must match the input channels = L
2 directions (x, y) to compute the conv, not 3
input = [W, H, L], filter = [k, k, L], output = [W, H]
the output shape is a 2D matrix
What if we want to train N filters (N is the number of filters)?
Then the output is N stacked 2D matrices, i.e. a 3D shape = 2D x N matrix.
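The difference between sliding over an axis and summing over it can be sketched in pure Python. `conv2d_channels` is a hypothetical helper, assuming stride 1 and zero padding; the small sizes (3 channels, 2 filters) are chosen only to keep the example fast.

```python
def conv2d_channels(x, filters):
    """x: [H][W][C], filters: [kh][kw][C][N] -> output [H][W][N] (zero padded)."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    kh, kw, n = len(filters), len(filters[0]), len(filters[0][0][0])
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    out = [[[0.0] * n for _ in range(w)] for _ in range(h)]
    for row in range(h):                      # sliding direction 1
        for col in range(w):                  # sliding direction 2
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = row + dy - pad_top, col + dx - pad_left
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    for ch in range(c):       # channels are summed over, not slid over
                        for f in range(n):    # one 2D output map per filter
                            out[row][col][f] += x[yy][xx][ch] * filters[dy][dx][ch][f]
    return out


C, N = 3, 2
x_in = [[[1.0] * C for _ in range(5)] for _ in range(5)]
w_in = [[[[1.0] * N for _ in range(C)] for _ in range(3)] for _ in range(3)]
out_maps = conv2d_channels(x_in, w_in)
out_shape = (len(out_maps), len(out_maps[0]), len(out_maps[0][0]))
print(out_shape)        # (5, 5, 2)
print(out_maps[2][2])   # [27.0, 27.0]
```

Each filter yields one 5x5 map, and the N maps are stacked along the last axis. The interior value 27 is 3 x 3 kernel positions times 3 channels.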

conv2d - LeNet, VGG, ... for 1 filter

    in_channels = 32 # 3 for RGB, 32, 64, 128, ...
    ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
    # filter must have a 3d shape with in_channels
    weight_3d = np.ones((3,3,in_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_3d = tf.constant(weight_3d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

    # a single filter gives a single 2D output map
    output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
    print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

    in_channels = 32 # 3 for RGB, 32, 64, 128, ...
    out_channels = 64 # 128, 256, ...
    ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
    # filter must have a 3d shape x number of filters = 4D
    weight_4d = np.ones((3,3,in_channels, out_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_4d = tf.constant(weight_4d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_4d.shape[0])
    filter_height = int(filter_4d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

    # output stacked shape is 3D = 2D x N matrix
    output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
    print(sess.run(output_3d))

↑↑↑↑↑ Bonus: 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑

1x1 conv is confusing if you think of it as a 2D image filter like Sobel
for 1x1 conv in a CNN, the input has a 3D shape, as in the picture above
it filters along the depth (channel) axis
input = [W, H, L], filter = [1, 1, L], output = [W, H]
with N filters, the stacked output shape is 3D = 2D x N matrix.
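Seen this way, a 1x1 conv is a weighted sum over the channels at each pixel, repeated for every filter. A minimal sketch follows; `conv1x1` is a hypothetical helper, and the weights are arbitrary illustrative values.

```python
def conv1x1(x, weights):
    """x: [H][W][L], weights: [L][N] -> [H][W][N].
    Each output value is a weighted sum over the L channels of one pixel."""
    n_filters = len(weights[0])
    return [
        [
            [sum(pixel[ch] * weights[ch][f] for ch in range(len(pixel)))
             for f in range(n_filters)]
            for pixel in row
        ]
        for row in x
    ]


# H = 2, W = 2, L = 3 channels; every pixel holds the channel values (1, 2, 3)
feature_map = [[[1.0, 2.0, 3.0] for _ in range(2)] for _ in range(2)]
# L = 3 input channels -> N = 2 output channels
mix = [[1.0, 0.0],
       [1.0, 0.0],
       [1.0, 1.0]]
mixed = conv1x1(feature_map, mix)
print(mixed[0][0])  # [6.0, 3.0]
```

No neighbouring pixels are involved, so the spatial size is unchanged and only the channel count goes from L to N.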

tf.nn.conv2d - special case: 1x1 conv

    in_channels = 32 # 3 for RGB, 32, 64, 128, ...
    out_channels = 64 # 128, 256, ...
    ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
    # 1x1 filter: 3d shape (1, 1, in_channels) x number of filters = 4D
    weight_4d = np.ones((1,1,in_channels, out_channels))
    strides_2d = [1, 1, 1, 1]

    in_3d = tf.constant(ones_3d, dtype=tf.float32)
    filter_4d = tf.constant(weight_4d, dtype=tf.float32)

    in_width = int(in_3d.shape[0])
    in_height = int(in_3d.shape[1])

    filter_width = int(filter_4d.shape[0])
    filter_height = int(filter_4d.shape[1])

    input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

    # output stacked shape is 3D = 2D x N matrix
    output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
    print(sess.run(output_3d))

Animation (2D conv with 3D inputs)

- Original link: LINK
- Author: Martin Görner
- Twitter: @martin_gorner
- Google +: plus.google.com/+MartinGorne

Bonus: 1D convolutions with 2D input

↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑

Even though the input is 2D, e.g. 20x14,
the output shape is not 2D but a 1D array,
because the filter height = L must match the input height = L
1 direction (x) to compute the conv, not 2
input = [W, L], filter = [k, L], output = [W]
the output shape is a 1D array
What if we want to train N filters (N is the number of filters)?
Then the output is N stacked 1D arrays, i.e. a 2D shape = 1D x N matrix.
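A sketch of this case in pure Python: the filter covers the full height L at every step, so it can only slide along W. `conv1d_2d_input` is a hypothetical helper, assuming stride 1 and zero padding along W; the 20x14 size comes from the example above.

```python
def conv1d_2d_input(x, kernel):
    """x: [W][L], kernel: [k][L] -> output [W] (zero padded along W)."""
    w, k = len(x), len(kernel)
    pad_left = (k - 1) // 2
    out = []
    for i in range(w):                    # the only sliding direction
        total = 0.0
        for j in range(k):
            idx = i + j - pad_left
            if 0 <= idx < w:
                # the whole height L is consumed in one dot product
                total += sum(a * b for a, b in zip(x[idx], kernel[j]))
        out.append(total)
    return out


seq = [[1.0] * 14 for _ in range(20)]   # W = 20, L = 14
kern = [[1.0] * 14 for _ in range(3)]   # k = 3, height L = 14
out_1d = conv1d_2d_input(seq, kern)
print(len(out_1d))            # 20
print(out_1d[0], out_1d[1])   # 28.0 42.0
```

Running the same function once per filter and stacking the N results gives the 2D = 1D x N output described above.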

Bonus C3D

    in_channels = 32 # 3, 32, 64, 128, ...
    out_channels = 64 # 3, 32, 64, 128, ...
    ones_4d = np.ones((5,5,5,in_channels))
    weight_5d = np.ones((3,3,3,in_channels,out_channels))
    strides_3d = [1, 1, 1, 1, 1]

    in_4d = tf.constant(ones_4d, dtype=tf.float32)
    filter_5d = tf.constant(weight_5d, dtype=tf.float32)

    in_width = int(in_4d.shape[0])
    in_height = int(in_4d.shape[1])
    in_depth = int(in_4d.shape[2])

    filter_width = int(filter_5d.shape[0])
    filter_height = int(filter_5d.shape[1])
    filter_depth = int(filter_5d.shape[2])

    input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
    kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

    # output is 4D: one 3D volume per filter
    output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
    print(sess.run(output_4d))

    sess.close()