time series - example - ¿Se restablece el estado inicial de RNN para los mini lotes posteriores?

lstm tensorflow example (2)

Además de la respuesta de danijar, aquí está el código para un LSTM, cuyo estado es una tupla ( state_is_tuple=True ). También soporta múltiples capas.

Definimos dos funciones: una para obtener las variables de estado con un estado cero inicial y otra para devolver una operación, que podemos pasar a session.run para actualizar las variables de estado con el último estado oculto de la LSTM.

def get_state_variables(batch_size, cell): # For each layer, get the initial state and make a variable out of it # to enable updating its value. state_variables = [] for state_c, state_h in cell.zero_state(batch_size, tf.float32): state_variables.append(tf.contrib.rnn.LSTMStateTuple( tf.Variable(state_c, trainable=False), tf.Variable(state_h, trainable=False))) # Return as a tuple, so that it can be fed to dynamic_rnn as an initial state return tuple(state_variables) def get_state_update_op(state_variables, new_states): # Add an operation to update the train states with the last state tensors update_ops = [] for state_variable, new_state in zip(state_variables, new_states): # Assign the new state to the state variables on this layer update_ops.extend([state_variable[0].assign(new_state[0]), state_variable[1].assign(new_state[1])]) # Return a tuple in order to combine all update_ops into a single operation. # The tuple''s actual value should not be used. return tf.tuple(update_ops)

Similar a la respuesta de danijar, podemos usar eso para actualizar el estado de LSTM después de cada lote:

data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size)) cells = [tf.contrib.rnn.GRUCell(256) for _ in range(num_layers)] cell = tf.contrib.rnn.MultiRNNCell(cells) # For each layer, get the initial state. states will be a tuple of LSTMStateTuples. states = get_state_variables(batch_size, cell) # Unroll the LSTM outputs, new_states = tf.nn.dynamic_rnn(cell, data, initial_state=states) # Add an operation to update the train states with the last state tensors. update_op = get_state_update_op(states, new_states) sess = tf.Session() sess.run(tf.global_variables_initializer()) sess.run([outputs, update_op], {data: ...})

La principal diferencia es que state_is_tuple=True hace que el estado de LSTM sea un LSTMStateTuple que contiene dos variables (estado de celda y estado oculto) en lugar de una sola variable. El uso de múltiples capas hace que el estado de LSTM sea una tupla de LSTMStateTuples, una por capa.

¿Podría alguien aclarar si el estado inicial de RNN en TF se restablece para los siguientes mini lotes, o si se usa el último estado del mini-lote anterior como se menciona en Ilya Sutskever et al., ICLR 2015 ?

Las tf.nn.dynamic_rnn() o tf.nn.rnn() permiten especificar el estado inicial de RNN utilizando el parámetro initial_state . Si no especifica este parámetro, los estados ocultos se inicializarán a cero vectores al comienzo de cada lote de entrenamiento.

En TensorFlow, puede ajustar los tensores en tf.Variable() para mantener sus valores en el gráfico entre varias ejecuciones de sesión. Solo asegúrese de marcarlos como no entrenables porque los optimizadores ajustan todas las variables entrenables de forma predeterminada.

data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size)) cell = tf.nn.rnn_cell.GRUCell(256) state = tf.Variable(cell.zero_states(batch_size, tf.float32), trainable=False) output, new_state = tf.nn.dynamic_rnn(cell, data, initial_state=state) with tf.control_dependencies([state.assign(new_state)]): output = tf.identity(output) sess = tf.Session() sess.run(tf.initialize_all_variables()) sess.run(output, {data: ...})

No he probado este código, pero debería darle una pista en la dirección correcta. También hay un tf.nn.state_saving_rnn() al que puede proporcionar un objeto de ahorro de estado, pero no lo he usado todavía.