python - how to implement TensorFlow's next_batch for your own data


The link you posted says: 'we get a "batch" of one hundred random data points from our training set'. In my example I use a global function (not a method as in your example), so the syntax differs slightly.

You pass my function the number of samples you want, the data array, and the labels array.

Here is the correct code, which guarantees that each sample keeps its matching label:

    import numpy as np

    def next_batch(num, data, labels):
        '''
        Return a total of `num` random samples and labels.
        '''
        idx = np.arange(0, len(data))
        np.random.shuffle(idx)
        idx = idx[:num]
        # the same indices select both arrays, so pairs stay aligned
        data_shuffle = [data[i] for i in idx]
        labels_shuffle = [labels[i] for i in idx]

        return np.asarray(data_shuffle), np.asarray(labels_shuffle)

    Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
    print(Xtr)
    print(Ytr)

    Xtr, Ytr = next_batch(5, Xtr, Ytr)
    print('\n5 random samples')
    print(Xtr)
    print(Ytr)

And a demo run:

    [0 1 2 3 4 5 6 7 8 9]
    [[ 0  1  2  3  4  5  6  7  8  9]
     [10 11 12 13 14 15 16 17 18 19]
     [20 21 22 23 24 25 26 27 28 29]
     [30 31 32 33 34 35 36 37 38 39]
     [40 41 42 43 44 45 46 47 48 49]
     [50 51 52 53 54 55 56 57 58 59]
     [60 61 62 63 64 65 66 67 68 69]
     [70 71 72 73 74 75 76 77 78 79]
     [80 81 82 83 84 85 86 87 88 89]
     [90 91 92 93 94 95 96 97 98 99]]

    5 random samples
    [9 1 5 6 7]
    [[90 91 92 93 94 95 96 97 98 99]
     [10 11 12 13 14 15 16 17 18 19]
     [50 51 52 53 54 55 56 57 58 59]
     [60 61 62 63 64 65 66 67 68 69]
     [70 71 72 73 74 75 76 77 78 79]]
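When both inputs are already NumPy arrays, the same sampling can probably be done with a single index array instead of list comprehensions. The following is only a sketch under that assumption; the name `next_batch_indexed` and the optional `rng` parameter are made up for this illustration and are not part of the answer above.

```python
import numpy as np

def next_batch_indexed(num, data, labels, rng=None):
    """Return `num` random rows of `data` together with their matching `labels`."""
    # Hypothetical helper: assumes `data` and `labels` are NumPy arrays
    # with the same length along the first axis.
    rng = np.random.default_rng() if rng is None else rng
    # One shuffled index array is used for both arrays, so pairs stay aligned.
    idx = rng.permutation(len(data))[:num]
    return data[idx], labels[idx]

data = np.arange(10)
labels = np.arange(100).reshape(10, 10)
batch_x, batch_y = next_batch_indexed(5, data, labels, rng=np.random.default_rng(0))
print(batch_x)
print(batch_y)
```

Passing a seeded generator makes the batch reproducible, which can help when debugging; leaving `rng` unset draws a fresh random batch on each call.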

In the TensorFlow MNIST tutorial, the function mnist.train.next_batch(100) is very handy. I am now trying to implement a simple classifier myself. I have my training data in a numpy array. How could I implement a similar function for my own data that gives me the next batch?

    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    Xtr, Ytr = loadData()
    for it in range(1000):
        batch_x = Xtr.next_batch(100)
        batch_y = Ytr.next_batch(100)

I tried the algorithm from the answer marked above but got no results with it, so I searched Kaggle and found an algorithm that worked very well for me. If you want better results, try it. Note that in the code below the global variables refer to the arrays you declared earlier, when you read in your dataset.

    import numpy as np

    # X_train and y_train are assumed to be loaded earlier as numpy arrays
    epochs_completed = 0
    index_in_epoch = 0
    num_examples = X_train.shape[0]

    # for splitting out batches of data
    def next_batch(batch_size):
        global X_train
        global y_train
        global index_in_epoch
        global epochs_completed

        start = index_in_epoch
        index_in_epoch += batch_size

        # when all training data has been used, reorder it randomly
        if index_in_epoch > num_examples:
            # finished epoch
            epochs_completed += 1
            # shuffle the data
            perm = np.arange(num_examples)
            np.random.shuffle(perm)
            X_train = X_train[perm]
            y_train = y_train[perm]
            # start next epoch
            start = 0
            index_in_epoch = batch_size
            assert batch_size <= num_examples
        end = index_in_epoch
        return X_train[start:end], y_train[start:end]
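If the global variables are inconvenient, one possible alternative is to keep the epoch state inside a generator. This is a rough sketch rather than the Kaggle code: the name `batch_generator` and its `seed` parameter are invented here, and unlike the function above it shuffles every epoch (including the first) and always skips a trailing partial batch.

```python
import numpy as np

def batch_generator(X, y, batch_size, seed=None):
    """Yield (X_batch, y_batch) pairs forever, reshuffling at the start of each epoch."""
    # Hypothetical helper: assumes X and y are NumPy arrays of equal length.
    assert len(X) == len(y) and 0 < batch_size <= len(X)
    rng = np.random.default_rng(seed)
    while True:
        perm = rng.permutation(len(X))
        # Stop before a final partial batch so every batch has the same size.
        for start in range(0, len(X) - batch_size + 1, batch_size):
            idx = perm[start:start + batch_size]
            yield X[idx], y[idx]

X_demo = np.arange(10)
y_demo = X_demo * 2
gen = batch_generator(X_demo, y_demo, batch_size=5, seed=0)
first_x, first_y = next(gen)
second_x, second_y = next(gen)
print(first_x, first_y)
print(second_x, second_y)
```

In a training loop, each iteration would call `next(gen)` where the code above calls `next_batch(batch_size)`.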

To shuffle and sample each mini-batch, you also need to keep track of which samples have already been selected within the current epoch. Here is an implementation that uses the data from the answer above.

    import numpy as np

    class Dataset:

        def __init__(self, data):
            self._index_in_epoch = 0
            self._epochs_completed = 0
            self._data = data
            self._num_examples = data.shape[0]

        @property
        def data(self):
            return self._data

        def next_batch(self, batch_size, shuffle=True):
            start = self._index_in_epoch
            # shuffle once before the very first batch
            if start == 0 and self._epochs_completed == 0:
                idx = np.arange(0, self._num_examples)  # get all possible indexes
                np.random.shuffle(idx)  # shuffle indexes
                self._data = self.data[idx]  # reorder the data

            # go to the next batch
            if start + batch_size > self._num_examples:
                # the epoch ends inside this batch
                self._epochs_completed += 1
                rest_num_examples = self._num_examples - start
                data_rest_part = self.data[start:self._num_examples]
                idx0 = np.arange(0, self._num_examples)  # get all possible indexes
                np.random.shuffle(idx0)  # shuffle indexes
                self._data = self.data[idx0]  # reorder the data for the new epoch

                start = 0
                # handles the case where the number of samples is not
                # an integer multiple of batch_size
                self._index_in_epoch = batch_size - rest_num_examples
                end = self._index_in_epoch
                data_new_part = self._data[start:end]
                return np.concatenate((data_rest_part, data_new_part), axis=0)
            else:
                self._index_in_epoch += batch_size
                end = self._index_in_epoch
                return self._data[start:end]

    dataset = Dataset(np.arange(0, 10))
    for i in range(10):
        print(dataset.next_batch(5))

The output is:

    [2 8 6 3 4]
    [1 5 9 0 7]
    [1 7 3 0 8]
    [2 6 5 9 4]
    [1 0 4 8 3]
    [7 6 2 9 5]
    [9 5 4 6 2]
    [0 1 8 7 3]
    [9 7 8 1 6]
    [3 5 2 4 0]

The first and second mini-batches (likewise the 3rd and 4th, and so on) together make up one complete epoch.

If you want to avoid a shape mismatch error when running the TensorFlow session, use the function below instead of the one given in the first solution above ( https://.com/a/40995666/7748451 ):

    import numpy as np

    def next_batch(num, data, labels):
        '''
        Return a total of `num` random samples and labels.
        '''
        idx = np.arange(0, len(data))
        np.random.shuffle(idx)
        idx = idx[:num]
        data_shuffle = data[idx]
        labels_shuffle = labels[idx]
        # `.values` implies that `labels` is a pandas object, not a plain ndarray;
        # the reshape turns the labels into a column of shape (num, 1)
        labels_shuffle = np.asarray(labels_shuffle.values.reshape(len(labels_shuffle), 1))

        return data_shuffle, labels_shuffle
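The reshape step is what changes the label shape. As a small illustration, assuming the mismatch comes from 1-D labels of shape `(num,)` being fed where a column of shape `(num, 1)` is expected, the difference looks like this with plain NumPy (the array names are made up for the example):

```python
import numpy as np

# Hypothetical labels for a batch of five samples.
labels_1d = np.array([0, 1, 1, 0, 1])
# -1 lets NumPy infer the number of rows; the result is a single column.
labels_col = labels_1d.reshape(-1, 1)

print(labels_1d.shape)   # (5,)
print(labels_col.shape)  # (5, 1)
```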

I use Anaconda and Jupyter. In Jupyter, if you run ?mnist you get:

    File: c:/programdata/anaconda3/lib/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py
    Docstring: Datasets(train, validation, test)

In the datasets folder you will find mnist.py, which contains all the methods, including next_batch.