python - recognition - How can I make the Inception-v3 model pre-trained on ImageNet (classify_image.py) from the TensorFlow tutorial importable as a module?


I'm wondering how I can modify classify_image.py (from this tutorial) so that I can import it from another Python script. Basically I'd like it to keep the functionality it already has, but instead of supplying the image path on the command line and getting the answer printed in the terminal, I'd like to pass the image path to a function and have that function return the top 5 results with their probabilities.

I haven't found a straightforward solution yet, but I realize that my troubleshooting and my searching of earlier answers are limited, since unfortunately I haven't learned the basics of TensorFlow yet.

Of course, if there is another pre-trained TensorFlow model that is just as good and meets my needs, I'd happily use it.

Regards, Pontus

UPDATE Maybe I should clarify a bit:

I don't want to train a model, only use a pre-trained one for image recognition, and in this case have an image recognition script that I can import as a module in another Python application.

I have also tried the code from this tutorial, but I got stuck there too, and that approach involves a lot of manual installation where I may have failed at some step. The good thing about the classify_image.py example is that I got it working as intended in the tutorial, so I figured the step from there to using it as a pluggable module shouldn't be that big.

What I have tried (with classify_image.py) is moving the lines below if __name__ == '__main__' into main(_) so that they run when I call it from another script, but I keep running into problems. Mainly I'm having trouble with the main(_) function, which wants me to pass it an argument, and from searching around I gather that _ is some kind of placeholder used when receiving input from the CLI. All the FLAGS stuff seems to be related as well, and that is what I want to get away from. I'm also not sure whether the model weights etc. are saved properly so that I can use them from another script. Again, at this point I just want to play with the image classifier and later on, hopefully, learn more about the machine learning behind it. Sorry for my lack of knowledge of the basics here!

classify_image.py:

    # Copyright 2015 The TensorFlow Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    # ==============================================================================

    """Simple image classification with Inception.

    Run image classification with Inception trained on ImageNet 2012 Challenge data
    set.

    This program creates a graph from a saved GraphDef protocol buffer,
    and runs inference on an input JPEG image. It outputs human readable
    strings of the top 5 predictions along with their probabilities.

    Change the --image_file argument to any jpg image to compute a
    classification of that image.

    Please see the tutorial and website for a detailed description of how
    to use this script to perform image recognition.

    https://tensorflow.org/tutorials/image_recognition/
    """

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import argparse
    import os.path
    import re
    import sys
    import tarfile

    import numpy as np
    from six.moves import urllib
    import tensorflow as tf

    FLAGS = None

    # pylint: disable=line-too-long
    DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
    # pylint: enable=line-too-long


    class NodeLookup(object):
      """Converts integer node ID's to human readable labels."""

      def __init__(self,
                   label_lookup_path=None,
                   uid_lookup_path=None):
        if not label_lookup_path:
          label_lookup_path = os.path.join(
              FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
        if not uid_lookup_path:
          uid_lookup_path = os.path.join(
              FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
        self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

      def load(self, label_lookup_path, uid_lookup_path):
        """Loads a human readable English name for each softmax node.

        Args:
          label_lookup_path: string UID to integer node ID.
          uid_lookup_path: string UID to human-readable string.

        Returns:
          dict from integer node ID to human-readable string.
        """
        if not tf.gfile.Exists(uid_lookup_path):
          tf.logging.fatal('File does not exist %s', uid_lookup_path)
        if not tf.gfile.Exists(label_lookup_path):
          tf.logging.fatal('File does not exist %s', label_lookup_path)

        # Loads mapping from string UID to human-readable string
        proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
        uid_to_human = {}
        p = re.compile(r'[n\d]*[ \S,]*')
        for line in proto_as_ascii_lines:
          parsed_items = p.findall(line)
          uid = parsed_items[0]
          human_string = parsed_items[2]
          uid_to_human[uid] = human_string

        # Loads mapping from string UID to integer node ID.
        node_id_to_uid = {}
        proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
        for line in proto_as_ascii:
          if line.startswith('  target_class:'):
            target_class = int(line.split(': ')[1])
          if line.startswith('  target_class_string:'):
            target_class_string = line.split(': ')[1]
            node_id_to_uid[target_class] = target_class_string[1:-2]

        # Loads the final mapping of integer node ID to human-readable string
        node_id_to_name = {}
        for key, val in node_id_to_uid.items():
          if val not in uid_to_human:
            tf.logging.fatal('Failed to locate: %s', val)
          name = uid_to_human[val]
          node_id_to_name[key] = name

        return node_id_to_name

      def id_to_string(self, node_id):
        if node_id not in self.node_lookup:
          return ''
        return self.node_lookup[node_id]


    def create_graph():
      """Creates a graph from saved GraphDef file and returns a saver."""
      # Creates graph from saved graph_def.pb.
      with tf.gfile.FastGFile(os.path.join(
          FLAGS.model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')


    def run_inference_on_image(image):
      """Runs inference on an image.

      Args:
        image: Image file name.

      Returns:
        Nothing
      """
      if not tf.gfile.Exists(image):
        tf.logging.fatal('File does not exist %s', image)
      image_data = tf.gfile.FastGFile(image, 'rb').read()

      # Creates graph from saved GraphDef.
      create_graph()

      with tf.Session() as sess:
        # Some useful tensors:
        # 'softmax:0': A tensor containing the normalized prediction across
        #   1000 labels.
        # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
        #   float description of the image.
        # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
        #   encoding of the image.
        # Runs the softmax tensor by feeding the image_data as input to the graph.
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)

        # Creates node ID --> English string lookup.
        node_lookup = NodeLookup()

        top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
        for node_id in top_k:
          human_string = node_lookup.id_to_string(node_id)
          score = predictions[node_id]
          print('%s (score = %.5f)' % (human_string, score))


    def maybe_download_and_extract():
      """Download and extract model tar file."""
      dest_directory = FLAGS.model_dir
      if not os.path.exists(dest_directory):
        os.makedirs(dest_directory)
      filename = DATA_URL.split('/')[-1]
      filepath = os.path.join(dest_directory, filename)
      if not os.path.exists(filepath):
        def _progress(count, block_size, total_size):
          sys.stdout.write('\r>> Downloading %s %.1f%%' % (
              filename, float(count * block_size) / float(total_size) * 100.0))
          sys.stdout.flush()
        filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
        print()
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
      tarfile.open(filepath, 'r:gz').extractall(dest_directory)


    def main(_):
      maybe_download_and_extract()
      image = (FLAGS.image_file if FLAGS.image_file else
               os.path.join(FLAGS.model_dir, 'cropped_panda.jpg'))
      run_inference_on_image(image)


    if __name__ == '__main__':
      parser = argparse.ArgumentParser()
      # classify_image_graph_def.pb:
      #   Binary representation of the GraphDef protocol buffer.
      # imagenet_synset_to_human_label_map.txt:
      #   Map from synset ID to a human readable string.
      # imagenet_2012_challenge_label_map_proto.pbtxt:
      #   Text representation of a protocol buffer mapping a label to synset ID.
      parser.add_argument(
          '--model_dir',
          type=str,
          default='/tmp/imagenet',
          help="""\
          Path to classify_image_graph_def.pb,
          imagenet_synset_to_human_label_map.txt, and
          imagenet_2012_challenge_label_map_proto.pbtxt.\
          """
      )
      parser.add_argument(
          '--image_file',
          type=str,
          default='',
          help='Absolute path to image file.'
      )
      parser.add_argument(
          '--num_top_predictions',
          type=int,
          default=5,
          help='Display this many predictions.'
      )
      FLAGS, unparsed = parser.parse_known_args()
      tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

1) The first question is how to return the predicted values. The following snippet prints the predictions for the given image:

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))

Instead of printing, you can store the results in a data structure and return it. By default the top 5 predictions are returned; to change this, set --num_top_predictions to the value you want.
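A minimal sketch of that idea, assuming `predictions` is the NumPy array of softmax scores and `id_to_string` is any callable that maps a node ID to a label (in classify_image.py that would be `NodeLookup().id_to_string`). The function name `top_predictions` and the toy data are hypothetical and only there for illustration:

```python
import numpy as np


def top_predictions(predictions, id_to_string, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    predictions = np.squeeze(predictions)
    # argsort is ascending, so take the last k indices and reverse them.
    top_k = predictions.argsort()[-k:][::-1]
    return [(id_to_string(int(node_id)), float(predictions[node_id]))
            for node_id in top_k]


# Toy stand-ins for the real softmax output and NodeLookup (illustrative only).
toy_scores = np.array([[0.05, 0.70, 0.10, 0.15]])
toy_labels = {0: 'cat', 1: 'panda', 2: 'dog', 3: 'fox'}
results = top_predictions(toy_scores, toy_labels.get, k=2)
print(results)
```

Inside run_inference_on_image, the same list could be built in place of the print loop and returned at the end of the function.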

2) Regarding the model, there are two parts:

You need a high-quality dataset such as ImageNet.
Assuming you have such a dataset, training from scratch would require very powerful GPUs, and a lot of time as well.

But if you still want to train the system on your own dataset, I would suggest starting from the model trained on ImageNet and then training only the final layer (the tensor name is 'final_result') on your own dataset. Please see this tutorial.

In the end, I managed to use the code from the SO article referenced in the update to the original question. I used that code with the additional im = 2*(im/255.0)-1.0 line from the answer to that SO question, a line to fix PIL on my computer, plus a function to convert classes to human-readable labels (found on GitHub; link to that file below). I turned it into a callable function that takes a list of images as input and returns a list of labels and prediction values. If you want to use it, this is what you need to do:

Install the latest version of TensorFlow (1.0 at the moment, which is required).
git clone https://github.com/tensorflow/models/ wherever you want the models.
Put this checkpoint file from the SO question I referred to above (it needs to be extracted, of course) in your project directory.
Put this text file (the human-readable labels) in your project directory.
Use this code from the SO question, with some modifications of mine, and put it in a .py file in your project:
    import tensorflow as tf
    slim = tf.contrib.slim
    import PIL as pillow
    from PIL import Image
    #import Image
    from inception_resnet_v2 import *
    import numpy as np

    with open('imagenet1000_clsid_to_human.txt','r') as inf:
        imagenet_classes = eval(inf.read())

    def get_human_readable(id):
        id = id - 1
        label = imagenet_classes[id]

        return label

    checkpoint_file = './inception_resnet_v2_2016_08_30.ckpt'

    #Load the model
    sess = tf.Session()
    arg_scope = inception_resnet_v2_arg_scope()
    input_tensor = tf.placeholder(tf.float32, [None, 299, 299, 3])
    with slim.arg_scope(arg_scope):
        logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
    saver = tf.train.Saver()
    saver.restore(sess, checkpoint_file)

    def classify_image(sample_images):
        classifications = []
        for image in sample_images:
            im = Image.open(image).resize((299,299))
            im = np.array(im)
            im = im.reshape(-1,299,299,3)
            im = 2*(im/255.0)-1.0
            predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
            #print (np.max(predict_values), np.max(logit_values))
            #print (np.argmax(predict_values), np.argmax(logit_values))
            label = get_human_readable(np.argmax(predict_values))
            predict_value = np.max(predict_values)
            classifications.append({"label":label, "predict_value":predict_value})

        return classifications
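The scaling line im = 2*(im/255.0)-1.0 maps 8-bit pixel values from [0, 255] to [-1, 1], which is the range the code above feeds to the network. Here is a small NumPy-only sketch of that step; the function name `preprocess` and the random test image are hypothetical, standing in for a real image loaded with PIL:

```python
import numpy as np


def preprocess(im):
    """Scale a 299x299x3 uint8 image to floats in [-1, 1] and add a batch axis."""
    im = np.asarray(im, dtype=np.float32)
    im = im.reshape(-1, 299, 299, 3)  # batch of one 299x299 RGB image
    return 2.0 * (im / 255.0) - 1.0


# Hypothetical stand-in for np.array(Image.open(path).resize((299, 299))).
fake_image = np.random.randint(0, 256, size=(299, 299, 3), dtype=np.uint8)
batch = preprocess(fake_image)
print(batch.shape, batch.min(), batch.max())
```

Note that the reshape only works for three-channel images; a grayscale or RGBA file would need converting to RGB first.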

In my case, I simply replaced [-FLAGS.num_top_predictions:] with [-5:]

Then replace the other FLAGS references with the model directory and the image file.
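An alternative to editing every FLAGS reference: the script only uses FLAGS as a holder for three attributes (model_dir, image_file, num_top_predictions), so an object with those attributes can be supplied directly instead of being parsed from the command line. A sketch using `argparse.Namespace`; the image path is a placeholder, and wiring it in from an importing script is an assumption, not something the original script does:

```python
import argparse
import os.path

# Same attribute names that classify_image.py reads from FLAGS.
FLAGS = argparse.Namespace(
    model_dir='/tmp/imagenet',        # default used by the script
    image_file='/path/to/image.jpg',  # placeholder path
    num_top_predictions=5,
)

# Code written against FLAGS keeps working unchanged, for example:
graph_path = os.path.join(FLAGS.model_dir, 'classify_image_graph_def.pb')
print(graph_path, FLAGS.num_top_predictions)
```

From another script, one could then set classify_image.FLAGS to such an object after importing the module and call maybe_download_and_extract() and run_inference_on_image(path) directly, skipping main(_) altogether. The _ in main(_) is just an ordinary parameter name, used by convention for an argument the function ignores.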