recognition - OCR software

Is there an OCR library that outputs the coordinates of words found within an image?

ABCocr.NET (our component) lets you obtain the coordinates of each word it finds. The values are available through the Word.Bounds property, which simply returns a System.Drawing.Rectangle.

The following example shows how to OCR an image with ABCocr.NET and output the information you need:

    using System;
    using System.Drawing;
    using WebSupergoo.ABCocr3;

    namespace abcocr
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Load the image to recognize.
                Bitmap bitmap = (Bitmap)Bitmap.FromFile("example.png");

                Ocr ocr = new Ocr();
                ocr.SetBitmap(bitmap);

                // Print each recognized word with its bounding rectangle.
                foreach (Word word in ocr.Page.Words)
                {
                    Console.WriteLine("{0}, X: {1}, Y: {2}, Width: {3}, Height: {4}",
                        word.Text, word.Bounds.X, word.Bounds.Y,
                        word.Bounds.Width, word.Bounds.Height);
                }
            }
        }
    }

Disclosure: posted by a member of the WebSupergoo team.

In my experience, OCR libraries tend to output only the text found within an image, not where that text was found. Is there an OCR library that outputs both the words found within an image and the coordinates (x, y, width, height) where those words were found?

I'm using TessNet (a C# wrapper for Tesseract) and I get word coordinates with the following code:

    // Requires references to tessnet2 and System.Windows.Forms, plus:
    // using System.Collections.Generic; using System.Drawing; using System.IO; using System.Text;

    Bitmap image = new Bitmap(@"u:/user files/bwalker/2849257.tif");
    tessnet2.Tesseract ocr = new tessnet2.Tesseract();

    // Restrict recognition to these characters (use "0123456789" for digits only).
    ocr.SetVariable("tessedit_char_whitelist",
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,$-/#&=()\"':?");

    // Point Init at the folder that contains the correct tessdata.
    ocr.Init(@"C:/Users/bwalker/Documents/Visual Studio 2010/Projects/tessnetWinForms/tessnetWinForms/bin/Release/", "eng", false);

    List<tessnet2.Word> result = ocr.DoOCR(image, System.Drawing.Rectangle.Empty);

    // One line per word: confidence, text, then the edges of its bounding box.
    StringBuilder results = new StringBuilder();
    foreach (tessnet2.Word word in result)
    {
        results.AppendLine(word.Confidence + ", " + word.Text + ", " +
            word.Top + ", " + word.Bottom + ", " + word.Left + ", " + word.Right);
    }

    // Append to the output file; the using block closes the writer.
    using (StreamWriter writer = new StreamWriter(@"U:/user files/bwalker/ocrTesting2.txt", true))
    {
        writer.Write(results.ToString());
    }

    MessageBox.Show("Completed");

The Google Vision API does this. https://cloud.google.com/vision/docs/detecting-text

    "description": "Wake up human!\n",
    "boundingPoly": {
      "vertices": [
        { "x": 29, "y": 394 },
        { "x": 570, "y": 394 },
        { "x": 570, "y": 466 },
        { "x": 29, "y": 466 }
      ]
    }

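The question asks for (x, y, width, height), while the Vision response above gives four corner vertices. Here is a minimal sketch of the conversion, assuming the response has already been fetched and that the axis-aligned box enclosing the polygon is good enough. `bounding_rect` is a hypothetical helper name, and treating a missing "x" or "y" as 0 is an assumption about how zero-valued coordinates are serialized.

```python
import json


def bounding_rect(vertices):
    """Return (x, y, width, height) of the axis-aligned box enclosing the vertices.

    Assumption: a vertex may omit "x" or "y" when the value is 0,
    so missing keys are read as 0.
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# Sample annotation copied from the response fragment in this answer.
annotation = json.loads("""
{
  "description": "Wake up human!\\n",
  "boundingPoly": {
    "vertices": [
      {"x": 29, "y": 394}, {"x": 570, "y": 394},
      {"x": 570, "y": 466}, {"x": 29, "y": 466}
    ]
  }
}
""")

rect = bounding_rect(annotation["boundingPoly"]["vertices"])
print(annotation["description"].strip(), rect)
```
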
The free OCR.space OCR API returns word coordinates once you set the parameter isOverlayRequired = true:

    "ParsedResults" : [
      {
        "TextOverlay" : {
          "Lines" : [
            {
              "Words": [
                { "WordText": "Word 1", "Left": 106, "Top": 91, "Height": 9, "Width": 11 },
                { "WordText": "Word 2", "Left": 121, "Top": 90, "Height": 13, "Width": 51 }
                ...

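Walking that nested structure is mostly a matter of three loops. The sketch below runs on a hand-completed copy of the fragment above rather than on a live API call. `iter_words` is a hypothetical helper name, and the closing brackets of the sample are filled in only so that it parses.

```python
import json


def iter_words(response):
    """Yield (text, x, y, width, height) for every word in an OCR.space-style response."""
    for page in response.get("ParsedResults", []):
        # TextOverlay may be absent or null when no overlay was requested.
        overlay = page.get("TextOverlay") or {}
        for line in overlay.get("Lines", []):
            for word in line.get("Words", []):
                yield (word["WordText"], word["Left"], word["Top"],
                       word["Width"], word["Height"])


# Sample built from the fragment in this answer.
sample = json.loads("""
{
  "ParsedResults": [
    {"TextOverlay": {"Lines": [
      {"Words": [
        {"WordText": "Word 1", "Left": 106, "Top": 91, "Height": 9, "Width": 11},
        {"WordText": "Word 2", "Left": 121, "Top": 90, "Height": 13, "Width": 51}
      ]}
    ]}}
  ]
}
""")

words = list(iter_words(sample))
for text, x, y, w, h in words:
    print(f"{text}: x={x}, y={y}, width={w}, height={h}")
```
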
Most commercial OCR engines return the coordinate positions of words and characters, but you have to work with their SDKs to extract that information. Even Tesseract OCR returns position information, though it has not been easy to get at. Version 3.01 will make this easier, but a DLL interface is still being worked on.

Unfortunately, most free OCR programs use Tesseract OCR in its basic form and report only the raw ASCII results.

www.transym.com - Transym OCR - outputs coordinates.
www.rerecognition.com - The Kasmos engine returns coordinates.

Caere Omnipage, Mitek, Abbyy, and Charactell also return character positions.

For Java developers:

I would recommend using Tesseract and Tess4j for this.

In fact, one of the Tess4j tests contains an example of how to find words in an image.

https://github.com/nguyenq/tess4j/blob/master/src/test/java/net/sourceforge/tess4j/TessAPITest.java#L449-L517

    // Taken from TessAPITest; it relies on that class's fields (api, handle, datapath,
    // language, logger, testResourcesDataPath) and on its imports.
    public void testResultIterator() throws Exception {
        logger.info("TessBaseAPIGetIterator");
        File tiff = new File(this.testResourcesDataPath, "eurotext.tif");
        BufferedImage image = ImageIO.read(new FileInputStream(tiff)); // require jai-imageio lib to read TIFF
        ByteBuffer buf = ImageIOHelper.convertImageData(image);
        int bpp = image.getColorModel().getPixelSize();
        int bytespp = bpp / 8;
        int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
        api.TessBaseAPIInit3(handle, datapath, language);
        api.TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO);
        api.TessBaseAPISetImage(handle, buf, image.getWidth(), image.getHeight(), bytespp, bytespl);
        ETEXT_DESC monitor = new ETEXT_DESC();
        TimeVal timeout = new TimeVal();
        timeout.tv_sec = new NativeLong(0L); // time > 0 causes blank output
        monitor.end_time = timeout;
        ProgressMonitor pmo = new ProgressMonitor(monitor);
        pmo.start();
        api.TessBaseAPIRecognize(handle, monitor);
        logger.info("Message: " + pmo.getMessage());
        TessResultIterator ri = api.TessBaseAPIGetIterator(handle);
        TessPageIterator pi = api.TessResultIteratorGetPageIterator(ri);
        api.TessPageIteratorBegin(pi);
        logger.info("Bounding boxes:\nchar(s) left top right bottom confidence font-attributes");
        int level = TessPageIteratorLevel.RIL_WORD;

        // int height = image.getHeight();
        do {
            Pointer ptr = api.TessResultIteratorGetUTF8Text(ri, level);
            String word = ptr.getString(0);
            api.TessDeleteText(ptr);
            float confidence = api.TessResultIteratorConfidence(ri, level);
            IntBuffer leftB = IntBuffer.allocate(1);
            IntBuffer topB = IntBuffer.allocate(1);
            IntBuffer rightB = IntBuffer.allocate(1);
            IntBuffer bottomB = IntBuffer.allocate(1);
            api.TessPageIteratorBoundingBox(pi, level, leftB, topB, rightB, bottomB);
            int left = leftB.get();
            int top = topB.get();
            int right = rightB.get();
            int bottom = bottomB.get();

            /******************************************/
            /* COORDINATES AND WORDS ARE PRINTED HERE */
            /******************************************/
            System.out.println(String.format("%s %d %d %d %d %f", word, left, top, right, bottom, confidence));
            // logger.info(String.format("%s %d %d %d %d", str, left, height - bottom, right, height - top));
            // training box coordinates

            IntBuffer boldB = IntBuffer.allocate(1);
            IntBuffer italicB = IntBuffer.allocate(1);
            IntBuffer underlinedB = IntBuffer.allocate(1);
            IntBuffer monospaceB = IntBuffer.allocate(1);
            IntBuffer serifB = IntBuffer.allocate(1);
            IntBuffer smallcapsB = IntBuffer.allocate(1);
            IntBuffer pointSizeB = IntBuffer.allocate(1);
            IntBuffer fontIdB = IntBuffer.allocate(1);
            String fontName = api.TessResultIteratorWordFontAttributes(ri, boldB, italicB, underlinedB,
                    monospaceB, serifB, smallcapsB, pointSizeB, fontIdB);
            boolean bold = boldB.get() == TRUE;
            boolean italic = italicB.get() == TRUE;
            boolean underlined = underlinedB.get() == TRUE;
            boolean monospace = monospaceB.get() == TRUE;
            boolean serif = serifB.get() == TRUE;
            boolean smallcaps = smallcapsB.get() == TRUE;
            int pointSize = pointSizeB.get();
            int fontId = fontIdB.get();
            logger.info(String.format(" font: %s, size: %d, font id: %d, bold: %b,"
                    + " italic: %b, underlined: %b, monospace: %b, serif: %b, smallcap: %b",
                    fontName, pointSize, fontId, bold, italic, underlined, monospace, serif, smallcaps));
        } while (api.TessPageIteratorNext(pi, level) == TRUE);

        assertTrue(true);
    }

You can use the hocr "config file" with tesseract like this:

tesseract syllabus-page1.jpg syllabus-page1 hocr

This generates a mostly-HTML5 document with elements such as:

    <div class='ocr_page' id='page_1' title='image "syllabus-page1.jpg"; bbox 0 0 2531 3272; ppageno 0'>
      <div class="ocr_carea" id="block_1_4" title="bbox 265 1183 2147 1778">
        <p class="ocr_par" dir="ltr" id="par_1_8" title="bbox 274 1305 655 1342">
          <span class="ocr_line" id="line_1_14" title="bbox 274 1305 655 1342; baseline -0.005 0; x_size 46.378059; x_descenders 10.378059; x_ascenders 12">
            <span class="ocrx_word" id="word_1_78" title="bbox 274 1307 386 1342; x_wconf 90" lang="eng" dir="ltr">needs</span>
            <span class="ocrx_word" id="word_1_79" title="bbox 402 1318 459 1342; x_wconf 90" lang="eng" dir="ltr">are</span>
            <span class="ocrx_word" id="word_1_80" title="bbox 474 1305 655 1341; x_wconf 86" lang="eng" dir="ltr">different:</span>
          </span>
        </p>
        ...
      </div>
      ...
    </div>

Although I'm fairly sure this isn't how you're supposed to use XML, I found it easier than digging into the tesseract API.

PS: I realize that several comments and answers allude to this solution, but none of them actually shows how to use the hocr option or describes the output you get from it.
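
Once the hOCR file exists, pulling out the words and their boxes takes only the standard library. The sketch below assumes each word sits in a span with class ocrx_word whose title holds "bbox x0 y0 x1 y1" and optionally "x_wconf N", as in the output above, and that word spans do not contain nested spans. `HocrWordParser` is a hypothetical name, not part of tesseract.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, x, y, width, height, confidence) from ocrx_word spans."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # (bbox, confidence) of the word span being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            title = attrs.get("title") or ""
            box = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+)", title)
            if box:
                self._current = (tuple(map(int, box.groups())),
                                 int(conf.group(1)) if conf else None)
                self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested <span> inside a word span.
        if tag == "span" and self._current is not None:
            (x0, y0, x1, y1), conf = self._current
            text = "".join(self._text).strip()
            # hOCR stores two corners; convert to x, y, width, height.
            self.words.append((text, x0, y0, x1 - x0, y1 - y0, conf))
            self._current = None


# Sample line trimmed from the hOCR output shown in this answer.
sample = """
<span class="ocr_line" id="line_1_14" title="bbox 274 1305 655 1342">
  <span class="ocrx_word" id="word_1_78" title="bbox 274 1307 386 1342; x_wconf 90">needs</span>
  <span class="ocrx_word" id="word_1_79" title="bbox 402 1318 459 1342; x_wconf 90">are</span>
  <span class="ocrx_word" id="word_1_80" title="bbox 474 1305 655 1341; x_wconf 86">different:</span>
</span>
"""

parser = HocrWordParser()
parser.feed(sample)
parser.close()
for word in parser.words:
    print(word)
```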

You can also look at the Gamera framework ( http://gamera.informatik.hsnr.de/ ), a toolkit that lets you build your own OCR engine. However, the fastest route is to use Tesseract or OCRopus with hOCR output ( http://en.wikipedia.org/wiki/HOCR ).

hOCR is one of the output formats of the Tesseract OCR engine. It contains both the words and their coordinates, plus additional information such as a recognition confidence level for each word.