
Detect text region in image using OpenCV

If you don't mind getting your hands dirty, you could try growing those text regions into one larger rectangular region and feeding that to tesseract in one go.
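As a rough illustration of growing the regions, here is a minimal sketch that merges several bounding boxes into one enclosing rectangle and pads it slightly. The helper names `merge_boxes` and `pad_box`, the sample boxes and the image size are all made up for the example; the boxes are assumed to be `(x, y, w, h)` tuples such as the ones `cv2.boundingRect` returns.

```python
def merge_boxes(boxes):
    """Return the smallest (x, y, w, h) rectangle enclosing all boxes."""
    if not boxes:
        raise ValueError("need at least one box")
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)


def pad_box(box, pad, img_w, img_h):
    """Grow a box by `pad` pixels on every side, clamped to the image."""
    x, y, w, h = box
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, img_w)
    y1 = min(y + h + pad, img_h)
    return (x0, y0, x1 - x0, y1 - y0)


# Hypothetical text boxes found in a 200x100 image
boxes = [(40, 30, 60, 20), (110, 32, 80, 22), (45, 60, 120, 18)]
merged = merge_boxes(boxes)               # (40, 30, 150, 48)
region = pad_box(merged, 10, 200, 100)    # (30, 20, 170, 68)
```

The padded region can then be cropped from the image (for a NumPy image, `img[y:y + h, x:x + w]`) and passed to tesseract as a single input.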

I would also suggest thresholding the image several times and feeding each result to tesseract separately to see whether that helps. You can compare the output against dictionary words to determine automatically whether a particular OCR result is good or not.
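The dictionary check could look something like the sketch below. It assumes you have already run OCR once per threshold value and collected the text in a dict; the names `dictionary_score` and `pick_best`, the tiny vocabulary and the sample OCR strings are invented for illustration, and a real word list would be much larger.

```python
import re


def dictionary_score(text, vocabulary):
    """Fraction of alphabetic tokens in `text` that appear in `vocabulary`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in vocabulary)
    return float(hits) / len(tokens)


def pick_best(ocr_outputs, vocabulary):
    """Return the (threshold, text) pair whose text scores highest."""
    return max(ocr_outputs.items(),
               key=lambda kv: dictionary_score(kv[1], vocabulary))


# Hypothetical vocabulary and OCR results, keyed by threshold value
vocabulary = {"detect", "text", "region", "in", "image"}
ocr_outputs = {
    120: "d3tect t3xt reg1on",
    150: "detect text region in image",
    180: "detect tex regi0n in imaqe",
}
best_threshold, best_text = pick_best(ocr_outputs, vocabulary)
```

Misread characters split words into fragments that are not in the vocabulary, so noisy results score low and the cleanest threshold wins.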

I have an image and I want to detect the text regions in it.

I tried the TiRG_RAW_20110219 project, but the results are not satisfactory. If the input image is http://imgur.com/yCxOvQS,GD38rCa it produces http://imgur.com/yCxOvQS,GD38rCa#1 as output.

Can anyone suggest an alternative? I wanted this to improve tesseract's output by sending it only the text region as input.

    import cv2


    def captch_ex(file_name):
        img = cv2.imread(file_name)
        img_final = cv2.imread(file_name)  # untouched copy, used for cropping
        img2gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        ret, mask = cv2.threshold(img2gray, 180, 255, cv2.THRESH_BINARY)
        image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
        # for black text, use cv2.THRESH_BINARY_INV
        ret, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY)
        # the thresholding steps above remove the noisy portion of the image

        # The kernel shape controls the direction of the dilation: a larger x
        # dilates more horizontally, a larger y dilates more vertically.
        kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
        # the more iterations, the more dilation
        dilated = cv2.dilate(new_img, kernel, iterations=9)

        # findContours returns 3 values in OpenCV 3.x and 2 values in 2.x and 4.x;
        # the contours are always the second-to-last value.
        contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_NONE)[-2]

        # index = 0  # needed only if the cropping block below is enabled
        for contour in contours:
            # get the rectangle bounding the contour
            x, y, w, h = cv2.boundingRect(contour)

            # don't plot small false positives that aren't text
            if w < 35 and h < 35:
                continue

            # draw a rectangle around the contour on the original image
            cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)

            # You can crop the image and send it to OCR;
            # falsely detected regions will return no text :)
            # cropped = img_final[y:y + h, x:x + w]
            # s = file_name + '/crop_' + str(index) + '.jpg'
            # cv2.imwrite(s, cropped)
            # index = index + 1

        # show the original image with the detected rectangles drawn on it
        cv2.imshow('captcha_result', img)
        cv2.waitKey()


    file_name = 'your_image.jpg'
    captch_ex(file_name)