java - images - load image processing

Buscar sub imagen conocida en una imagen más grande (6)

Dijo que su imagen puede no ser exactamente la misma, pero luego dice que no quiere algoritmos "confusos". No estoy seguro de que sean compatibles. Sin embargo, en general, creo que desea ver los algoritmos de registro de imágenes. Hay un paquete de código abierto de C ++ llamado ITK que podría proporcionar algunos consejos. También ImageJ es un popular paquete de código abierto de Java. Ambos tienen al menos algunas capacidades de registro disponibles si se hurga.

¿Alguien sabe de un algoritmo (o términos / descripciones de búsqueda) para ubicar una imagen conocida dentro de una imagen más grande?

p.ej

Tengo una imagen de una sola ventana de escritorio que contiene varios botones y áreas (destino). También tengo un código para capturar una captura de pantalla del escritorio actual. Me gustaría un algoritmo que me ayude a encontrar la imagen de destino dentro de la imagen de escritorio más grande (en qué coordenadas xey exactas se encuentra la ventana). La imagen de destino puede estar ubicada en cualquier lugar de la imagen más grande y puede no ser 100% exactamente igual (muy similar, pero no exacta, posiblemente b / c de diferencias de visualización del sistema operativo)

¿Alguien sabe de tal algoritmo o clase de algoritmos?

He encontrado varios algoritmos de segmentación de imágenes y visión por computador, pero parecen estar orientados a una clasificación "confusa" de regiones y no a ubicar una imagen específica dentro de otra.

** Mi objetivo es crear un marco que, dadas algunas imágenes objetivo, pueda "buscar" en el escritorio, encontrar el área objetivo y "mirar" los cambios. **

Eche un vistazo al documento que escribí: http://werner.yellowcouch.org/Papers/subimg/index.html . Es muy detallado y parece ser el único artículo que debate cómo aplicar la transformación de Fourier al problema del descubrimiento de subimágenes.

En resumen, si desea utilizar la transformada de Fourier, se podría aplicar la siguiente fórmula: la correlación entre la imagen A y la imagen B cuando la imagen A se desplaza sobre dx, dy se da en la siguiente matriz: C = ifft (fft (A) x conjugado (fft (B)). Entonces, la posición en la imagen C que tiene el valor más alto, tiene la correlación más alta y esa posición refleja dx, dy.

Este resultado funciona bien para subimágenes que son relativamente grandes. Para imágenes más pequeñas, es necesario un poco más de trabajo como se explica en el artículo. Sin embargo, tales transformaciones de Fourier son bastante rápidas. Resulta en operaciones de 3 * sx sy log_2 (sx * sy) + 3 * sx * sy.

No es necesario que sea borroso como en "red neuronal" porque (según entiendo) no tiene rotación, inclinación ni similar. Si las diferencias de visualización del sistema operativo son las únicas modificaciones, la diferencia debería ser mínima. Así que el artículo de WernerVanBelle es bueno pero no realmente necesario y el código de MrFooz funciona, pero es terriblemente ineficiente ( O(width * height * patter_width * pattern_height) !).

El mejor algoritmo en el que puedo pensar es el algoritmo Boyer-Moore para búsqueda de cadenas, modificado para permitir búsquedas bidimensionales. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

En lugar de un desplazamiento, deberá almacenar un par de desplazamientos dx y dy para cada color. Al verificar un píxel, solo se mueve en la dirección x = x + dx y se almacena solo el mínimo de dy dy DY = min(DY, dy) para establecer el nuevo valor y una vez que se ha probado una línea completa (es decir, x > width )

Crear una tabla para todos los colores posibles probablemente sea prohibitivo debido a la cantidad de colores posibles, así que use un mapa para almacenar las reglas (y de forma predeterminada las dimensiones del patrón si el color no está dentro del mapa) o cree tablas para cada color por separado y establecer dx = max(dx(red), dx(green), dx(blue)) - que es solo una aproximación pero elimina la sobrecarga de un mapa.

En el preprocesamiento de la regla de caracteres incorrectos, puede tener en cuenta pequeñas desviaciones de colores mediante la distribución de reglas de todos los colores a sus colores "vecinos" (como quiera definir con el vecino).

Consideré la solución de Werner Van Belle (ya que todos los demás enfoques son increíblemente lentos, nada practicables):

Un filtro adaptativo para la correcta localización de las subimágenes: la localización de subimágenes de FFT requiere una normalización de la imagen para funcionar correctamente

Escribí el código en C # donde necesito mi solución, pero obtengo resultados muy imprecisos. ¿Realmente no funciona bien, contrario a lo que dice Van Belle, o hice algo mal? Usé https://github.com/tszalay/FFTWSharp como un contenedor C # para FFTW.

Aquí está mi código traducido: (original en C ++ en http://werner.yellowcouch.org/Papers/subimg/index.html )

using System.Diagnostics; using System; using System.Runtime.InteropServices; using System.Drawing; using System.Drawing.Imaging; using System.IO; using FFTWSharp; using unsigned1 = System.Byte; using signed2 = System.Int16; using signed8 = System.Int64; public class Subimage { /** * This program finds a subimage in a larger image. It expects two arguments. * The first is the image in which it must look. The secon dimage is the * image that is to be found. The program relies on a number of different * steps to perform the calculation. * * It will first normalize the input images in order to improve the * crosscorrelation matching. Once the best crosscorrelation is found * a sad-matchers is applied in a grid over the larger image. * * The following two article explains the details: * * Werner Van Belle; An Adaptive Filter for the Correct Localization * of Subimages: FFT based Subimage Localization Requires Image * Normalization to work properly; 11 pages; October 2007. * http://werner.yellowcouch.org/Papers/subimg/ * * Werner Van Belle; Correlation between the inproduct and the sum * of absolute differences is -0.8485 for uniform sampled signals on * [-1:1]; November 2006 */ unsafe public static Point FindSubimage_fftw(string[] args) { if (args == null || args.Length != 2) { Console.Write("Usage: subimg/n" + "/n" + " subimg is an image matcher. It requires two arguments. The first/n" + " image should be the larger of the two. The program will search/n" + " for the best position to superimpose the needle image over the/n" + " haystack image. The output of the the program are the X and Y/n" + " coordinates./n/n" + " http://werner.yellowouch.org/Papers/subimg//n"); return new Point(); } /** * The larger image will be called A. The smaller image will be called B. * * The code below relies heavily upon fftw. The indices necessary for the * fast r2c and c2r transforms are at best confusing. Especially regarding * the number of rows and colums (watch our for Asx vs Asx2 !). * * After obtaining all the crosscorrelations we will scan through the image * to find the best sad match. As such we make a backup of the original data * in advance * */ int Asx = 0, Asy = 0; signed2[] A = read_image(args[0], ref Asx, ref Asy); int Asx2 = Asx / 2 + 1; int Bsx = 0, Bsy = 0; signed2[] B = read_image(args[1], ref Bsx, ref Bsy); unsigned1[] Asad = new unsigned1[Asx * Asy]; unsigned1[] Bsad = new unsigned1[Bsx * Bsy]; for (int i = 0; i < Bsx * Bsy; i++) { Bsad[i] = (unsigned1)B[i]; Asad[i] = (unsigned1)A[i]; } for (int i = Bsx * Bsy; i < Asx * Asy; i++) Asad[i] = (unsigned1)A[i]; /** * Normalization and windowing of the images. * * The window size (wx,wy) is half the size of the smaller subimage. This * is useful to have as much good information from the subimg. */ int wx = Bsx / 2; int wy = Bsy / 2; normalize(ref B, Bsx, Bsy, wx, wy); normalize(ref A, Asx, Asy, wx, wy); /** * Preparation of the fourier transforms. * Aa is the amplitude of image A. Af is the frequence of image A * Similar for B. crosscors will hold the crosscorrelations. */ IntPtr Aa = fftw.malloc(sizeof(double) * Asx * Asy); IntPtr Af = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy); IntPtr Ba = fftw.malloc(sizeof(double) * Asx * Asy); IntPtr Bf = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy); /** * The forward transform of A goes from Aa to Af * The forward tansform of B goes from Ba to Bf * In Bf we will also calculate the inproduct of Af and Bf * The backward transform then goes from Bf to Aa again. That * variable is aliased as crosscors; */ //#original: fftw_plan_dft_r2c_2d //IntPtr forwardA = fftwf.dft(2, new int[] { Asy, Asx }, Aa, Af, fftw_direction.Forward, fftw_flags.Estimate);//equal results IntPtr forwardA = fftwf.dft_r2c_2d(Asy, Asx, Aa, Af, fftw_flags.Estimate); //#original: fftw_plan_dft_r2c_2d //IntPtr forwardB = fftwf.dft(2, new int[] { Asy, Asx }, Ba, Bf, fftw_direction.Forward, fftw_flags.Estimate);//equal results IntPtr forwardB = fftwf.dft_r2c_2d(Asy, Asx, Ba, Bf, fftw_flags.Estimate); double* crosscorrs = (double*)Aa; //#original: fftw_plan_dft_c2r_2d //IntPtr backward = fftwf.dft(2, new int[] { Asy, Asx }, Bf, Aa, fftw_direction.Backward, fftw_flags.Estimate);//equal results IntPtr backward = fftwf.dft_c2r_2d(Asy, Asx, Bf, Aa, fftw_flags.Estimate); /** * The two forward transforms of A and B. Before we do so we copy the normalized * data into the double array. For B we also pad the data with 0 */ for (int row = 0; row < Asy; row++) for (int col = 0; col < Asx; col++) ((double*)Aa)[col + Asx * row] = A[col + Asx * row]; fftw.execute(forwardA); for (int j = 0; j < Asx * Asy; j++) ((double*)Ba)[j] = 0; for (int row = 0; row < Bsy; row++) for (int col = 0; col < Bsx; col++) ((double*)Ba)[col + Asx * row] = B[col + Bsx * row]; fftw.execute(forwardB); /** * The inproduct of the two frequency domains and calculation * of the crosscorrelations */ double norm = Asx * Asy; for (int j = 0; j < Asx2 * Asy; j++) { double a = ((double*)Af)[j * 2];//#Af[j][0]; double b = ((double*)Af)[j * 2 + 1];//#Af[j][1]; double c = ((double*)Bf)[j * 2];//#Bf[j][0]; double d = ((double*)Bf)[j * 2 + 1];//#-Bf[j][1]; double e = a * c - b * d; double f = a * d + b * c; ((double*)Bf)[j * 2] = (double)(e / norm);//#Bf[j][0] = (fftw_real)(e / norm); ((double*)Bf)[j * 2 + 1] = (double)(f / norm);//Bf[j][1] = (fftw_real)(f / norm); } fftw.execute(backward); /** * We now have a correlation map. We can spent one more pass * over the entire image to actually find the best matching images * as defined by the SAD. * We calculate this by gridding the entire image according to the * size of the subimage. In each cel we want to know what the best * match is. */ int sa = 1 + Asx / Bsx; int sb = 1 + Asy / Bsy; int sadx = 0; int sady = 0; signed8 minsad = Bsx * Bsy * 256L; for (int a = 0; a < sa; a++) { int xl = a * Bsx; int xr = xl + Bsx; if (xr > Asx) continue; for (int b = 0; b < sb; b++) { int yl = b * Bsy; int yr = yl + Bsy; if (yr > Asy) continue; // find the maximum correlation in this cell int cormxat = xl + yl * Asx; double cormx = crosscorrs[cormxat]; for (int x = xl; x < xr; x++) for (int y = yl; y < yr; y++) { int j = x + y * Asx; if (crosscorrs[j] > cormx) cormx = crosscorrs[cormxat = j]; } int corx = cormxat % Asx; int cory = cormxat / Asx; // We dont want subimages that fall of the larger image if (corx + Bsx > Asx) continue; if (cory + Bsy > Asy) continue; signed8 sad = 0; for (int x = 0; sad < minsad && x < Bsx; x++) for (int y = 0; y < Bsy; y++) { int j = (x + corx) + (y + cory) * Asx; int i = x + y * Bsx; sad += Math.Abs((int)Bsad[i] - (int)Asad[j]); } if (sad < minsad) { minsad = sad; sadx = corx; sady = cory; // printf("* "); } // printf("Grid (%d,%d) (%d,%d) Sip=%g Sad=%lld/n",a,b,corx,cory,cormx,sad); } } //Console.Write("{0:D}/t{1:D}/n", sadx, sady); /** * Aa, Ba, Af and Bf were allocated in this function * crosscorrs was an alias for Aa and does not require deletion. */ fftw.free(Aa); fftw.free(Ba); fftw.free(Af); fftw.free(Bf); return new Point(sadx, sady); } private static void normalize(ref signed2[] img, int sx, int sy, int wx, int wy) { /** * Calculate the mean background. We will subtract this * from img to make sure that it has a mean of zero * over a window size of wx x wy. Afterwards we calculate * the square of the difference, which will then be used * to normalize the local variance of img. */ signed2[] mean = boxaverage(img, sx, sy, wx, wy); signed2[] sqr = new signed2[sx * sy]; for (int j = 0; j < sx * sy; j++) { img[j] -= mean[j]; signed2 v = img[j]; sqr[j] = (signed2)(v * v); } signed2[] var = boxaverage(sqr, sx, sy, wx, wy); /** * The normalization process. Currenlty still * calculated as doubles. Could probably be fixed * to integers too. The only problem is the sqrt */ for (int j = 0; j < sx * sy; j++) { double v = Math.Sqrt(Math.Abs((double)var[j]));//#double v = sqrt(fabs(var[j])); <- ambigous Debug.Assert(!double.IsInfinity(v) && v >= 0); if (v < 0.0001) v = 0.0001; img[j] = (signed2)(img[j] * (32 / v)); if (img[j] > 127) img[j] = 127; if (img[j] < -127) img[j] = -127; } /** * As a last step in the normalization we * window the sub image around the borders * to become 0 */ window(ref img, sx, sy, wx, wy); } private static signed2[] boxaverage(signed2[] input, int sx, int sy, int wx, int wy) { signed2[] horizontalmean = new signed2[sx * sy]; Debug.Assert(horizontalmean != null); int wx2 = wx / 2; int wy2 = wy / 2; int from = (sy - 1) * sx; int to = (sy - 1) * sx; int initcount = wx - wx2; if (sx < initcount) initcount = sx; int xli = -wx2; int xri = wx - wx2; for (; from >= 0; from -= sx, to -= sx) { signed8 sum = 0; int count = initcount; for (int c = 0; c < count; c++) sum += (signed8)input[c + from]; horizontalmean[to] = (signed2)(sum / count); int xl = xli, x = 1, xr = xri; /** * The area where the window is slightly outside the * left boundary. Beware: the right bnoundary could be * outside on the other side already */ for (; x < sx; x++, xl++, xr++) { if (xl >= 0) break; if (xr < sx) { sum += (signed8)input[xr + from]; count++; } horizontalmean[x + to] = (signed2)(sum / count); } /** * both bounds of the sliding window * are fully inside the images */ for (; xr < sx; x++, xl++, xr++) { sum -= (signed8)input[xl + from]; sum += (signed8)input[xr + from]; horizontalmean[x + to] = (signed2)(sum / count); } /** * the right bound is falling of the page */ for (; x < sx; x++, xl++) { sum -= (signed8)input[xl + from]; count--; horizontalmean[x + to] = (signed2)(sum / count); } } /** * The same process as aboe but for the vertical dimension now */ int ssy = (sy - 1) * sx + 1; from = sx - 1; signed2[] verticalmean = new signed2[sx * sy]; Debug.Assert(verticalmean != null); to = sx - 1; initcount = wy - wy2; if (sy < initcount) initcount = sy; int initstopat = initcount * sx; int yli = -wy2 * sx; int yri = (wy - wy2) * sx; for (; from >= 0; from--, to--) { signed8 sum = 0; int count = initcount; for (int d = 0; d < initstopat; d += sx) sum += (signed8)horizontalmean[d + from]; verticalmean[to] = (signed2)(sum / count); int yl = yli, y = 1, yr = yri; for (; y < ssy; y += sx, yl += sx, yr += sx) { if (yl >= 0) break; if (yr < ssy) { sum += (signed8)horizontalmean[yr + from]; count++; } verticalmean[y + to] = (signed2)(sum / count); } for (; yr < ssy; y += sx, yl += sx, yr += sx) { sum -= (signed8)horizontalmean[yl + from]; sum += (signed8)horizontalmean[yr + from]; verticalmean[y + to] = (signed2)(sum / count); } for (; y < ssy; y += sx, yl += sx) { sum -= (signed8)horizontalmean[yl + from]; count--; verticalmean[y + to] = (signed2)(sum / count); } } return verticalmean; } private static void window(ref signed2[] img, int sx, int sy, int wx, int wy) { int wx2 = wx / 2; int sxsy = sx * sy; int sx1 = sx - 1; for (int x = 0; x < wx2; x++) for (int y = 0; y < sxsy; y += sx) { img[x + y] = (signed2)(img[x + y] * x / wx2); img[sx1 - x + y] = (signed2)(img[sx1 - x + y] * x / wx2); } int wy2 = wy / 2; int syb = (sy - 1) * sx; int syt = 0; for (int y = 0; y < wy2; y++) { for (int x = 0; x < sx; x++) { /** * here we need to recalculate the stuff (*y/wy2) * to preserve the discrete nature of integers. */ img[x + syt] = (signed2)(img[x + syt] * y / wy2); img[x + syb] = (signed2)(img[x + syb] * y / wy2); } /** * The next row for the top rows * The previous row for the bottom rows */ syt += sx; syb -= sx; } } private static signed2[] read_image(string filename, ref int sx, ref int sy) { Bitmap image = new Bitmap(filename); sx = image.Width; sy = image.Height; signed2[] GreyImage = new signed2[sx * sy]; BitmapData bitmapData1 = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb); unsafe { byte* imagePointer = (byte*)bitmapData1.Scan0; for (int y = 0; y < bitmapData1.Height; y++) { for (int x = 0; x < bitmapData1.Width; x++) { GreyImage[x + y * sx] = (signed2)((imagePointer[0] + imagePointer[1] + imagePointer[2]) / 3.0); //4 bytes per pixel imagePointer += 4; }//end for x //4 bytes per pixel imagePointer += bitmapData1.Stride - (bitmapData1.Width * 4); }//end for y }//end unsafe image.UnlockBits(bitmapData1); return GreyImage; } }

Puede usar elementos visuales únicos de esta área objetivo para determinar su posición. Estos elementos visuales únicos son como una "firma". Ejemplos: íconos, imágenes y símbolos únicos. Este enfoque funciona independientemente de la resolución de la ventana si tiene elementos únicos en las esquinas. Para ventanas de tamaño fijo, solo un elemento es suficiente para encontrar todas las coordenadas de la ventana.

A continuación, ilustraré la idea con un ejemplo simple usando Marvin Framework .

Elementos únicos:

Salida del programa:

Imagen original:
window.png

Código fuente:

import static marvin.MarvinPluginCollection.*; public class FindSubimageWindow { public FindSubimageWindow(){ MarvinImage window = MarvinImageIO.loadImage("./res/window.png"); MarvinImage eclipse = MarvinImageIO.loadImage("./res/eclipse_icon.png"); MarvinImage progress = MarvinImageIO.loadImage("./res/progress_icon.png"); MarvinSegment seg1, seg2; seg1 = findSubimage(eclipse, window, 0, 0); drawRect(window, seg1.x1, seg1.y1, seg1.x2-seg1.x1, seg1.y2-seg1.y1); seg2 = findSubimage(progress, window, 0, 0); drawRect(window, seg2.x1, seg2.y1, seg2.x2-seg2.x1, seg2.y2-seg2.y1); drawRect(window, seg1.x1-10, seg1.y1-10, (seg2.x2-seg1.x1)+25, (seg2.y2-seg1.y1)+20); MarvinImageIO.saveImage(window, "./res/window_out.png"); } private void drawRect(MarvinImage image, int x, int y, int width, int height){ x-=4; y-=4; width+=8; height+=8; image.drawRect(x, y, width, height, Color.red); image.drawRect(x+1, y+1, width-2, height-2, Color.red); image.drawRect(x+2, y+2, width-4, height-4, Color.red); } public static void main(String[] args) { new FindSubimageWindow(); } }

Aquí está el esqueleto de código que te gustaría usar:

// look for all (x,y) positions where target appears in desktop List<Loc> findMatches(Image desktop, Image target, float threshold) { List<Loc> locs; for (int y=0; y<desktop.height()-target.height(); y++) { for (int x=0; x<desktop.width()-target.width(); x++) { if (imageDistance(desktop, x, y, target) < threshold) { locs.append(Loc(x,y)); } } } return locs; } // computes the root mean squared error between a rectangular window in // bigImg and target. float imageDistance(Image bigImg, int bx, int by, Image target) { float dist = 0.0; for (int y=0; y<target.height(); y++) { for (int x=0; x<target.width(); x++) { // assume RGB images... for (int colorChannel=0; colorChannel<3; colorChannel++) { dist += Math.pow(target.getPixel(x,y) - bigImg.getPixel(bx+x,by+y), 2); } } } return Math.sqrt(dist) / target.width() / target.height(); }

Podría considerar otras distancias de imagen (vea una pregunta similar ). Para su aplicación, el error RMS es probablemente una buena opción.

Probablemente haya varias bibliotecas de Java que calculan esta distancia para usted de manera eficiente.