performance - How to find the kth largest element in an unsorted array of length n in O(n)?

big o notation java (30)

Create a priority queue.
Insert all the elements into the heap.
Call poll() k times.
import java.util.Comparator;
import java.util.PriorityQueue;

public static int getKthLargestElements(int[] arr, int k) {
    // Max-heap: the largest element is polled first.
    PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.reverseOrder());
    // insert all the elements into heap
    for (int ele : arr)
        pq.offer(ele);
    // call poll() k times; the k-th value polled is the k-th largest
    int result = 0;
    for (int i = 0; i < k; i++)
        result = pq.poll();
    return result;
}

I believe there is a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it is "expected" O(n) or something. How can we do this?

How about an approach like this:

Keep a buffer of length k and a tmp_max. Getting tmp_max is O(k) and is done n times, so something like O(kn).

Is that right, or am I missing something?

Although it doesn't beat the average case of quickselect or the worst case of the median-of-medians method, it is pretty easy to understand and implement.

Below is the link to a full implementation with a fairly extensive explanation of how the algorithm for finding the Kth element in an unsorted array works. The basic idea is to partition the array as in QuickSort. But to avoid extreme cases (e.g., when the smallest element is chosen as the pivot at every step, so that the algorithm degenerates to O(n^2) running time), a special pivot selection is applied, called the median-of-medians algorithm. The whole solution runs in O(n) in both the worst case and the average case.

Here is the link to the full article (it is about finding the Kth smallest element, but the principle is the same for finding the Kth largest):

Finding Kth Smallest Element in an Unsorted Array

Here is a C++ implementation of Randomized QuickSelect. The idea is to pick a pivot element at random. To implement the randomized partition, we use a random function, rand(), to generate an index between l and r, swap the element at the randomly generated index with the last element, and finally call the standard partition process, which uses the last element as the pivot.

#include <iostream>
#include <climits>
#include <cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method. ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1) // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort(). It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

The worst-case time complexity of the above solution is still O(n^2): in the worst case, the random function may always pick a corner element. The expected time complexity of the randomized QuickSelect above is Θ(n).

Find the median of the array in linear time, then use the partition procedure exactly as in quicksort to split the array into two parts, with the values to the left of the median less than (<) the median and those to the right greater than (>) the median. That can also be done in linear time. Now go to the part of the array where the kth element lies. The recurrence becomes T(n) = T(n/2) + cn, which gives O(n) overall.

This is an implementation in JavaScript.

If you lift the constraint that you cannot modify the array, you can avoid using extra memory by using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/ ).

If you want to test how it works, you can use this variation:

The rest of the code is just to create a playground:

function getRandomArray (n){
    var ar = [];
    for (var i = 0, l = n; i < l; i++) {
        ar.push(Math.round(Math.random() * l))
    }
    return ar;
}

//Create a random array of 50 numbers
var ar = getRandomArray (50);

Now run the tests a few times. Because of Math.random(), they will produce different results each time:

kthMax(ar, 2, true);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 2);
kthMax(ar, 34, true);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);
kthMax(ar, 34);

If you run it several times, you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n, and that the value of k does not affect the algorithm.

I implemented finding the kth minimum in n unsorted elements using dynamic programming, specifically the tournament method. The running time is O(n + k log(n)). The mechanism used is listed as one of the methods on the Wikipedia page about selection algorithms (as noted in one of the posts above). You can read about the algorithm and also find the code (Java) on my blog page Finding Kth Minimum. In addition, the logic can do a partial ordering of the list: return the first K min (or max) in O(k log(n)) time.

Although the code provided finds the kth minimum, similar logic can be used to find the kth maximum in O(k log(n)), ignoring the preliminary work done to build the tournament tree.

The C++ standard library has almost exactly that function, called nth_element, although it does modify your data. It has expected linear running time, O(N), and it also does a partial sort.

const int N = ...;
double a[N];
// ...
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a

An explanation of the median-of-medians algorithm for finding the k-th largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

The implementation in C++ is below:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int findMedian(vector<int> vec){
    // Find median of a vector (vec is a copy, so sorting it is safe)
    sort(vec.begin(), vec.end());
    int median;
    size_t size = vec.size();
    median = vec[(size/2)];
    return median;
}

int findMedianOfMedians(vector<vector<int> > values){
    vector<int> medians;

    for (int i = 0; i < values.size(); i++) {
        int m = findMedian(values[i]);
        medians.push_back(m);
    }

    return findMedian(medians);
}

void selectionByMedianOfMedians(const vector<int> values, int k){
    // Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;

    int count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;

        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout<<endl<<endl<<"Printing 2D vector : "<<endl;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            cout<<vec2D[i][j]<<" ";
        }
        cout<<endl;
    }
    cout<<endl;

    // Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout<<"Median of medians is : "<<m<<endl;

    // Partition the list into unique elements larger than 'm' (call this sublist L1) and
    // those smaller than 'm' (call this sublist L2)
    vector<int> L1, L2;

    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            }else if (vec2D[i][j] < m){
                L2.push_back(vec2D[i][j]);
            }
        }
    }

    // Checking the splits as per the new pivot 'm'
    cout<<endl<<"Printing L1 : "<<endl;
    for (int i = 0; i < L1.size(); i++) {
        cout<<L1[i]<<" ";
    }

    cout<<endl<<endl<<"Printing L2 : "<<endl;
    for (int i = 0; i < L2.size(); i++) {
        cout<<L2[i]<<" ";
    }

    // Recursive calls
    if ((k - 1) == L1.size()) {
        cout<<endl<<endl<<"Answer :"<<m;
    }else if (k <= L1.size()) {
        return selectionByMedianOfMedians(L1, k);
    }else if (k > (L1.size() + 1)){
        return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
    }
}

int main()
{
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};

    vector<int> vec(values, values + 25);

    cout<<"The given array is : "<<endl;
    for (int i = 0; i < vec.size(); i++) {
        cout<<vec[i]<<" ";
    }

    selectionByMedianOfMedians(vec, 8);

    return 0;
}

You can do it in O(n + kn) = O(n) time (for constant k) and O(k) space, by keeping track of the k largest elements you have seen.

For each element in the array, you can scan the list of the k largest and replace the smallest element with the new one if it is bigger.

Warren's priority heap solution is neater, though.
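For illustration, the scan described above might look roughly like this in Python. This is just a sketch: the function name kth_largest_scan is made up, k is taken to be 1-based, and repeated values are counted separately.

```python
def kth_largest_scan(values, k):
    """Return the k-th largest value (1-based) using O(k) extra space."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    largest = list(values[:k])      # the k largest values seen so far
    for v in values[k:]:
        # Locate the smallest of the kept values: an O(k) scan.
        i_min = min(range(k), key=largest.__getitem__)
        if v > largest[i_min]:
            largest[i_min] = v      # the new value pushes the smallest one out
    return min(largest)             # smallest of the k largest


print(kth_largest_scan([2, 3, 5, 4, 1, 12, 11, 13, 16, 7], 3))  # prints 12
```

Each of the n elements triggers one O(k) scan, which is where the O(kn) bound comes from.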

According to the paper Finding the Kth largest item in a list of n items, the following algorithm takes O(n) time in the worst case.

Divide the array into n/5 lists of 5 elements each.
Find the median of each 5-element subarray.
Recursively find the median of all the medians; call it M.
Partition the array into two subarrays: the first contains the elements larger than M, call it a1, and the other contains the elements smaller than M, call it a2.
If k <= |a1|, return selection(a1, k).
If k − 1 = |a1|, return M.
If k > |a1| + 1, return selection(a2, k − |a1| − 1).
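The steps above can be sketched in Python as follows. The name select_kth_largest is hypothetical, and there is one deliberate deviation from the listed steps, which is my own assumption rather than something the paper states: copies of M are counted explicitly, so the sketch also works when the input contains repeated values.

```python
def select_kth_largest(items, k):
    """Return the k-th largest item (1-based); duplicates count separately."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    if len(items) <= 5:
        return sorted(items, reverse=True)[k - 1]
    # Steps 1-2: split into blocks of (up to) 5 and take each block's median.
    blocks = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = [b[len(b) // 2] for b in blocks]
    # Step 3: recursively find the median M of the medians.
    m = select_kth_largest(medians, (len(medians) + 1) // 2)
    # Step 4: partition around M.
    a1 = [x for x in items if x > m]         # larger than M
    a2 = [x for x in items if x < m]         # smaller than M
    equal = len(items) - len(a1) - len(a2)   # copies of M (assumption, see above)
    # Steps 5-7, with |a1| + 1 generalised to |a1| + equal.
    if k <= len(a1):
        return select_kth_largest(a1, k)
    if k <= len(a1) + equal:
        return m
    return select_kth_largest(a2, k - len(a1) - equal)


data = [2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9,
        17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14]
print(select_kth_largest(data, 8))  # prints 18
```

The list comprehensions allocate new lists at every level, so this favours readability over memory use.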

Analysis: As suggested in the original paper:

We use the median to partition the list into two halves (the first half if k <= n/2, and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse on a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + ... = 2cn = O(n).

Why is the partition size 5 and not 3?

As mentioned in the original paper:

Dividing the list by 5 ensures a worst-case split of 70 − 30. At least half of the medians are greater than the median of medians, hence at least half of the n/5 blocks have at least 3 elements, and this gives a 3n/10 split, which means the other partition is 7n/10 in the worst case. That gives T(n) = T(n/5) + T(7n/10) + O(n). Since 1/5 + 7/10 < 1, the worst-case running time is O(n).
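To see why the fractions summing to less than 1 is what matters, the recurrence can be checked by substitution. This is a sketch, assuming the linear work at each level is at most cn and guessing T(n) <= an; the constants c and a are symbols introduced here, not taken from the paper:

```latex
\begin{align*}
T(n) &\le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + cn \\
     &\le \tfrac{a n}{5} + \tfrac{7 a n}{10} + cn
      = \left(\tfrac{9a}{10} + c\right) n \\
     &\le a n \qquad \text{whenever } a \ge 10c .
\end{align*}
```

With blocks of 3 the same reasoning leads to T(n) = T(n/3) + T(2n/3) + O(n). There the fractions sum to exactly 1, so the substitution leaves no slack to absorb the cn term and the linear bound no longer follows.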

Now I have tried to implement the above algorithm as:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public static int findKthLargestUsingMedian(Integer[] array, int k) {
    // Step 1: Divide the list into n/5 lists of 5 element each.
    int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
    // Step 2: Find pivotal element aka median of medians.
    int medianOfMedian = findMedianOfMedians(array, noOfRequiredLists);
    // Now we need two lists split using medianOfMedian as pivot. All elements in
    // listWithGreaterNumbers will be greater than medianOfMedian and
    // listWithSmallerNumbers will have elements lesser than medianOfMedian.
    List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
    List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
    for (Integer element : array) {
        if (element < medianOfMedian) {
            listWithSmallerNumbers.add(element);
        } else if (element > medianOfMedian) {
            listWithGreaterNumbers.add(element);
        }
    }
    // Next step.
    if (k <= listWithGreaterNumbers.size())
        return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
    else if ((k - 1) == listWithGreaterNumbers.size())
        return medianOfMedian;
    else if (k > (listWithGreaterNumbers.size() + 1))
        return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
    return -1;
}

public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
    int[] medians = new int[noOfRequiredLists];
    for (int count = 0; count < noOfRequiredLists; count++) {
        int startOfPartialArray = 5 * count;
        // Clamp the end so the last (possibly shorter) sublist is not padded with nulls.
        int endOfPartialArray = Math.min(startOfPartialArray + 5, mainList.length);
        Integer[] partialArray = Arrays.copyOfRange((Integer[]) mainList, startOfPartialArray, endOfPartialArray);
        // Step 2: Find median of each of these sublists (sorted first, so the middle element is the median).
        Arrays.sort(partialArray);
        int medianIndex = partialArray.length/2;
        medians[count] = partialArray[medianIndex];
    }
    // Step 3: Find median of the medians.
    Arrays.sort(medians);
    return medians[medians.length / 2];
}

Just for completeness, another algorithm uses a priority queue and takes O(n log n) time.

public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
    int p = 0;
    int numElements = nums.length;
    // create priority queue where all the elements of nums will be stored
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
    // place all the elements of the array to this priority queue
    for (int n : nums) {
        pq.add(n);
    }
    // extract the kth largest element
    while (numElements - k + 1 > 0) {
        p = pq.poll();
        k++;
    }
    return p;
}

Both algorithms can be tested with:

public static void main(String[] args) throws IOException {
    Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
    System.out.println(findKthLargestUsingMedian(numbers, 8));
    System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
}

The expected output is: 18 18

Sexy quickselect in Python

import random

def quickselect(arr, k):
    '''
    k = 1 returns first element in ascending order.
    can be easily modified to return first element in descending order
    '''
    r = random.randrange(0, len(arr))

    # partition
    a1 = [i for i in arr if i < arr[r]]
    a2 = [i for i in arr if i > arr[r]]

    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr)-len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it is basically quicksort where you throw away the partition you are not interested in). My prof has a great writeup, with the runtime analysis: (reference)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2)))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n²)

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) ∑_{i=1 to n} T(max(i-1, n-i))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n) <= cn + (1/n) ∑_{i=1 to n} T(max(i-1, n-i))
      = cn + (1/n) ∑_{i=1 to floor(n/2)} T(n-i) + (1/n) ∑_{i=floor(n/2)+1 to n} T(i)
     <= cn + 2 (1/n) ∑_{i=floor(n/2) to n} T(i)
     <= cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑_{i=n/2 to n} an, we get roughly 2(1/n)(n/2)an = an. But this is too big: there is no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

∑_{i=floor(n/2) to n} i = ∑_{i=1 to n} i - ∑_{i=1 to floor(n/2)} i
                        = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2
                       <= n²/2 - (n/4)²/2
                        = (15/32)n²

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

cn + 2 (1/n) ∑_{i=floor(n/2) to n} ai
     <= cn + (2a/n) (15/32) n²
      = n (c + (15/16)a)
     <= an

provided a > 16c.

This gives T(n) = O(n). It is clearly Omega(n), so we get T(n) = Theta(n).

Haskell solution:

kthElem index list = sort list !! index

withShape ~[]     []     = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort []     = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
    ls = filter (<  x) xs
    rs = filter (>= x) xs

This implements the median-of-medians solution by using the withShape method to discover the size of a partition without actually computing it.

There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.

In more detail: using Vladimir Zabrodsky's MODIFIND optimization and median-of-3 pivot selection, and paying some attention to the final steps of the partitioning part of the algorithm, I came up with the following algorithm (imaginably named "LefSelect"):

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

/* Note: The code needs more than 2 elements to work */
float lefselect(float a[], const int n, const int k) {
    int l=0, m = n-1, i=l, j=m;
    float x;

    while (l<m) {
        if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
        if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
        if( a[j] < a[k] ) F_SWAP(a[k],a[j]);

        x=a[k];
        while (j>k && i<k) {
            do i++; while (a[i]<x);
            do j--; while (a[j]>x);

            F_SWAP(a[i],a[j]);
        }
        i++; j--;

        if (j<k) {
            while (a[i]<x) i++;
            l=i; j=m;
        }
        if (k<i) {
            while (x<a[j]) j--;
            m=j; i=l;
        }
    }
    return a[k];
}

In the benchmarks I did here, LefSelect is 20-30% faster than QuickSelect.

You do it like quicksort: pick an element at random and shove everything either higher or lower. At this point you will know which element you actually picked, and if it is the kth element you are done; otherwise you repeat with the bin (higher or lower) that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).
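As a rough sketch of that idea (not any particular poster's code), here is an in-place version in Python. The name kth_largest_inplace and the three-way split are my own choices for illustration. It reorders the list it is given, and it looks for the kth largest by targeting index n - k of the ascending order.

```python
import random


def kth_largest_inplace(a, k):
    """Return the k-th largest element (1-based). Reorders `a` in place."""
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and len(a)")
    target = len(a) - k            # index of the answer in ascending order
    lo, hi = 0, len(a) - 1
    while True:
        pivot = a[random.randint(lo, hi)]   # random element of the window
        # Three-way partition of a[lo..hi]: < pivot | == pivot | > pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        if target < lt:
            hi = lt - 1            # keep only the "lower" bin
        elif target > gt:
            lo = gt + 1            # keep only the "higher" bin
        else:
            return a[target]       # the answer equals the pivot
```

Only one bin is kept after each pass, which is why the expected work is linear rather than the O(n log n) of a full quicksort.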

A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

"Make one pass through tracking the three largest values so far."

(it was specifically for the 3rd largest)

and this answer:

Build a heap/priority queue. O(n)
Pop top element. O(log n)
Pop top element. O(log n)
Pop top element. O(log n)
Total = O(n) + 3 O(log n) = O(n)
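For what it's worth, the build-then-pop recipe quoted above could look like this in Python. It is only a sketch: the name kth_largest_heap is made up, and the values are assumed to be numbers (they are negated because heapq only provides a min-heap). The O(n) total holds for constant k such as 3; in general the cost is O(n + k log n).

```python
import heapq


def kth_largest_heap(values, k):
    """Return the k-th largest value (1-based) using a max-heap."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    heap = [-v for v in values]   # negate so the min-heap acts as a max-heap
    heapq.heapify(heap)           # build the heap: O(n)
    for _ in range(k - 1):        # pop the k-1 largest: O(log n) each
        heapq.heappop(heap)
    return -heap[0]               # the top is now the k-th largest


print(kth_largest_heap([2, 3, 5, 4, 1, 12, 11, 13, 16, 7], 3))  # prints 12
```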

Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value and bump values 1-4 down, so that the 5th drops off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).

I would like to suggest an answer:

If we take the first k elements and sort them into a linked list of k values,

then for every other value, even in the worst case, if we do an insertion sort for the remaining n-k values, the number of comparisons will be k*(n-k), and sorting the first k values takes k*(k-1), so the total comes out to (nk-k), which is O(n) for constant k.

Cheers

A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high that you would probably prefer the naive method of sorting the list and then selecting.

I answered the letter of your question :)

I am not very sure about the O(n) complexity, but it will certainly be between O(n) and O(n log n), and closer to O(n) than to O(n log n). The function is written in Java.

import java.util.ArrayList;
import java.util.Random;

public int quickSelect(ArrayList<Integer> list, int nthSmallest) {
    //Choose random number in range of 0 to array length
    Random random = new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size());

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two.
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for (int i = 0; i < list.size(); i++) {
        if (list.get(i) < pivot) {
            smallerNumberList.add(list.get(i));
        }
        else if (list.get(i) > pivot) {
            greaterNumberList.add(list.get(i));
        }
        else {
            //Do nothing
        }
    }

    //If nthSmallest (1-based) is at most the size of smallerNumberList, nthSmallest number must be in this list
    if (nthSmallest <= smallerNumberList.size()) {
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if (nthSmallest > (list.size() - greaterNumberList.size())) {
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in
        //smallerNumberList or equal to the pivot
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList, nthSmallest);
    }
    else {
        return pivot;
    }
}

Here is an implementation of the algorithm eladv suggested (I also include the implementation with a random pivot):

import java.util.Arrays;

public class Median {

    public static void main(String[] s) {

        int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
        System.out.println(selectK(test,8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for(int i=0; i<test.length; i++)
            test[i] = (int)(Math.random()*test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length/2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if(a.length <= 1)
            return a[0];

        int r = (int)(Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp,k-small-equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if(a.length <= 5) {
            Arrays.sort(a);
            return a[k-1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp,k-small-equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length/5];
        int[] temp = new int[5];
        for(int i=0; i<b.length; i++) {
            for(int j=0; j<5; j++)
                temp[j] = a[5*i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }

        return selectK(b, b.length/2 + 1);
    }
}

This is called finding the k-th order statistic. There is a very simple randomized algorithm (called quickselect) taking O(n) average time and O(n^2) worst-case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst-case time. There is some information on Wikipedia, but it is not very good.

~~Everything you need is in these PowerPoint slides~~. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It is also very nicely detailed in the book Introduction to Algorithms by Cormen et al.

Go to the end of this link:

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

What I would do is this:

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

You can simply store pointers to the first and last elements of the linked list. They only change when updates to the list are made.

Update:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

First, we can build a BST from the unsorted array, which takes O(n) time, and from the BST we can find the kth smallest element in O(log(n)), which overall gives an order of O(n).

I came up with this algorithm and it seems to be O(n):

Let's say k = 3 and we want to find the 3rd largest element in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If the array item is greater than our minimum, we replace the min variable with the item's value. We continue the same way until the end of the array. The minimum of our three variables is the 3rd largest item in the array.

define variables a=0, b=0, c=0
iterate through the array items
    find minimum a,b,c
    if item > min then replace the min variable with item value
    continue until end of array
the minimum of a,b,c is our answer

And, to find the Kth largest item, we need K variables.

Example: (k = 3)

[1,2,4,1,7,3,9,5,6,2,9,8]

Final variable values:

a=7 (answer)
b=8
c=9

Can somebody please review this and let me know what I am missing?

This is similar to the quickSort strategy, where we pick an arbitrary pivot and bring the smaller elements to its left and the larger ones to its right.

public static int kthElInUnsortedList(List<int> list, int k)
{
    if (list.Count == 1)
        return list[0];

    List<int> left = new List<int>();
    List<int> right = new List<int>();

    int pivotIndex = list.Count / 2;
    int pivot = list[pivotIndex]; //arbitrary

    for (int i = 0; i < list.Count; i++)
    {
        // Skip the pivot itself, but keep scanning the elements after it.
        if (i == pivotIndex)
            continue;

        int currentEl = list[i];
        if (currentEl < pivot)
            left.Add(currentEl);
        else
            right.Add(currentEl);
    }

    if (k == left.Count + 1)
        return pivot;

    if (left.Count < k)
        return kthElInUnsortedList(right, k - left.Count - 1);
    else
        return kthElInUnsortedList(left, k);
}

You can find the kth smallest element in O(n) time and constant space, if we consider that the array holds only integers.

The approach is to do a binary search on the range of the array's values. If we have a min_value and a max_value, both in integer range, we can do a binary search on that range. We can write a comparator function that tells us whether any given value is the kth smallest, smaller than the kth smallest, or bigger than the kth smallest. Do the binary search until you reach the kth smallest number.

Here is the code for that:

class Solution:

    def _iskthsmallest(self, A, val, k):
        less_count, equal_count = 0, 0
        for i in range(len(A)):
            if A[i] == val: equal_count += 1
            if A[i] < val: less_count += 1

        if less_count >= k: return 1
        if less_count + equal_count < k: return -1
        return 0

    def kthsmallest_binary(self, A, min_val, max_val, k):
        if min_val == max_val:
            return min_val
        # Integer division, so that mid stays an int in Python 3.
        mid = (min_val + max_val) // 2
        iskthsmallest = self._iskthsmallest(A, mid, k)
        if iskthsmallest == 0: return mid
        if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
        return self.kthsmallest_binary(A, mid+1, max_val, k)

    # @param A : tuple of integers
    # @param B : integer
    # @return an integer
    def kthsmallest(self, A, k):
        if not A: return 0
        if k > len(A): return 0
        min_val, max_val = min(A), max(A)
        return self.kthsmallest_binary(A, min_val, max_val, k)

There is also an algorithm that outperforms quickselect. It is called the Floyd-Rivest (FR) algorithm.

Original article: https://doi.org/10.1145/360680.360694

Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf

Wikipedia article: https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

I tried implementing quickselect and the FR algorithm in C++. I also compared them to the standard C++ library implementation std::nth_element (which is basically an introselect hybrid of quickselect and heapselect). The result was that quickselect and nth_element ran comparably on average, but the FR algorithm ran approx. twice as fast compared to them.

Sample code that I used for the FR algorithm:

#include <algorithm>
#include <cmath>
#include <vector>

// Forward declarations, so that each template is visible where it is called.
template <typename T>
int sgn(T val);

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n);

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);

            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}