sort example array java arrays list sorting timing

java - example - ¿Por qué Collections.sort() es mucho más lento que Arrays.sort()?



comparator java example (5)

Intenté hacer una prueba con respecto a Collection.sort() y Arrays.sort() . En la prueba, creé una matriz de int s de longitud 1e5 100 veces, que contenía números aleatorios del 1 al 1e5. También creé una lista de tipo Integer , que contenía los mismos valores en las mismas posiciones que la de la matriz. Luego Arrays.sort() la matriz usando Arrays.sort() y la lista usando Collections.sort() .

ACTUALIZACIÓN: Como @Holger señaló, mi código tenía un error. El código corregido es ahora:

import java.util.* ; class TestClass { public static void main(String args[] ) throws Exception { double ratSum = 0 ; for(int j=0;j<100;j++) { int[] A = new int[(int)1e5] ; List<Integer> L = new ArrayList<Integer>() ; for(int i=0;i<A.length;i++) { int no = (int)(Math.random()*(int)1e5) ; A[i] = no ; L.add(A[i]) ; } long startTime = System.nanoTime() ; Arrays.sort(A) ; long endTime = System.nanoTime() ; Collections.sort(L) ; long endTime2 = System.nanoTime() ; long t1 = (endTime-startTime), t2 = (endTime2-endTime) ; ratSum+=(double)t2/t1 ; System.out.println("Arrays.sort took :"+t1+" Collections.sort took :"+t2+" ratio :"+((double)t2/t1)) ; } System.out.println("Average ratio :"+(ratSum/100)) ; } }

Y la salida es:

Arrays.sort took :24106021 Collections.sort took :92353602 ratio :3.8311425182944956 Arrays.sort took :8672831 Collections.sort took :50936497 ratio :5.873110752417521 Arrays.sort took :8561227 Collections.sort took :25611480 ratio :2.991566512603859 Arrays.sort took :7123928 Collections.sort took :17368785 ratio :2.4380910362934607 Arrays.sort took :6280488 Collections.sort took :16929218 ratio :2.6955258890710403 Arrays.sort took :6248227 Collections.sort took :16844915 ratio :2.695951187432851 Arrays.sort took :6220942 Collections.sort took :16979669 ratio :2.7294369566538315 Arrays.sort took :6213841 Collections.sort took :17439817 ratio :2.8066081832476883 Arrays.sort took :6286385 Collections.sort took :19963612 ratio :3.175690321225951 Arrays.sort took :6209668 Collections.sort took :17008307 ratio :2.7390042430609816 Arrays.sort took :6286623 Collections.sort took :17007163 ratio :2.705293923303497 Arrays.sort took :6256505 Collections.sort took :16911950 ratio :2.703098614961548 Arrays.sort took :6225031 Collections.sort took :16914494 ratio :2.7171742598550916 Arrays.sort took :6233918 Collections.sort took :17005995 ratio :2.72797861633727 Arrays.sort took :6210554 Collections.sort took :17606028 ratio :2.834856278522013 Arrays.sort took :6239384 Collections.sort took :20342378 ratio :3.260318326296314 Arrays.sort took :6207695 Collections.sort took :16519089 ratio :2.6610664666997974 Arrays.sort took :6227147 Collections.sort took :16605884 ratio :2.666692146499834 Arrays.sort took :6225187 Collections.sort took :16687597 ratio :2.680657946500242 Arrays.sort took :6152338 Collections.sort took :16475373 ratio :2.6779043999208105 Arrays.sort took :6184746 Collections.sort took :16511024 ratio :2.6696365541931715 Arrays.sort took :6130221 Collections.sort took :16578032 ratio :2.7043122915144493 Arrays.sort took :6271927 Collections.sort took :16507152 ratio :2.631910734930429 Arrays.sort took :6232482 Collections.sort took :16562166 ratio :2.657394919070765 Arrays.sort took :6218992 Collections.sort took :16552468 ratio :2.661599821964717 Arrays.sort took :6230427 Collections.sort took :21954967 ratio :3.52383022865046 Arrays.sort took :8204666 Collections.sort took :16607560 ratio :2.024160398485447 Arrays.sort took :6272619 Collections.sort took :22061291 ratio :3.5170781136236715 Arrays.sort took :8618253 Collections.sort took :19979549 ratio :2.3182829513127543 Arrays.sort took :6198538 Collections.sort took :17002645 ratio :2.743008915973412 Arrays.sort took :6265018 Collections.sort took :17079646 ratio :2.7261926462142645 Arrays.sort took :6302335 Collections.sort took :17040082 ratio :2.7037728080148073 Arrays.sort took :6293948 Collections.sort took :17133482 ratio :2.722215372608735 Arrays.sort took :6272364 Collections.sort took :17099717 ratio :2.7261997231028046 Arrays.sort took :6219540 Collections.sort took :17026849 ratio :2.737637992520347 Arrays.sort took :6231000 Collections.sort took :17149439 ratio :2.7522771625742255 Arrays.sort took :6309215 Collections.sort took :17118779 ratio :2.713297771592821 Arrays.sort took :6200511 Collections.sort took :17123517 ratio :2.7616299688848227 Arrays.sort took :6263169 Collections.sort took :16995685 ratio :2.7135919532109063 Arrays.sort took :6212243 Collections.sort took :17101848 ratio :2.7529264389689843 Arrays.sort took :6247580 Collections.sort took :17089850 ratio :2.735435160494143 Arrays.sort took :6283626 Collections.sort took :17088109 ratio :2.7194662763188004 Arrays.sort took :6312678 Collections.sort took :17055856 ratio :2.7018415955954036 Arrays.sort took :6222695 Collections.sort took :17071263 ratio :2.7433873908330715 Arrays.sort took :6300990 Collections.sort took :17016171 ratio :2.7005551508572463 Arrays.sort took :6262923 Collections.sort took :17084477 ratio :2.727875945465081 Arrays.sort took :6256482 Collections.sort took :17062232 ratio :2.7271287602202006 Arrays.sort took :6259643 Collections.sort took :17036036 ratio :2.721566709155778 Arrays.sort took :6248649 Collections.sort took :16944960 ratio :2.711779778316881 Arrays.sort took :6264515 Collections.sort took :16986876 ratio :2.7116027338109974 Arrays.sort took :6241864 Collections.sort took :17367903 ratio :2.782486609769133 Arrays.sort took :6297429 Collections.sort took :17080086 ratio :2.7122316107097038 Arrays.sort took :6184084 Collections.sort took :17584862 ratio :2.843567778186713 Arrays.sort took :6315776 Collections.sort took :22279278 ratio :3.5275598754610678 Arrays.sort took :6253047 Collections.sort took :17091694 ratio :2.7333384828228544 Arrays.sort took :6291188 Collections.sort took :17147694 ratio :2.725668665441249 Arrays.sort took :6327348 Collections.sort took :17034007 ratio :2.6921242517402235 Arrays.sort took :6284904 Collections.sort took :17049315 ratio :2.712740719667317 Arrays.sort took :6190436 Collections.sort took :17143853 ratio :2.7694096183209065 Arrays.sort took :6301712 Collections.sort took :17070237 ratio :2.7088253160411013 Arrays.sort took :6208193 Collections.sort took :17060372 ratio :2.74804149935416 Arrays.sort took :6247700 Collections.sort took :16961962 ratio :2.7149130079869392 Arrays.sort took :6344996 Collections.sort took :17084627 ratio :2.6926143058246215 Arrays.sort took :6214232 Collections.sort took :17150324 ratio :2.759846108095095 Arrays.sort took :6224359 Collections.sort took :17081254 ratio :2.744259127727048 Arrays.sort took :6256722 Collections.sort took :17005451 ratio :2.7179489515436357 Arrays.sort took :6286439 Collections.sort took :17061112 ratio :2.713954911516679 Arrays.sort took :6250634 Collections.sort took :17091313 ratio :2.7343327092899696 Arrays.sort took :6252900 Collections.sort took :17041659 ratio :2.7254008540037424 Arrays.sort took :6222192 Collections.sort took :17125062 ratio :2.75225547524088 Arrays.sort took :6227037 Collections.sort took :17013314 ratio :2.7321684454420296 Arrays.sort took :6223609 Collections.sort took :17086112 ratio :2.745370411283871 Arrays.sort took :6280777 Collections.sort took :17091821 ratio :2.7212908530266238 Arrays.sort took :6254551 Collections.sort took :17148242 ratio :2.741722307484582 Arrays.sort took :6250927 Collections.sort took :17053331 ratio :2.7281283240069834 Arrays.sort took :6270616 Collections.sort took :17067948 ratio :2.721893351466586 Arrays.sort took :6223093 Collections.sort took :17034584 ratio :2.737317922132933 Arrays.sort took :6286002 Collections.sort took :17128280 ratio :2.7248289135129133 Arrays.sort took :6239485 Collections.sort took :17032062 ratio :2.7297224049741287 Arrays.sort took :6191290 Collections.sort took :17017219 ratio :2.748574045150526 Arrays.sort took :6134110 Collections.sort took :17069485 ratio :2.782715830006309 Arrays.sort took :6207363 Collections.sort took :17052862 ratio :2.747199092432648 Arrays.sort took :6238702 Collections.sort took :17056945 ratio :2.734053493819708 Arrays.sort took :6185356 Collections.sort took :17006088 ratio :2.749411351585907 Arrays.sort took :6309226 Collections.sort took :17056503 ratio :2.703422416632405 Arrays.sort took :6256706 Collections.sort took :17082903 ratio :2.7303349398229675 Arrays.sort took :6194988 Collections.sort took :17069426 ratio :2.7553606237816766 Arrays.sort took :6184266 Collections.sort took :17054641 ratio :2.757746998592881 Arrays.sort took :6271022 Collections.sort took :17086036 ratio :2.724601508334686 Arrays.sort took :6246482 Collections.sort took :17077804 ratio :2.733987546910405 Arrays.sort took :6194985 Collections.sort took :17119911 ratio :2.763511291794895 Arrays.sort took :6319199 Collections.sort took :17444587 ratio :2.760569337980969 Arrays.sort took :6262827 Collections.sort took :17065589 ratio :2.7249018693954024 Arrays.sort took :6301245 Collections.sort took :17195611 ratio :2.728922776371971 Arrays.sort took :6214333 Collections.sort took :17024645 ratio :2.739577199998777 Arrays.sort took :6213116 Collections.sort took :17382033 ratio :2.7976353572024086 Arrays.sort took :6286394 Collections.sort took :17124874 ratio :2.7241171965995132 Arrays.sort took :6166308 Collections.sort took :16998293 ratio :2.756640278104824 Arrays.sort took :6247395 Collections.sort took :16957056 ratio :2.7142602636779007 Arrays.sort took :6245054 Collections.sort took :16994147 ratio :2.72121698227109 Average ratio :2.792654880602193

Además, ejecuté el código localmente, 1000 veces (en lugar de 100) y la proporción promedio fue de:: 3.0616 Por lo tanto, la proporción aún es significativa y, por lo tanto, digna de discusión

Pregunta: ¿Por qué Collections.sort() tarda aproximadamente 3 veces del tiempo que toma Arrays.sort() para ordenar los mismos valores? ¿Es porque ahora no estamos comparando primitivos? ¿Por qué tomaría más tiempo?


Collection.sort () utilizó el algoritmo de clasificación de mezcla y Arrays.sort () utiliza la clasificación rápida. La ordenación rápida tiene grandes inconvenientes cuando se trata de combinar la ordenación, no es estable y no primitiva. Por lo tanto, dependiendo del requisito, usaremos Arrays.sort () o Collection.sort (), la necesidad del clima para comparar objetos o primitivos.


Entonces, hay dos métodos diferentes con algoritmos totalmente diferentes aquí:

Arrays.sort(int[]) utiliza un algoritmo de Arrays.sort(int[]) doble pivote.

Collections.sort(List<T>) llama a list.sort(null) que a su vez llama a Arrays.sort(T[]) . Esto utiliza un algoritmo Timsort.

Entonces, comparemos Arrays.sort(int[]) y Arrays.sort(T[]) .

  1. T[] es una matriz en caja, por lo que hay un nivel adicional de direccionamiento indirecto: para cada llamada, tienes que desenvolver Integer. Esto sin duda conduce a una sobrecarga. Por otro lado, int[] es una matriz primitiva por lo que todos los elementos están disponibles "de inmediato".
  2. TimSort es una variación de un algoritmo de combinación clásico. Es más rápido que mergesort pero sigue siendo más lento que quicksort porque
    • quicksort tiene menos movimientos de datos en datos aleatorios
    • quicksort requiere O(log(n)) espacio adicional, mientras que TimSort requiere O(n) para proporcionar estabilidad, lo que también conduce a una sobrecarga.

Hay dos problemas aquí:

Problema # 1:

Bajo las cubiertas, Collections.sort funciona copiando la colección a una matriz, clasificando la matriz y luego copiando la matriz de nuevo a la colección.

Arrays.sort simplemente ordena la matriz en su lugar.

Ahora, para una matriz / colección lo suficientemente grande, el costo de clasificación ( O(NlogN )) dominará el costo de la copia ( O(N) ). Para una pequeña matriz / colección, la copia se vuelve significativa.

(Este comportamiento puede depender del tipo de colección. Para una ArrayList , la implementación de Collections.sort puede ordenar la matriz de respaldo sin copiar datos. Necesitaría verificar el código fuente. ACTUALIZACIÓN : clasificación in situ confirmada para ArrayList para Java 8 y posteriores.)

Problema # 2:

Está comparando la clasificación de un int[] con la clasificación de una List<Integer> .

Esto es manzanas y naranjas. Porque:

  1. La comparación de dos valores int mediante operadores relacionales es más rápida que la comparación de dos valores Integer mediante compareTo(Integer) .
  2. Arrays.sort(int[]) utiliza un algoritmo diferente (más rápido) al que usa Arrays.sort(Object[])

Si desea una comparación más justa, compare Collections.sort en un ArrayList<Integer> con Arrays.sort(Object[]) en un Integer[] .


Hay una cosa que no se mencionó en el camino, que es la "persecución de punteros", que está relacionada con la parte "unboxing". Para una matriz de este tamaño pequeño, ya sea que use timsort o quicksort no debería hacer una diferencia significativa (para matrices primitivas con velocidades de CPU actuales, esto no es lo que mata su velocidad).

Si bien el boxeo no ocurre fuera de la inicialización en su ejemplo, la gran diferencia ocurre cuando se leen los datos.

Como los ints son primitivos, un int [] es solo una parte contigua de la memoria que contiene los datos en sí, un Integer [] es una parte contigua de la memoria que contiene referencias (es decir, punteros) a los objetos de datos individuales y los propios objetos Integer pueden estar esparcidos por toda la memoria.

Por lo tanto, para una operación de ordenación en un int [], la CPU obtendrá una porción de memoria y puede operar directamente en ella. Pero para un Entero [], la CPU debe perseguir el puntero para cada objeto individual y obtenerlo de la memoria, antes de poder compararlo y luego operar en esa porción de memoria que es la matriz y reorganizar eso. Esto se llama "persecución de puntero".

No es tanto que el Entero [] requiera más operaciones para cada parte de los datos, como leer un valor, agregar una longitud de encabezado a la dirección base y obtener un valor desde allí (la CPU aplica muy bien estas instrucciones y eso esconde mucho de su impacto), es la latencia de la memoria que te mata. La obtención de cada objeto Integer individual desde una ubicación de memoria aleatoria hace casi toda la diferencia.

Por lo general, esto no es un gran problema, ya que normalmente se inicializa una cantidad bastante pequeña de Integer [] en un bucle cerrado y no hay mucho que hacer en segundo plano, por lo que es probable que los objetos Integer que se encuentran cerca de la memoria se puedan recuperar. en la memoria caché y se accede desde allí con bastante rapidez, pero para las enormes matrices y listas que se crean y modifican en una aplicación ocupada, esto puede hacer una diferencia significativa y puede venir con picos de latencia inesperados. Usted querrá evitar eso si necesita latencias bajas y confiables. Sin embargo, para una gran cantidad de aplicaciones, si una ordenación demora unos milisegundos más, nadie se da cuenta.

[EDITAR]

Como lo solicitó en el comentario, aquí hay un código para demostrar que no se trata de timsort vs quicksort:

import java.util.Arrays; import java.util.Random; public class Pointerchasing1 { public static void main(String[] args) { //use the exact same algorithm implementation (insertionSort), to show that slowness is not caused by timsort vs quicksort. //expect that the object-version is slower. final int[] direct = new int[1024]; final Integer[] refs = new Integer[direct.length]; final Random rnd = new Random(0); for (int t = 0; t < 1000; ++t) { Arrays.setAll(direct, index -> rnd.nextInt()); Arrays.setAll(refs, index -> direct[index]); // boxing happens here //measure direct: long t1 = System.nanoTime(); insertionSortPrimitive(direct); long e1 = System.nanoTime()-t1; //measure refs: long t2 = System.nanoTime(); insertionSortObjects(refs); long e2 = System.nanoTime()-t2; // use results, so compiler can''t eliminate the loops System.out.println(Arrays.toString(direct)); System.out.println(Arrays.toString(refs)); System.out.println("-"); System.out.println(e1); System.out.println(e2); System.out.println("--"); } } private static void insertionSortPrimitive(final int[] arr) { int i, key, j; for (i = 1; i < arr.length; i++) { key = arr[i]; j = i - 1; while (j >= 0 && arr[j] > key) { arr[j + 1] = arr[j]; j = j - 1; } arr[j + 1] = key; } } private static void insertionSortObjects(final Integer[] arr) { int i, key, j; for (i = 1; i < arr.length; i++) { key = arr[i]; j = i - 1; while (j >= 0 && arr[j] > key) { arr[j + 1] = arr[j]; j = j - 1; } arr[j + 1] = key; } } }

Esta "prueba" deja unboxing como posible culpable.

[EDIT2]

Ahora, esta prueba es para mostrar que "unboxing" no es el problema. Unboxing es simplemente agregar unos cuantos bytes de encabezado de objetos a la dirección (la ejecución fuera de orden y la canalización hacen que el costo casi desaparezca) y obtener el valor de esa ubicación. En esta prueba utilizo dos matrices primitivas, una para una referencia y otra para un valor. Entonces cada acceso es indirecto. Esto es muy parecido a desempaquetar, solo sin agregar algunos bytes adicionales para el encabezado del objeto. La principal diferencia es que la versión "indirecta" no necesita perseguir el puntero para cada valor en el montón, puede cargar ambas matrices e indexar desde la matriz refs a la matriz de valores.

Si la búsqueda de punteros hace la diferencia, en lugar de desempaquetar, entonces la versión indirecta debería ser más rápida que la versión de objetos que lo hace.

import java.util.Arrays; import java.util.Random; public class Pointerchasing2 { public static void main(String[] args) { // use indirect access (like unboxing, but just chasing a single array pointer) vs. Integer objects (chasing every object''s pointer). // expect that the object-version is still slower. final int[] values = new int[1024]; final int[] refs = new int[1024]; final Integer[] objects = new Integer[values.length]; final Random rnd = new Random(0); for (int t = 0; t < 1000; ++t) { Arrays.setAll(values, index -> rnd.nextInt()); Arrays.setAll(refs, index -> index); Arrays.setAll(objects, index -> values[index]); // boxing happens here // measure indirect: long t1 = System.nanoTime(); insertionSortPrimitiveIndirect(refs, values); long e1 = System.nanoTime() - t1; // measure objects: long t2 = System.nanoTime(); insertionSortObjects(objects); long e2 = System.nanoTime() - t2; // use results, so compiler can''t eliminate the loops System.out.println(Arrays.toString(indirectResult(refs, values))); System.out.println(Arrays.toString(objects)); System.out.println("-"); System.out.println(e1); System.out.println(e2); System.out.println("--"); } } private static void insertionSortPrimitiveIndirect(final int[] refs, int[] values) { int i, keyIndex, j; for (i = 1; i < refs.length; i++) { keyIndex = refs[i]; j = i - 1; while (j >= 0 && values[refs[j]] > values[keyIndex]) { refs[j + 1] = refs[j]; j = j - 1; } refs[j + 1] = keyIndex; } } private static void insertionSortObjects(final Integer[] arr) { int i, key, j; for (i = 1; i < arr.length; i++) { key = arr[i]; j = i - 1; while (j >= 0 && arr[j] > key) { arr[j + 1] = arr[j]; j = j - 1; } arr[j + 1] = key; } } private static int[] indirectResult(final int[] refs, int[] values) { final int[] result = new int[1024]; Arrays.setAll(result, index -> values[refs[index]]); return result; } }

Resultado: En ambas pruebas, las versiones "primitiva" e "indirecta" son más rápidas que acceder a los objetos del montón. Es de esperar que unboxing no esté matando la velocidad, sino la latencia de la memoria a través de la búsqueda de punteros.

Vea también este video sobre el proyecto Valhalla: ("Los tipos de valor y la especialización genérica dentro de la JVM prometen brindarnos un mejor código JIT, la ubicación de los datos y eliminar la tiranía de la búsqueda de punteros".) https://vimeo.com/289667280


si ves colecciones.sort () doc oracle aquí se lee

Esta implementación vuelca la lista especificada en una matriz, ordena la matriz e itera sobre la lista restableciendo cada elemento desde la posición correspondiente en la matriz

lo que significa que está haciendo una clasificación de matrices e iteración adicional, esto implica que Collections.sort () es más lento que arrays.sort

  1. vuelca la lista especificada en una matriz
  2. ordena la matriz ~ arrays.sort
  3. itera sobre la lista restableciendo cada elemento desde la posición correspondiente en la matriz