performance - Why is the final reduce step extremely slow in this MapReduce job? (HiveQL, HDFS MapReduce)
Some background information:
I am working with Dataiku DSS, HDFS, and partitioned datasets. I have a particular job running (a Hive query) that has two input datasets: one is a very large, partitioned dataset; the other is a small (~250 rows, 2 columns), non-partitioned dataset. Let's call the partitioned table A and the non-partitioned table B.
Question:
The query has the following form:
SELECT a.f1, f2, ..., fn
FROM A as a
LEFT JOIN B as b
ON a.f1 = b.f1
WHERE {PARTITION_FILTER}
Here is the current output of the MapReduce job (note that this job is still running):
[09:05:53] [INFO] [dku.utils] - INFO : Total jobs = 4
[09:05:53] [INFO] [dku.utils] - INFO : Starting task [Stage-10:CONDITIONAL] in serial mode
[09:05:53] [INFO] [dku.utils] - INFO : Stage-11 is filtered out by condition resolver.
[09:05:53] [INFO] [dku.utils] - INFO : Stage-1 is selected by condition resolver.
[09:05:53] [INFO] [dku.utils] - INFO : Launching Job 1 out of 4
[09:05:53] [INFO] [dku.utils] - INFO : Starting task [Stage-1:MAPRED] in serial mode
[09:05:53] [INFO] [dku.utils] - INFO : Number of reduce tasks not specified. Estimated from input data size: 307
[09:05:53] [INFO] [dku.utils] - INFO : In order to change the average load for a reducer (in bytes):
[09:05:53] [INFO] [dku.utils] - INFO : set hive.exec.reducers.bytes.per.reducer=<number>
[09:05:53] [INFO] [dku.utils] - INFO : In order to limit the maximum number of reducers:
[09:05:53] [INFO] [dku.utils] - INFO : set hive.exec.reducers.max=<number>
[09:05:53] [INFO] [dku.utils] - INFO : In order to set a constant number of reducers:
[09:05:53] [INFO] [dku.utils] - INFO : set mapreduce.job.reduces=<number>
[09:05:53] [INFO] [dku.utils] - INFO : number of splits:75
[09:05:53] [INFO] [dku.utils] - INFO : Submitting tokens for job: job_1529508387832_10211
[09:05:53] [INFO] [dip.hiveserver2.log.sniffer] - sniffed applicationId => application_1529508387832_10211/
[09:05:53] [INFO] [dku.utils] - INFO : Kill Command = /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop job -kill job_1529508387832_10211
[09:05:53] [INFO] [dku.utils] - INFO : Hadoop job information for Stage-1: number of mappers: 75; number of reducers: 307
[09:05:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:05:47,749 Stage-1 map = 0%, reduce = 0%
[09:06:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:06:48,444 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 278.18 sec
[09:06:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:06:51,682 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 373.35 sec
[09:07:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:00,159 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 501.46 sec
[09:07:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:02,235 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 539.6 sec
[09:07:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:07,472 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 1389.81 sec
[09:07:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:10,605 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 2172.52 sec
[09:07:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:12,574 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 2577.81 sec
[09:07:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:13,604 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 2865.44 sec
[09:07:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:15,747 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 3110.21 sec
[09:07:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:19,898 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 4080.2 sec
[09:07:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:21,988 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 4522.48 sec
[09:07:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:23,015 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 4755.96 sec
[09:07:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:29,335 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 5710.85 sec
[09:07:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:31,407 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 5948.34 sec
[09:07:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:34,555 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 6399.6 sec
[09:07:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:37,663 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 6811.22 sec
[09:07:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:38,695 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 7087.68 sec
[09:07:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:39,729 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 7288.22 sec
[09:07:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:40,769 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 7520.54 sec
[09:07:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:41,903 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 7771.37 sec
[09:07:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:42,930 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 7936.9 sec
[09:07:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:45,035 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 8254.78 sec
[09:07:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:46,075 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 8428.35 sec
[09:07:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:47,111 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 8661.23 sec
[09:07:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:48,153 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 8834.37 sec
[09:07:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:49,193 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 8983.68 sec
[09:07:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:50,227 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 9149.94 sec
[09:07:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:51,263 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 9268.9 sec
[09:07:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:52,301 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 9415.86 sec
[09:07:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:53,352 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 9540.63 sec
[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:54,381 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 9711.54 sec
[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:55,421 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 9823.52 sec
[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:56,453 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 10010.83 sec
[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:57,492 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 10081.9 sec
[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:58,532 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 10230.13 sec
[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:59,576 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 10392.61 sec
[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:00,607 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 10483.38 sec
[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:01,649 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 10618.16 sec
[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:02,672 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 10684.82 sec
[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:03,695 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 10701.95 sec
[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:04,720 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 10767.21 sec
[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:05,750 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 10849.92 sec
[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:06,780 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 10924.45 sec
[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:07,902 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 11000.21 sec
[09:08:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:09,965 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 11013.58 sec
[09:08:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:10,991 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 11057.76 sec
[09:08:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:14,216 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 11157.51 sec
[09:08:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:17,362 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 11392.85 sec
[09:08:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:20,460 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 11610.7 sec
[09:08:28] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:26,106 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 11781.65 sec
[09:08:28] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:28,163 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 11788.58 sec
[09:08:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:35,410 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 12167.24 sec
[09:08:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:38,540 Stage-1 map = 86%, reduce = 1%, Cumulative CPU 12317.09 sec
[09:08:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:39,583 Stage-1 map = 86%, reduce = 2%, Cumulative CPU 12329.8 sec
[09:08:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:40,631 Stage-1 map = 86%, reduce = 3%, Cumulative CPU 12333.61 sec
[09:08:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:47,927 Stage-1 map = 87%, reduce = 3%, Cumulative CPU 12651.77 sec
[09:08:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:48,970 Stage-1 map = 88%, reduce = 4%, Cumulative CPU 12826.37 sec
[09:08:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:51,092 Stage-1 map = 88%, reduce = 5%, Cumulative CPU 12857.19 sec
[09:08:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:54,214 Stage-1 map = 90%, reduce = 5%, Cumulative CPU 13037.63 sec
[09:09:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:59,396 Stage-1 map = 91%, reduce = 5%, Cumulative CPU 13117.71 sec
[09:09:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:00,440 Stage-1 map = 92%, reduce = 5%, Cumulative CPU 13238.06 sec
[09:09:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:01,485 Stage-1 map = 93%, reduce = 5%, Cumulative CPU 13249.8 sec
[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:05,660 Stage-1 map = 94%, reduce = 5%, Cumulative CPU 13306.0 sec
[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:06,706 Stage-1 map = 97%, reduce = 5%, Cumulative CPU 13393.5 sec
[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:07,751 Stage-1 map = 97%, reduce = 6%, Cumulative CPU 13409.12 sec
[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:08,795 Stage-1 map = 98%, reduce = 6%, Cumulative CPU 13433.07 sec
[09:09:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:09,835 Stage-1 map = 98%, reduce = 8%, Cumulative CPU 13474.03 sec
[09:09:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:10,874 Stage-1 map = 98%, reduce = 9%, Cumulative CPU 13484.64 sec
[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:14,004 Stage-1 map = 100%, reduce = 11%, Cumulative CPU 13580.71 sec
[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:15,118 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 13619.15 sec
[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:16,160 Stage-1 map = 100%, reduce = 16%, Cumulative CPU 13707.2 sec
[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:17,210 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 14456.75 sec
[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:18,258 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 14708.07 sec
[09:09:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:19,291 Stage-1 map = 100%, reduce = 40%, Cumulative CPU 14768.29 sec
[09:09:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:20,329 Stage-1 map = 100%, reduce = 41%, Cumulative CPU 14834.88 sec
[09:09:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:21,360 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 14902.4 sec
[09:09:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:22,399 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 15040.16 sec
[09:09:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:23,433 Stage-1 map = 100%, reduce = 47%, Cumulative CPU 15165.58 sec
[09:09:28] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:28,627 Stage-1 map = 100%, reduce = 63%, Cumulative CPU 15792.29 sec
[09:09:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:30,711 Stage-1 map = 100%, reduce = 64%, Cumulative CPU 15889.21 sec
[09:09:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:31,753 Stage-1 map = 100%, reduce = 65%, Cumulative CPU 15898.95 sec
[09:09:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:32,789 Stage-1 map = 100%, reduce = 66%, Cumulative CPU 15927.26 sec
[09:09:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:33,822 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 16086.93 sec
[09:09:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:34,866 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 16711.13 sec
[09:09:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:35,907 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 16795.34 sec
[09:09:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:36,952 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 16881.47 sec
[09:09:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:37,995 Stage-1 map = 100%, reduce = 96%, Cumulative CPU 16891.18 sec
[09:09:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:44,249 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 16958.21 sec
[09:09:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:45,292 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 17011.88 sec
[09:09:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:47,378 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 17055.07 sec
[09:10:49] [INFO] [dku.utils] - INFO : 2018-06-27 09:10:47,421 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 17545.89 sec
[09:11:49] [INFO] [dku.utils] - INFO : 2018-06-27 09:11:47,872 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 17764.45 sec
[09:12:49] [INFO] [dku.utils] - INFO : 2018-06-27 09:12:48,287 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 18330.86 sec
[09:13:49] [INFO] [dku.utils] - INFO : 2018-06-27 09:13:48,855 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 19232.58 sec
...
...
...
...
...
[10:22:20] [INFO] [dku.utils] - INFO : 2018-06-27 10:22:17,545 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 26857.28 sec
[10:23:23] [INFO] [dku.utils] - INFO : 2018-06-27 10:23:17,654 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 26974.96 sec
[10:24:18] [INFO] [dku.utils] - INFO : 2018-06-27 10:24:18,112 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 27081.0 sec
[10:25:23] [INFO] [dku.utils] - INFO : 2018-06-27 10:25:18,964 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 27187.52 sec
[10:26:23] [INFO] [dku.utils] - INFO : 2018-06-27 10:26:19,404 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 27287.58 sec
As you can see, the final 1% of the reduce phase is taking a very long time (so far it accounts for 1h27min of the total 1h32min).
I am having trouble finding resources that give a clear and concise explanation of why this happens.
It is also worth mentioning that I have only a very basic understanding of MapReduce, though I am currently studying MapReduce in an HDFS context.
From what I have read, it is possible that most of the reduce work ended up on a single reducer because of the nature of the JOIN being performed.
If anyone can give a high-level explanation of what is happening, point me toward some good resources, and advise on how I can avoid this in future jobs by changing my Hive queries, that would be much appreciated.
If the final reducer is performing the join, then it looks like the join key is skewed. First of all, check two things:
Check that the join key b.f1 has no duplicates:
select b.f1, count(*) cnt from B b
group by b.f1
having count(*)>1 order by cnt desc;
Check the distribution of a.f1:
select a.f1, count(*) cnt from A a
group by a.f1
order by cnt desc
limit 10;
This query will show the skewed keys.
If there is skew (too many rows with the same value), then join the skewed keys separately and combine the results with UNION ALL:
SELECT a.f1, f2, ..., fn
FROM ( select * from A where f1 = skewed_value) as a --skewed
LEFT JOIN B as b
ON a.f1 = b.f1
WHERE {PARTITION_FILTER}
UNION ALL
SELECT a.f1, f2, ..., fn
FROM ( select * from A where f1 != skewed_value or f1 is null) as a --all other keys; NULL kept so the LEFT JOIN does not lose those rows
LEFT JOIN B as b
ON a.f1 = b.f1
WHERE {PARTITION_FILTER}
And finally, if there are no problems with skew or duplicates, try increasing reducer parallelism. Get the current bytes-per-reducer setting:
set hive.exec.reducers.bytes.per.reducer;
Typically this will return a value of around 1G. Try dividing it by two, set the new value before your query (the 64 MB value below is only an example), and check how many reducers are started and how the job performs. The criteria for success are more reducers and improved performance.
set hive.exec.reducers.bytes.per.reducer=67108864;
The fewer bytes per reducer, the more reducers will be started, increasing parallelism.
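As a rough sketch of the arithmetic behind this setting: Hive estimates the reducer count from the input size, approximately as min(hive.exec.reducers.max, ceil(input_bytes / bytes_per_reducer)), so halving the bytes-per-reducer value roughly doubles the number of reducers. The concrete numbers below are assumptions for illustration only; substitute whatever your cluster actually reports. Note that extra reducers do not help with skew, because all rows sharing one join key still go to the same reducer.

```sql
-- Print the current values (each statement prints name=value).
set hive.exec.reducers.bytes.per.reducer;
set hive.exec.reducers.max;

-- Hypothetical example: if the current value were 67108864 (64 MB),
-- half of it would be 33554432 (32 MB), giving roughly twice as many reducers.
set hive.exec.reducers.bytes.per.reducer=33554432;
```

Set the new value in the same session before running the query, then compare the "number of reducers" line in the job log with the previous run.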
UPDATE: try enabling map join. Your second table is small enough to fit in memory, so with a map join the query will run without reducers at all and reducer skew will not be a problem.
How to enable mapjoin: https://.com/a/49154414/2700344
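A minimal sketch of the settings usually involved, to be run in the same session before the query. The threshold value shown is the common default and is included as an assumption, not a tuned recommendation; check what your Hive version and cluster configuration actually use.

```sql
-- Let Hive automatically convert a reduce-side join into a map join
-- when one side of the join is small enough.
set hive.auto.convert.join=true;

-- Size threshold in bytes below which a table is treated as "small".
-- 25000000 is the usual default; a ~250-row table should fit easily.
set hive.mapjoin.smalltable.filesize=25000000;
```

To check whether the conversion took effect, run EXPLAIN on the query and look for a Map Join Operator in the plan instead of a reduce-side Join Operator.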