


Spark job running on YARN cluster: java.io.FileNotFoundException: File does not exist, even though the file exists on the master node

I am quite new to Spark. I tried searching but could not find a proper solution. I have installed Hadoop 2.7.2 on two boxes (one master node and one worker node). I set up the cluster by following this link: http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/ . I was running the Hadoop and Spark applications as the root user to test the cluster.

I installed Spark on the master node, and Spark starts without any errors. However, when I submit the job using spark-submit, I get a FileNotFoundException, even though the file is present on the master node at the very location named in the error. I am running the spark-submit command below; please find the log output after the command.

/bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster /app/spark-test.jar

16/04/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 19:16:14 INFO Client: Setting up container launch context for our AM
16/04/21 19:16:14 INFO Client: Setting up the launch environment for our AM container
16/04/21 19:16:14 INFO Client: Preparing resources for our AM container
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/app/spark-test.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip
16/04/21 19:16:14 INFO SecurityManager: Changing view acls to: root
16/04/21 19:16:14 INFO SecurityManager: Changing modify acls to: root
16/04/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager
16/04/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001
16/04/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:16 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461246375622
     final status: UNDEFINED
     tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/
     user: root
16/04/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED)
16/04/21 19:16:21 INFO Client:
     client token: N/A
     diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001 Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461246375622
     final status: FAILED
     tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001
     user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461246306015_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I even tried running Spark against the HDFS file system by placing my application in HDFS and giving the HDFS path in the spark-submit command. Even then it throws a FileNotFoundException, this time for a Spark conf file. I am running the spark-submit command below; please find the log output after the command.

./bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar

16/04/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 18:11:46 INFO Client: Setting up container launch context for our AM
16/04/21 18:11:46 INFO Client: Setting up the launch environment for our AM container
16/04/21 18:11:46 INFO Client: Preparing resources for our AM container
16/04/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar
16/04/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip
16/04/21 18:11:50 INFO SecurityManager: Changing view acls to: root
16/04/21 18:11:50 INFO SecurityManager: Changing modify acls to: root
16/04/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager
16/04/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017
16/04/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:51 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461242510849
     final status: UNDEFINED
     tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/
     user: root
16/04/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED)
16/04/21 18:11:54 INFO Client:
     client token: N/A
     diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017 Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461242510849
     final status: FAILED
     tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017
     user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called
16/04/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21


The Spark configuration was not pointing to the right Hadoop configuration directory. The Hadoop 2.7.2 configuration lives under hadoop-2.7.2/etc/hadoop/, not under /root/hadoop2.7.2/conf. When I pointed HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/ in spark-env.sh, spark-submit started working and the file-not-found exception disappeared. Previously it was pointing to /root/hadoop2.7.2/conf (which does not exist). If Spark does not point to the correct Hadoop configuration directory, it can produce a similar error. I think this is arguably a Spark bug; it should handle this gracefully instead of throwing ambiguous error messages.
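The fix described above can be sketched as follows; the paths are examples from this particular setup and must be adjusted to your own installation:

```shell
# Sketch of the fix: point Spark at the real Hadoop config directory
# in $SPARK_HOME/conf/spark-env.sh. The paths below assume Hadoop
# 2.7.2 unpacked under /root/hadoop2.7.2 (adjust to your layout).
export HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop
export YARN_CONF_DIR=/root/hadoop2.7.2/etc/hadoop
```

After editing spark-env.sh, restart any running Spark daemons so the new environment takes effect before retrying spark-submit.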


I had a similar error with Spark running on EMR. I wrote my Spark code in Java 8, but on an EMR cluster Spark runs, by default, on Java 7. I then had to recreate the cluster with JAVA_HOME pointing to the Java 8 version. That solved my problem. Please check along similar lines.
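A hedged sketch of how that might look when recreating the cluster: EMR lets you export JAVA_HOME for Spark through a configuration classification. The release label, instance types, and JVM path below are illustrative assumptions, not values from the answer above:

```shell
# Assumed sketch: recreate the EMR cluster with JAVA_HOME pointing
# at Java 8 via the spark-env "export" classification.
cat > java8-config.json <<'EOF'
[{
  "Classification": "spark-env",
  "Configurations": [{
    "Classification": "export",
    "Properties": { "JAVA_HOME": "/usr/lib/jvm/java-1.8.0" }
  }]
}]
EOF

aws emr create-cluster \
  --release-label emr-4.6.0 \
  --applications Name=Spark \
  --configurations file://java8-config.json \
  --instance-type m3.xlarge \
  --instance-count 2 \
  --use-default-roles
```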


I had a similar problem, but the issue was related to having two core-site.xml files, one in $HADOOP_CONF_DIR and another in $SPARK_HOME/conf. The problem disappeared when I removed the one under $SPARK_HOME/conf.
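A quick way to check for this situation (a sketch, assuming both environment variables are set in your shell): list both candidate locations and, if a duplicate exists, move the copy under $SPARK_HOME/conf aside rather than deleting it outright:

```shell
# Check whether core-site.xml exists in both config directories.
ls -l "$HADOOP_CONF_DIR/core-site.xml" "$SPARK_HOME/conf/core-site.xml"

# If both exist, back up (rather than delete) the Spark-side copy
# so that only the one in $HADOOP_CONF_DIR is picked up.
mv "$SPARK_HOME/conf/core-site.xml" "$SPARK_HOME/conf/core-site.xml.bak"
```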