How do I load a Java properties file and use it in Spark?

Here is a solution I found:

properties file: (mypropsfile.conf) // note: prefix each key with "spark."; otherwise the property will be ignored.

spark.myapp.input /input/path
spark.myapp.output /output/path

launch

$SPARK_HOME/bin/spark-submit --properties-file mypropsfile.conf

how to read the values inside your code:

sc.getConf.get("spark.driver.host") // localhost
sc.getConf.get("spark.myapp.input") // /input/path
sc.getConf.get("spark.myapp.output") // /output/path
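The whitespace-separated layout of mypropsfile.conf is also valid java.util.Properties syntax, where the key ends at the first whitespace, "=" or ":". The following is a minimal sketch in plain Java, with no Spark dependency, showing how such lines split into key and value. The class name PropsFormatDemo and its parse helper are hypothetical and exist only for illustration; this is not Spark's own loading code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropsFormatDemo {
    // Parse properties-file text with the standard java.util.Properties loader.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Same content as mypropsfile.conf: key, whitespace, value.
        String text = "spark.myapp.input /input/path\n"
                    + "spark.myapp.output /output/path\n";
        Properties props = parse(text);
        System.out.println(props.getProperty("spark.myapp.input"));  // /input/path
        System.out.println(props.getProperty("spark.myapp.output")); // /output/path
    }
}
```

The same parser also accepts "key = value" and "key: value" lines, so either style can be used in the file.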

I want to store the Spark arguments, such as the input file and the output file, in a Java properties file and pass that file to the Spark driver. I'm using spark-submit to submit the job, but I couldn't find a parameter for passing a properties file. Do you have any suggestions?

The approach in the previous answer has the restriction that every property in the properties file must start with spark.

e.g.

spark.myapp.input
spark.myapp.output

Suppose you have a property that does not start with spark:

job.property:

app.name = xyz

$SPARK_HOME/bin/spark-submit --properties-file job.property

Spark will ignore every property that does not have the spark. prefix, with the message:

Warning: Ignoring non-spark config property: app.name=test
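To make the prefix rule concrete, here is a small sketch in plain Java that mimics the behavior described above: keys starting with "spark." are kept and everything else is reported and dropped. It only illustrates the rule and is not Spark's actual implementation; the class name SparkPrefixFilter, the method keepSparkKeys and the message text are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

class SparkPrefixFilter {
    // Keep only keys that start with "spark."; report the rest on stderr.
    static Map<String, String> keepSparkKeys(Properties props) {
        Map<String, String> kept = new LinkedHashMap<>();
        // TreeSet gives a stable, sorted iteration order.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.startsWith("spark.")) {
                kept.put(key, props.getProperty(key));
            } else {
                System.err.println("ignored (no spark. prefix): " + key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("spark.myapp.input", "/input/path");
        props.setProperty("app.name", "xyz");
        // Only spark.myapp.input survives the filter.
        System.out.println(keepSparkKeys(props).keySet());
    }
}
```

A filter like this can be handy as a pre-flight check on a file before handing it to --properties-file, so that dropped keys are noticed early.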

How I manage the properties file in the application's driver and executors:

${SPARK_HOME}/bin/spark-submit --files job.properties

Java code to access the cached file (job.properties):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

// Load the file into a Properties object using the HDFS FileSystem API
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);
// fileName contains the absolute path of the file
FSDataInputStream is = fs.open(new Path(fileName));

// Or use plain Java IO instead of the three lines above:
// InputStream is = new FileInputStream(fileName);

Properties prop = new Properties();
// load properties
prop.load(is);
is.close();
// retrieve properties
prop.getProperty("app.name");
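Since SparkFiles.get returns an absolute path on the local filesystem of the driver or executor, the plain Java IO route can be wrapped in a small helper that also closes the stream. The sketch below assumes exactly that; the class name LocalPropsLoader and the temporary file in main are stand-ins for illustration, and in a real job the argument would be the result of SparkFiles.get("job.properties").

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class LocalPropsLoader {
    // Load a properties file from a local path; the stream is closed automatically.
    static Properties load(String fileName) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(fileName))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the path that SparkFiles.get("job.properties") would return.
        Path tmp = Files.createTempFile("job", ".properties");
        Files.write(tmp, "app.name = xyz\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(load(tmp.toString()).getProperty("app.name")); // xyz
        Files.delete(tmp);
    }
}
```

Throwing an unchecked exception keeps the helper easy to call from lambdas running on executors, at the cost of hiding the checked IOException from the signature.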

If you have environment-specific properties (dev/test/prod), pass a custom Java system property, APP_ENV, to the driver and the executors through spark-submit:

${SPARK_HOME}/bin/spark-submit \
--conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
--conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
--files dev.properties

Replace the file lookup in your driver or executor code:

// Load the file for the current environment into a Properties object
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
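One thing to watch: System.getProperty("APP_ENV") returns null when the property is not set, which would make the lookup ask for "null.properties". A minimal sketch of a safer name resolution, using the two-argument System.getProperty overload to supply a fallback, is shown here. The class name EnvPropsName and the choice of "dev" as the default are assumptions for illustration only.

```java
class EnvPropsName {
    // Build "<env>.properties" from the APP_ENV system property,
    // falling back to defaultEnv when APP_ENV is not set.
    static String resolve(String defaultEnv) {
        String env = System.getProperty("APP_ENV", defaultEnv);
        return env + ".properties";
    }

    public static void main(String[] args) {
        // Prints dev.properties when the JVM was started without -DAPP_ENV=...
        System.out.println(resolve("dev"));

        // Simulate launching with -DAPP_ENV=prod.
        System.setProperty("APP_ENV", "prod");
        System.out.println(resolve("dev")); // prod.properties
    }
}
```

The resolved name would then be passed to SparkFiles.get in place of the string concatenation above.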