Spark - HDFS

About

This page covers how to configure Spark so that it can read from and write to HDFS.

Management

Configuration

If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that should be included on Spark’s classpath:

hdfs-site.xml, which provides default behaviors for the HDFS client.
core-site.xml, which sets the default filesystem name.
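
To illustrate what the second file provides, the following minimal sketch parses a core-site.xml document with Python's standard library and returns the value of fs.defaultFS, the Hadoop property that names the default filesystem. The sample XML, the namenode host and port, and the helper name read_hadoop_property are assumptions made for the example; they do not come from a real cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content; the host and port are placeholders.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def read_hadoop_property(xml_text, name):
    """Return the value of a Hadoop property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # Hadoop configuration files are a flat list of <property> elements,
    # each holding a <name> and a <value> child.
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


default_fs = read_hadoop_property(SAMPLE_CORE_SITE, "fs.defaultFS")
print(default_fs)
```

With a default filesystem configured this way, a path given without a scheme is expected to resolve against that HDFS namenode rather than the local filesystem.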

To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh to a location containing the configuration files.
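
In practice this usually means adding a line such as export HADOOP_CONF_DIR=/etc/hadoop/conf to spark-env.sh, where the path is only an assumed example and depends on the installation. The sketch below is one possible sanity check, not part of Spark itself: it reads HADOOP_CONF_DIR from the environment and reports which of the two files are missing from that directory. The function name missing_hadoop_conf_files is hypothetical.

```python
import os
from pathlib import Path

# The two files that should be visible to Spark.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml")


def missing_hadoop_conf_files(conf_dir):
    """Return the required file names that are not present in conf_dir."""
    directory = Path(conf_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if conf_dir is None:
        print("HADOOP_CONF_DIR is not set")
    else:
        missing = missing_hadoop_conf_files(conf_dir)
        print("missing files:", ", ".join(missing) if missing else "none")
```

Running the check in the same environment that launches Spark can help catch a wrong directory before a job fails on an unresolved HDFS path.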

Documentation / Reference

https://spark.apache.org/docs/latest/configuration.html#inheriting-hadoop-cluster-configuration