Spark - Jar

1 - Launch external Jar

Jars can be passed to a spark-submit command as:

  • Jar files:
    • --jars option - a comma-separated list of paths to jar files, which are automatically transferred to the cluster.
  • Maven coordinates:
    • --packages option - a comma-delimited list of Maven coordinates (groupId:artifactId:version)
    • --repositories option - a comma-delimited list of additional Maven repositories to search
spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --packages mypackage \
  --class MyClass main-application.jar
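
The command above repeats the same jar list with two different separators: commas for --jars and colons for the class path options. A minimal sketch of deriving both from a single list, assuming a POSIX shell; join_by is a hypothetical helper, the jar names are the ones used above, and the command is only printed, not executed:

```shell
#!/bin/sh
# join_by SEP ITEM...: print the items joined by SEP (hypothetical helper).
join_by() {
  sep=$1
  shift
  out=
  first=1
  for item in "$@"; do
    if [ "$first" -eq 1 ]; then
      out=$item
      first=0
    else
      out=$out$sep$item
    fi
  done
  printf '%s\n' "$out"
}

# Same jars, two separators.
JARS_CSV=$(join_by , additional1.jar additional2.jar)  # for --jars
JARS_CP=$(join_by : additional1.jar additional2.jar)   # for the class path options

# Dry run: print the command instead of running it.
echo spark-submit --jars "$JARS_CSV" \
  --driver-class-path "$JARS_CP" \
  --conf "spark.executor.extraClassPath=$JARS_CP" \
  --class MyClass main-application.jar
```

Removing the leading echo would run the command, provided spark-submit is on the PATH.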

For more, see the "Advanced Dependency Management" section of the Spark "Submitting Applications" documentation.

2 - Conf

2.1 - jars

Jar-related properties in the Spark configuration:

  • spark.jars is the comma-separated list of jars to include on the driver and executor classpaths. Globs are allowed.
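
One way to set spark.jars is through a properties file in the same "key value" format as spark-defaults.conf. A minimal sketch, assuming a POSIX shell: the /opt/libs paths are hypothetical, the file is a throwaway temporary file, and the command is only printed. The same value could also be passed directly with --conf spark.jars=...

```shell
#!/bin/sh
# Hypothetical jar locations. The second entry is a glob, which spark.jars accepts.
# Single quotes keep the shell from expanding it, so Spark receives the pattern.
SPARK_JARS='/opt/libs/additional1.jar,/opt/libs/extra/*.jar'

# Write the property to a temporary properties file.
CONF_FILE=$(mktemp)
printf 'spark.jars %s\n' "$SPARK_JARS" > "$CONF_FILE"

# Dry run: print the command instead of running it.
echo spark-submit --properties-file "$CONF_FILE" \
  --class MyClass main-application.jar
```

When --properties-file is not given, spark-submit looks for conf/spark-defaults.conf, so the same line could go there instead.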

2.2 - Library Path

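  • spark.driver.extraLibraryPath - a special library path to use when launching the driver JVM
  • spark.executor.extraLibraryPath - a special library path to use when launching the executor JVMs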

Value example: /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
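
A minimal sketch of passing the example value to both properties on the command line, assuming a POSIX shell. The directories are the ones from the example value and exist only on an HDP node; MyClass and main-application.jar are placeholders, and the command is only printed:

```shell
#!/bin/sh
# Base directory of the Hadoop native libraries (from the example value above).
HADOOP_NATIVE=/usr/hdp/current/hadoop-client/lib/native

# Library path entries are separated by ':' on Linux.
LIB_PATH="$HADOOP_NATIVE:$HADOOP_NATIVE/Linux-amd64-64"

# Dry run: print the command instead of running it.
echo spark-submit \
  --conf "spark.driver.extraLibraryPath=$LIB_PATH" \
  --conf "spark.executor.extraLibraryPath=$LIB_PATH" \
  --class MyClass main-application.jar
```

The executor value is resolved on the worker nodes, so the directories need to exist there, not only on the machine that submits the job.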

3 - Location

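  • Local: SPARK_HOME\jars (the jars directory of the Spark installation)
  • Local: PYSPARK_HOME\jars (the jars directory inside the installed pyspark package)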
  • Azure:
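
For the local locations in the list above, a minimal sketch of listing the bundled jars, assuming a POSIX shell and that SPARK_HOME points at a Spark installation. list_spark_jars is a hypothetical helper; for a PySpark installation, the directory of the pyspark package could be passed as the argument instead.

```shell
#!/bin/sh
# list_spark_jars [HOME_DIR]: print the names of the jar files in HOME_DIR/jars.
# HOME_DIR defaults to $SPARK_HOME. Returns 1 if the jars directory is missing.
list_spark_jars() {
  jars_dir=${1:-${SPARK_HOME:-}}/jars
  if [ ! -d "$jars_dir" ]; then
    echo "no jars directory at $jars_dir" >&2
    return 1
  fi
  for jar in "$jars_dir"/*.jar; do
    # Skip the unexpanded pattern when the directory contains no jar.
    if [ -e "$jar" ]; then
      basename "$jar"
    fi
  done
  return 0
}

# List the jars only when SPARK_HOME is defined.
if [ -n "${SPARK_HOME:-}" ]; then
  list_spark_jars
fi
```

Piping the output through grep (for example, list_spark_jars | grep hadoop) is a quick way to check which version of a library ships with the installation.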