Spark - Variable


PySpark

PySpark provides two types of shared variables: broadcast variables and accumulators.

Broadcast

Broadcast variables are an efficient way to send read-only data to the workers once, rather than having Spark ship it automatically inside the closure of every task.
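
A minimal sketch, assuming a local PySpark session (the app name and the lookup table are illustrative):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "broadcast-demo")

# A lookup table that every task needs. Broadcasting ships it to each
# executor once; without it, Spark would serialize the dictionary into
# the closure of every single task.
country_names = {"US": "United States", "FR": "France", "DE": "Germany"}
bc = sc.broadcast(country_names)

codes = sc.parallelize(["US", "FR", "US", "DE"])
# Tasks read the broadcast data through .value
result = codes.map(lambda c: bc.value.get(c, "unknown")).collect()
print(result)  # ['United States', 'France', 'United States', 'Germany']
```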

Accumulator

Accumulators can only be written to by workers and read by the driver program.

They allow us to aggregate values from workers back to the driver.
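
A minimal sketch that counts malformed records with an accumulator (the app name and sample data are illustrative):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "accumulator-demo")

# Created on the driver; tasks can only add to it, never read it.
bad_records = sc.accumulator(0)

def parse(line):
    try:
        return [int(line)]
    except ValueError:
        bad_records.add(1)  # written on a worker
        return []

numbers = sc.parallelize(["1", "2", "oops", "3"]).flatMap(parse)
numbers.count()           # an action forces the tasks to run
print(bad_records.value)  # read back on the driver: 1
```

Note that an update made inside a transformation such as flatMap may be applied more than once if a task is re-executed; only updates performed inside actions are guaranteed to be applied exactly once.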




