Spark - Variable


PySpark

PySpark provides two types of shared variables: broadcast variables and accumulators.

Broadcast

Broadcast variables are an efficient way to send read-only data to the workers once, rather than having Spark ship it automatically inside the closure of every task.
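
A minimal sketch, assuming a local PySpark session (the app name and the lookup table are illustrative):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "broadcast-demo")

# A lookup table that every task needs. Broadcasting ships it to each
# executor once; without it, Spark would serialize the dictionary into
# the closure of every single task.
country_names = {"US": "United States", "FR": "France", "DE": "Germany"}
bc = sc.broadcast(country_names)

codes = sc.parallelize(["US", "FR", "US", "DE"])
# Tasks read the broadcast data through .value
result = codes.map(lambda c: bc.value.get(c, "unknown")).collect()
print(result)  # ['United States', 'France', 'United States', 'Germany']
```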

Accumulator

Accumulators can only be written to by workers and read by the driver program.

They allow us to aggregate values from workers back to the driver.
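
A minimal sketch that counts malformed records with an accumulator (the app name and sample data are illustrative):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "accumulator-demo")

# Created on the driver; tasks can only add to it, never read it.
bad_records = sc.accumulator(0)

def parse(line):
    try:
        return [int(line)]
    except ValueError:
        bad_records.add(1)  # written on a worker
        return []

numbers = sc.parallelize(["1", "2", "oops", "3"]).flatMap(parse)
numbers.count()           # an action forces the tasks to run
print(bad_records.value)  # read back on the driver: 1
```

Note that an update made inside a transformation such as flatMap may be applied more than once if a task is re-executed; only updates performed inside actions are guaranteed to be applied exactly once.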




