Spark - Collect


About

The collect action returns the elements of an RDD to the driver program.

All of the returned data must fit in the memory of the driver program.
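
To see why the result has to fit in driver memory, here is a minimal plain-Python sketch that models an RDD as a list of partitions. It is only an illustration, not Spark's implementation; the names partitions and collect_model are hypothetical.

```python
# Plain-Python model (not Spark code): an RDD is split into partitions
# that would normally live on different executors.
partitions = [[1, 2], [3, 4], [5, 6]]  # hypothetical data, 3 partitions

def collect_model(parts):
    """Mimic collect(): copy every partition into one list on the driver."""
    result = []
    for part in parts:       # partitions are appended in order
        result.extend(part)
    return result

collected = collect_model(partitions)
print(collected)       # [1, 2, 3, 4, 5, 6]
print(len(collected))  # 6: every element now sits in a single process
```

For a large RDD, an action that returns only a few elements, such as take(n), is usually the safer choice.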

Function

collect()

The collect() action returns all of the elements of the RDD to the driver as an array (a list in Python).

>>> rdd = sc.parallelize([1, 2, 3])
>>> rdd.collect()
[1, 2, 3]

collectAsMap()

collectAsMap() returns the key-value pairs in this RDD to the driver as a dictionary.

>>> m = sc.parallelize([(1, 2), (3, 4)]).collectAsMap()
>>> m[1]
2
>>> m[3]
4
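
A dictionary holds one value per key, so duplicate keys in the RDD collapse into a single entry. The sketch below models this in plain Python, assuming (as PySpark does) that the dictionary is built from the collected list of pairs; pairs is hypothetical sample data.

```python
# Plain-Python model of collectAsMap(): build a dict from the collected pairs.
pairs = [(1, 2), (3, 4), (1, 5)]  # hypothetical pairs; key 1 appears twice

as_map = dict(pairs)  # a later pair overwrites an earlier one with the same key
print(as_map)                   # {1: 5, 3: 4}
print(len(pairs), len(as_map))  # 3 2
```

In a real job, the order in which pairs reach the driver depends on how the RDD is partitioned, so it is best not to rely on which value survives for a repeated key.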






