Spark - Collect

1 - About

The collect action returns all the elements of an RDD to the driver program.

Because the whole result is materialized on the driver, all of the data must fit in the driver program's memory.

3 - Function

3.1 - collect()

The collect() action returns all of the elements of the RDD to the driver as a local collection: a list in PySpark, an Array in Scala.

>>> rdd = sc.parallelize([1, 2, 3])
>>> rdd.collect()
[1, 2, 3]

3.2 - collectAsMap()

collectAsMap() returns the key-value pairs in this RDD to the driver (master) as a dictionary.

>>> m = sc.parallelize([(1, 2), (3, 4)]).collectAsMap()
>>> m[1]
2
>>> m[3]
4
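Because the result is an ordinary dictionary, an RDD that holds the same key more than once cannot keep every value. The sketch below needs no Spark. It assumes that collectAsMap() behaves like building a Python dict from the collected pairs, and the pairs list is hypothetical data standing in for the output of collect().

```python
# Hypothetical pairs, standing in for what rdd.collect() would return
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Building a dict keeps one value per key; a later pair overwrites an earlier one
as_map = dict(pairs)
print(as_map)  # {'a': 3, 'b': 2}
```

If every value per key is needed, the pairs have to be grouped or reduced by key before they are collected.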
db/spark/collect.txt · Last modified: 2017/09/06 20:15 by gerardnico