Spark - (Take|TakeOrdered)

1 - About

The take action returns an array of the first n elements of the RDD in partition order (not sorted), whereas takeOrdered returns an array of the first n elements after sorting them.

This makes takeOrdered a Top N function.
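A minimal plain-Python model of the difference, assuming an RDD can be pictured as a list of partitions, each a list of elements. It does not use Spark: model_take and model_take_ordered are hypothetical helpers. Keeping the n smallest elements per partition with heapq.nsmallest and then merging the candidates is close to how PySpark implements takeOrdered, but treat it as an approximation.

```python
import heapq
from itertools import islice

# Hypothetical model of an RDD: a list of partitions (no Spark involved).
partitions = [[5, 3], [1, 2], [9, 4]]

def model_take(parts, n):
    """First n elements, reading partitions in order, without sorting."""
    def all_elements():
        for part in parts:
            yield from part
    return list(islice(all_elements(), n))

def model_take_ordered(parts, n, key=None):
    """n smallest elements by key: keep n per partition, then merge."""
    per_partition = [heapq.nsmallest(n, part, key=key) for part in parts]
    candidates = [x for part_best in per_partition for x in part_best]
    return heapq.nsmallest(n, candidates, key=key)

print(model_take(partitions, 3))          # [5, 3, 1]
print(model_take_ordered(partitions, 3))  # [1, 2, 3]
```

Each partition only contributes its own n best candidates, so the final merge never has to sort the whole dataset.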

3 - Take

take(n)

Python:

rdd = sc.parallelize([1, 2, 3])  # sc: the SparkContext, e.g. provided by the pyspark shell
rdd.take(2)  # returns [1, 2] as a Python list
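Because take does not sort, it can stop reading as soon as it has n elements. The sketch below models that early exit in plain Python; model_take_scan is a hypothetical helper, and real Spark scans partitions in growing batches rather than strictly one at a time.

```python
def model_take_scan(partitions, n):
    """Return (first n elements, number of partitions actually read)."""
    result = []
    scanned = 0
    for part in partitions:
        if len(result) >= n:
            break  # enough elements: later partitions are never read
        scanned += 1
        result.extend(part[: n - len(result)])
    return result, scanned

partitions = [[1], [2, 3], [4, 5, 6]]
print(model_take_scan(partitions, 2))  # ([1, 2], 2)
```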

4 - TakeOrdered

takeOrdered(n, key=func)

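takeOrdered is an action that returns the n smallest elements in ascending order. The optional key function is applied to each element, and the elements are ordered by the value it returns.

To get descending order (the n largest elements), make the key function return the negated value, for example lambda s: -1 * s.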
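To try key functions locally without a cluster, heapq.nsmallest from the Python standard library can serve as a stand-in: it also returns the n elements with the smallest key, in ascending key order. This is an illustration of the ordering rule, not the Spark API.

```python
import heapq

data = [5, 3, 1, 2]

# n smallest elements by key, the same ordering rule as takeOrdered
ascending = heapq.nsmallest(3, data)                      # [1, 2, 3]
descending = heapq.nsmallest(3, data, key=lambda s: -s)  # [5, 3, 2]

# Negating the key only works for numbers; for strings, ask for the
# largest elements directly instead.
words = ["pear", "apple", "fig"]
largest_words = heapq.nlargest(2, words)                   # ['pear', 'fig']
```

For the largest-first case, Spark's RDD also provides a top(n) action, which avoids the negation trick.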

Python List:

rdd = sc.parallelize([5, 3, 1, 2])
rdd.takeOrdered(3, key=lambda s: -1 * s)  # returns [5, 3, 2] as a Python list (descending)

Python Tuple:

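# assumes endpointSum is an RDD of (URL, count) pairs; Python 3 lambdas cannot unpack tuples
topTenErrURLs = endpointSum.takeOrdered(10, key=lambda pair: -1 * pair[1])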
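The same key on (URL, count) pairs can be checked locally with heapq.nsmallest. The data below is hypothetical and only stands in for endpointSum; the key negates the count so the URLs with the most errors come first.

```python
import heapq

# Hypothetical (URL, error count) pairs standing in for endpointSum
endpoint_counts = [
    ("/index.html", 3),
    ("/login", 12),
    ("/img/logo.png", 7),
    ("/api", 9),
]

# Negate the count (second tuple field) to get the largest counts first
top_three = heapq.nsmallest(3, endpoint_counts, key=lambda pair: -pair[1])
print(top_three)  # [('/login', 12), ('/api', 9), ('/img/logo.png', 7)]
```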
db/spark/rdd/take.txt · Last modified: 2018/06/23 21:59 by 141.101.107.88