Spark - Filter Transformation


About

filter(func) returns a new dataset (RDD) formed by selecting the elements of the source for which func returns true.

Example

Modulo

rdd.filter(lambda x: x % 2 == 0)
# [1, 2, 3, 4] → [2, 4]
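The modulo example above can be fleshed out into a minimal sketch. It assumes a local PySpark installation, and the names is_even, filter_even_numbers, and run_locally are illustrative, not part of the original page. Note that filter is a lazy transformation: nothing is computed until an action such as collect() is called.

```python
def is_even(x):
    """Predicate passed to filter: keep x only when it is even."""
    return x % 2 == 0


def filter_even_numbers(sc, values):
    """Return the even elements of values using an RDD filter.

    sc is an existing SparkContext, assumed to be created by the caller.
    """
    rdd = sc.parallelize(values)
    evens = rdd.filter(is_even)  # lazy: only records the transformation
    return evens.collect()       # action: triggers the computation


def run_locally():
    """Hypothetical driver: start a local SparkContext and run the filter."""
    # Imported here so the predicate can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "filter-example")
    try:
        return filter_even_numbers(sc, [1, 2, 3, 4])
    finally:
        sc.stop()
```

Calling run_locally() on a machine with PySpark installed should return the same result as the example above. Keeping the predicate as a named function, rather than a lambda, makes it easy to check on its own.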

Text

lines = sc.textFile("...", 4)  # second argument: minimum number of partitions
comments = lines.filter(isComment)
# where isComment is a function that returns a boolean
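The page does not define isComment, so here is one possible sketch. It assumes a comment is any line whose first non-blank character is '#'; that rule is an assumption for illustration only. The predicate is exercised on a plain Python list, so no Spark installation is needed to try it.

```python
def isComment(line):
    """Return True when the line is a comment.

    Assumption: a comment is a line whose first non-blank character is '#'.
    """
    return line.lstrip().startswith("#")


# Plain-Python check of the predicate on hypothetical sample lines.
sample = ["# header", "value = 1", "   # indented comment", ""]
comments = [line for line in sample if isComment(line)]
print(comments)  # ['# header', '   # indented comment']
```

With an RDD, the same function is passed unchanged, as in lines.filter(isComment). Spark calls it once per element and keeps the elements for which it returns True.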







