Spark - (Random) Split
1 - About
2 - Articles Related
3 - Function
3.1 - randomSplit
randomSplit randomly splits an RDD according to the provided weights.
- weights – weights for splits, will be normalized if they don’t sum to 1
- seed – random seed
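To make the normalization of the weights concrete, here is a minimal plain-Python sketch. The helper name normalize_weights is hypothetical and is not part of the Spark API; it only illustrates the arithmetic implied by "will be normalized if they don't sum to 1".

```python
def normalize_weights(weights):
    """Scale the weights so that they sum to 1.

    Hypothetical helper for illustration only, not a Spark function.
    """
    total = float(sum(weights))
    return [w / total for w in weights]


# Weights that do not sum to 1 keep their relative proportions.
print(normalize_weights([2, 1, 1]))
```

Under this reading, weights such as [8, 1, 1] and [.8, .1, .1] describe the same split.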
Example of percentage split
weights = [.8, .1, .1]
seed = 42
# seed=0L
# Use randomSplit with weights and seed
rawTrainData, rawValidationData, rawTestData = rawData.randomSplit(weights, seed)
The exact number of entries in each dataset varies slightly due to the random nature of the randomSplit() transformation.
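The variation in split sizes can be illustrated without a Spark cluster. The following is a plain-Python sketch that assumes each element is assigned independently to a split by drawing a random number and comparing it to the cumulative normalized weights. The function random_split is a hypothetical stand-in that works on a list; it is not Spark's implementation and will not reproduce the exact splits Spark returns.

```python
import random


def random_split(items, weights, seed):
    """Hypothetical stand-in for randomSplit on a plain Python list."""
    total = float(sum(weights))
    # Cumulative boundaries of the normalized weights, e.g. [0.0, 0.8, 0.9, 1.0]
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    bounds[-1] = 1.0  # guard against floating-point rounding

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for item in items:
        x = rng.random()  # uniform draw in [0, 1)
        for i in range(len(weights)):
            if bounds[i] <= x < bounds[i + 1]:
                splits[i].append(item)
                break
    return splits


data = list(range(1000))
train, validation, test = random_split(data, [.8, .1, .1], seed=42)

# The sizes are close to 800 / 100 / 100 but usually not exact.
print(len(train), len(validation), len(test))
```

In this sketch, every element lands in exactly one split, the sizes only approximate the requested proportions, and calling the function again with the same seed returns the same splits.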