Spark DataSet - Bucket

About

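A partition may be further divided into buckets: rows are assigned to a fixed number of buckets by hashing the values of one or more columns.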
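To make the idea concrete, here is a minimal plain-Python sketch of hash-based bucket assignment. It is only an illustration: `zlib.crc32` is a stand-in hash (Spark and Hive use their own hash functions), and `NUM_BUCKETS`, `bucket_id`, `bucketize` and the sample rows are hypothetical names and data chosen for this example.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket index in the range [0, num_buckets)."""
    # crc32 is a stand-in stable hash; Spark and Hive use their own hash functions.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, column, num_buckets=NUM_BUCKETS):
    """Group rows (dicts) by the bucket index of the given column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[column], num_buckets)].append(row)
    return dict(buckets)


# Hypothetical sample data: eight rows keyed by user_id.
rows = [{"user_id": i, "amount": i * 10} for i in range(8)]
buckets = bucketize(rows, "user_id")
```

The property that matters is that equal keys always land in the same bucket and that the number of buckets is fixed up front, whatever the number of distinct keys.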

Management

Write

The bucketBy method of DataFrameWriter buckets the output by the given columns. If specified, the output is laid out on the file system in a way similar to Hive's bucketing scheme.

This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark 2.1.0.
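A bucketed write could look like the following sketch, assuming a PySpark version that exposes `DataFrameWriter.bucketBy`. The helper `write_bucketed`, the `demo` function, the table name `demo_bucketed` and the sample data are all hypothetical and exist only for illustration.

```python
def write_bucketed(df, table_name, num_buckets, column):
    """Save `df` as a table bucketed and sorted by `column`.

    `df` is expected to be a pyspark.sql.DataFrame.
    """
    (
        df.write
        .bucketBy(num_buckets, column)  # hash rows into a fixed number of buckets
        .sortBy(column)                 # optional: sort rows within each bucket
        .mode("overwrite")
        .saveAsTable(table_name)        # bucketing needs a table, not a plain path
    )


def demo():
    """Hypothetical end-to-end run; requires a local PySpark installation."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("bucket-demo")
        .getOrCreate()
    )
    df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "label"])
    write_bucketed(df, "demo_bucketed", 4, "id")
    spark.stop()
```

Note that the sketch ends with `saveAsTable` rather than `save`: Spark rejects bucketing on a plain path-based `save`, because the bucketing metadata is recorded with the table.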

docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html