Spark DataSet - Object


About

A Dataset is a strongly typed collection of domain-specific objects.

Management

Definition

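To define this domain-specific object, an encoder is required. The encoder tells Spark to generate code at runtime to serialize the object into a binary structure.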

  • Scala
val people = spark.read.parquet("...").as[Person]
  • Java
Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class));
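The snippets above assume that a Person type and a Spark session already exist. As a minimal sketch of what that could look like in Scala, the example below defines a hypothetical Person case class (the field names name and age are assumptions, not taken from the original), builds a Dataset from an in-memory sequence instead of a Parquet file, and relies on spark.implicits._ to supply the encoder. The local master and the application name are also illustrative choices.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical domain object; the field names are assumptions for illustration.
// It is declared at the top level so that Spark can derive an encoder for it.
case class Person(name: String, age: Long)

object DatasetDefinitionExample {

  // A typed operation: the lambda receives Person objects, not untyped rows.
  def adults(people: Dataset[Person]): Dataset[Person] =
    people.filter(_.age >= 18)

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; a real job would get its master from the cluster.
    val spark = SparkSession.builder()
      .appName("dataset-definition-example")
      .master("local[*]")
      .getOrCreate()

    // Brings the implicit encoders for case classes and primitives into scope.
    import spark.implicits._

    // The Encoder[Person] needed by toDS() is resolved implicitly.
    val people: Dataset[Person] = Seq(Person("Ann", 31), Person("Bob", 17)).toDS()

    adults(people).show()

    spark.stop()
  }
}
```

In Scala the encoder is usually passed implicitly, which is why as[Person] takes no argument, whereas the Java API asks for it explicitly through Encoders.bean(Person.class).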

See Spark DataSet - (Object) Encoder.

To understand the internal binary representation for data, use the schema function.
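As a rough illustration of the schema function, the sketch below builds a small Dataset and reads its schema back. The Person case class, its fields, and the local session settings are assumptions made for the example; only Dataset.schema and Dataset.printSchema are the calls being demonstrated.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical domain object; the field names are assumptions for illustration.
case class Person(name: String, age: Long)

object DatasetSchemaExample {

  // Returns the column names that the encoder produced for the Dataset.
  def fieldNames(ds: Dataset[_]): Seq[String] =
    ds.schema.fieldNames.toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-schema-example")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    val people: Dataset[Person] = Seq(Person("Ann", 31)).toDS()

    // schema returns a StructType describing each field of the representation.
    val schema: StructType = people.schema
    println(fieldNames(people).mkString(", "))

    // printSchema writes the same information to the console as a tree.
    people.printSchema()

    spark.stop()
  }
}
```

The schema is derived from the encoder, so changing the fields of the case class changes the StructType that schema returns.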
