# (Relation|Table) - Tabular data

Tables (also known as relation) model either:

but not both.

Slightly outside of the tables in the data model, we have other kinds of tables:

• Staging tables bring in “dirty data” so we can scrub it and then insert it into base tables.
• Auxiliary tables hold static data for use in the system, acting as the relational replacement for computations.

In the SQL Iso, a table is a collection of zero or more rows where each row is a sequence of one or more column values.

where:

A table is a set of data with possible different data type by column.

Tables are view of the commit log showing the latest state.

## 3 - Splits

Even if it's possible to get all data in one table that has the same grain (unique key), you may made two tables to split the columns that are not functional related.

Why ?

• because you get minder dependency. You can changes your table and your process independently. For instance, the test data set don't need to be updated to reflect the changes of a second functional process.
• You can then set columns NULLABLE. If a process updates columns after an other one (such as total calculation), the value of this column are unknown for the first process and the columns must be set not nullable.
• You can add audit columns by process (Update and create (date|user))

Why not ?

• You may got consistency problems. There may be not a one-to-one relationship. But as the two functional processes are independent, it's may be not always needed.
• Performance: You add a join cost

## 4 - Implementation

### 4.4 - Spark Data Frame

Spark DataFrame is a distributed collection of data organized into named columns

Example:

people.col("age").plus(10);  // in Java

### 4.5 - Data Frame Panda

Data Frame Panda (API) is a 2-dimensional labeled data structure with columns of potentially different types.

You can think of it like a spreadsheet or SQL table, or a dict of Series objects.

DataFrame accepts many different kinds of input:

• Dict of 1D ndarrays, lists, dicts, or Series
• 2-D numpy.ndarray
• Structured or record ndarray
• A Series
• Another DataFrame
• sequence of (key, value) pairs

### 4.6 - R DataFrame

A data frame (doc), a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and so on).

A data frame is a collection of data organized into named columns from differents data type.

### 4.7 - Derby

In java\client\org\apache\derby\client\am\Cursor.java, they hold the data in byte array.

//-------------Structures for holding and scrolling the data -----------------
public byte[] dataBuffer_;
public ByteArrayOutputStream dataBufferStream_;
public int lastValidBytePosition_;
public boolean hasLobs_; // is there at least one LOB column?

// Current row positioning
protected int currentRowPosition_;
private int nextRowPosition_;
// Let's new up a 2-dimensional array based on fetch-size and reuse so that
protected int[] columnDataPosition_;

// This is the actual, computed lengths of varchar fields, not the max length from query descriptor or DA
protected int[] columnDataComputedLength_;
// populate this for

Engine:

• Types: org.apache.derby.iapi.types.DataType Interface. And see all SQL type implementation (SQLBinary, SQLBit, SQLBlob, … )
• ResultSet Interface