# Relation - Data Operator (Relational Algebra)

> (Data|State) Management and Processing > (Data Type|Data Structure) > (Relation|Table) - Tabular data > Relation - Data Operator (Relational Algebra)

### Table of Contents

## 1 - About

Relational algebra defines the relational database through a set of data operators (select, filter, join, sort, union, etc.) that defines an intermediate format for query planning/optimization.

Merge with Sql Engine - Relational Operator (Data operations|Execution Plan Steps) ?

Relational algebra is an extension to mathematical set theory, it was devised by E.F. Code (IBM) in 1970.

where:

Programs that manipulate tabular data exhibit an algebraic structure allowing reasoning and manipulation independently of physical data representation.

All operations on a table return a table. The operation can then be chained together.

Relational algebra expressions dictate how to achieve an answer by giving what operations to do and in what order to do them. A declarative language only expresses conditions that must be met in order for a result to be an answer, not how to get that answer.

For instance, SQL is the what not the how. It's clear what we want unclear how to get it

Every query returns a table. The language is “algebraically closed” A view is stored as an algebraic closure so it can be optimized when used with queries.

## 2 - Articles Related

## 3 - Order

By design, query results in relational algebra are unordered. Repeating the same query multiple times is not guaranteed to return results in the same order. Similarly, database-backed relational data also do not guarantee row order.

## 4 - Relation Operator Algebra

Type | Name | Symbol | Sql Operator |
---|---|---|---|

Normal | select, Selection | <math>\sigma</math>
of s | |

Normal | project | <math>\pi</math> | |

Normal | join | <math>\Join</math> | |

Normal | union | <math>\cup</math> | union (without duplicates) of union all (with duplicates) |

Normal | intersection | <math>\cap</math> | |

Normal | difference | - | except, minus |

Extended | Grouping and aggregating | g | |

Extended | Duplicate elimination | d | |

Extended | Sorting | t |

### 4.1 - Intersection

- Derived operator using minus

This expression

<math>( R1 - R2 )</math>returns anything that is only in R1

- Derived using join

### 4.2 - Selection

### 4.3 - Project

A project operator keeps only a set of columns (in other words eliminates the others).

### 4.4 - Cross Product

The size of this cross-join is then the size of R1 multiplied by the size of R2.

Find all pairs of similar images/tweets/songs: Compute the cross product, then compute a similarity function f(x,,x2) for every possible pair

### 4.5 - Equi-join

Equi-join is a join where the condition is an equality.

Equality Notation:

SELECT * FROM R1 JOIN R2 ON R1.A = R2.A

Cross product notation:

SELECT * FROM R1, R2 WHERE R1.A = R2.A

### 4.6 - Theta-join

A theta-join is a difficult join where generally the condition is not a equality. Band join or range join.

It's not because you don't have any join key in the sql that you don't have a join physically.

- Example 1: Find all the hospitals within 5 miles of a school

SELECT DISTINCT h.name FROM hospitals h, schools s WHERE distance(h.location,s.location) < 5

- Example 2: Find all user clicks made within 5 seconds of page load

SELECT * FROM Clicks c, PageLoads p WHERE abs(c.click_time - p.load_time) < 5

## 5 - Bag vs Set

Relational Algebra has two semantics:

Rule of thumb:

- Every paper will assume set semantics
- Every implementation will assume bag semantics

## 6 - Algebraic Optimization

## 7 - Equivalent logical expressions with different costs

## 8 - Documentation / Reference

- Bill Howe, PhD (Director of Research, Scalable Data Analytics) University of Washington - eScience Institute