Relation - Data Operator (Relational Algebra)
Table of Contents
1 - About
Relational algebra defines the relational database through a set of data operators (select, filter, join, sort, union, etc.) that defines an intermediate format for query planning/optimization.
Relational algebra is an extension to mathematical set theory, it was devised by E.F. Code (IBM) in 1970.
Programs that manipulate tabular data exhibit an algebraic structure allowing reasoning and manipulation independently of physical data representation.
All operations on a table return a table. The operation can then be chained together.
Relational algebra expressions dictate how to achieve an answer by giving what operations to do and in what order to do them. A declarative language only expresses conditions that must be met in order for a result to be an answer, not how to get that answer.
For instance, SQL is the what not the how. It's clear what we want unclear how to get it
Every query returns a table. The language is “algebraically closed” A view is stored as an algebraic closure so it can be optimized when used with queries.
2 - Articles Related
3 - Order
By design, query results in relational algebra are unordered. Repeating the same query multiple times is not guaranteed to return results in the same order. Similarly, database-backed relational data also do not guarantee row order.
4 - Relation Operator Algebra
|Normal||select, Selection||<math>\sigma</math> of s|
|Normal||union||<math>\cup</math>|| union (without duplicates)
of union all (with duplicates)
|Extended||Grouping and aggregating||g|
4.1 - Intersection
- Derived operator using minus
<math> R1 \cap R2 = R1 - ( R1 - R2 ) </math>
This expression <math>( R1 - R2 )</math> returns anything that is only in R1
- Derived using join
<math> R1 \cap R2 = R1 x R2 </math>
4.2 - Selection
4.3 - Project
A project operator keeps only a set of columns (in other words eliminates the others).
4.4 - Cross Product
<math>R1 x R2</math>
The size of this cross-join is then the size of R1 multiplied by the size of R2.
Find all pairs of similar images/tweets/songs: Compute the cross product, then compute a similarity function f(x,,x2) for every possible pair
4.5 - Equi-join
Equi-join is a join where the condition is an equality.
SELECT * FROM R1 JOIN R2 ON R1.A = R2.A
Cross product notation:
SELECT * FROM R1, R2 WHERE R1.A = R2.A
4.6 - Theta-join
A theta-join is a difficult join where generally the condition is not a equality. Band join or range join.
It's not because you don't have any join key in the sql that you don't have a join physically.
- Example 1: Find all the hospitals within 5 miles of a school
SELECT DISTINCT h.name FROM hospitals h, schools s WHERE distance(h.location,s.location) < 5
- Example 2: Find all user clicks made within 5 seconds of page load
SELECT * FROM Clicks c, PageLoads p WHERE abs(c.click_time - p.load_time) < 5
5 - Bag vs Set
Relational Algebra has two semantics:
Rule of thumb:
- Every paper will assume set semantics
- Every implementation will assume bag semantics
6 - Algebraic Optimization
7 - Equivalent logical expressions with different costs
8 - Documentation / Reference
- Bill Howe, PhD (Director of Research, Scalable Data Analytics) University of Washington - eScience Institute