About
Aggregate functions return a single value, computed or selected, from values that are in an aggregation relationship (i.e. a set).
These values are also known as summaries because they summarize / describe a set of data.
List
Computed
Computed aggregates are generally calculated over additive numeric data grouped by a class attribute.
- AVG() - mean - Returns the average / the mean (sum/count)
- COUNT() - Returns the number of rows
- SUM() - Returns the sum
- Mode - Returns the mode (the most frequent value)
- Hash (cryptography) - Returns a string representation that is unique with high probability
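The computed aggregates above (AVG, COUNT, SUM and the mode) can be sketched in Java as follows. This is only an illustrative sketch: the class name `ComputedAggregates` and its method names are hypothetical, and returning NaN for an empty list and breaking mode ties arbitrarily are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
public class ComputedAggregates {

    // SUM: adds all values together.
    public static long sum(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    // COUNT: number of values.
    public static long count(List<Integer> values) {
        return values.size();
    }

    // AVG: sum divided by count (NaN for an empty list, an assumption of this sketch).
    public static double avg(List<Integer> values) {
        return values.isEmpty() ? Double.NaN : (double) sum(values) / count(values);
    }

    // Mode: the most frequent value; ties are broken arbitrarily in this sketch.
    public static int mode(List<Integer> values) {
        Map<Integer, Long> frequencies = values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return frequencies.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(2, 4, 4, 6);
        System.out.println(sum(values));   // 16
        System.out.println(count(values)); // 4
        System.out.println(avg(values));   // 4.0
        System.out.println(mode(values));  // 4
    }
}
```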
Selection
- FIRST() - Returns the first value
- LAST() - Returns the last value
- MAX() - Returns the largest value
- MIN() - Returns the smallest value
- Quantile - (Median|Middle) - Returns the median
Implementation
Mutative operation
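Mutative accumulation for a sum: a mutable accumulator is updated in place for each value.
int sum = 0; // mutable accumulator
for (int x : numbers) {
    sum += x; // update the accumulator in place
}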
Reduction operation
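Aggregate functions are implemented as a reduction operation (also known as a fold): the values are combined repeatedly with a combining function until a single result remains.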
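As a minimal sketch of a reduction, the sum and the maximum can be written with `Stream.reduce`, which takes an identity value and a combining function. The class name `ReductionExample` and its methods are hypothetical names chosen for this example.

```java
import java.util.List;

// Hypothetical class, for illustration only.
public class ReductionExample {

    // SUM as a reduction: 0 is the identity, Integer::sum the combining function.
    public static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // MAX as a reduction: Integer.MIN_VALUE is the identity, Integer::max the combining function.
    // An empty list therefore yields Integer.MIN_VALUE in this sketch.
    public static int max(List<Integer> numbers) {
        return numbers.stream().reduce(Integer.MIN_VALUE, Integer::max);
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(3, 1, 4, 1, 5);
        System.out.println(sum(numbers)); // 14
        System.out.println(max(numbers)); // 5
    }
}
```

Unlike the mutative loop, no variable is updated by the caller: the combining function is the only thing that needs to be supplied.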
Partition
Some operations, such as AVG, are not directly partitionable, so they cannot be computed in parallel as they are.
It is not valid to compute them on each partition and then combine the partial results with the same function, because they are not commutative and associative: the average of the per-partition averages is generally not the average of the whole set.
You can still compute partial aggregates by rewriting the function in terms of commutative and associative functions and then rolling the partial results up.
Example: to compute AVG(x):
- expand it to SUM(x) / COUNT(x),
- compute SUM(x) and COUNT(x) on each partition,
- roll up the partial sums and the partial counts with SUM,
- divide the total sum by the total count.
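The steps above can be sketched in Java as follows. The class `PartitionedAverage`, its nested `Partial` class and the sample partitions are hypothetical and only serve the illustration; the naive `averageOfAverages` method is included to show why AVG cannot be rolled up with itself.

```java
import java.util.List;

// Hypothetical class, for illustration only.
public class PartitionedAverage {

    // Partial aggregate of one partition: its SUM(x) and COUNT(x).
    public static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Roll-up step: adding sums and counts is commutative and associative.
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Computes SUM(x) and COUNT(x) for a single partition.
    public static Partial partial(List<Integer> partition) {
        long sum = 0;
        for (int x : partition) {
            sum += x;
        }
        return new Partial(sum, partition.size());
    }

    // Correct: roll up the partial sums and counts, then divide once.
    public static double average(List<List<Integer>> partitions) {
        Partial total = partitions.stream()
                .map(PartitionedAverage::partial)
                .reduce(new Partial(0, 0), Partial::merge);
        return (double) total.sum / total.count;
    }

    // Naive: averaging the per-partition averages gives a wrong result
    // when the partitions have different sizes.
    public static double averageOfAverages(List<List<Integer>> partitions) {
        return partitions.stream()
                .mapToDouble(p -> (double) partial(p).sum / p.size())
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(List.of(1, 2, 3), List.of(10));
        System.out.println(average(partitions));           // 4.0
        System.out.println(averageOfAverages(partitions)); // 6.0
    }
}
```

Because `merge` is commutative and associative, the partial aggregates can be computed on each partition independently and combined in any order.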