Calcite - Query Cost

Card Puncher Data Processing

About

SQL Engine - (Query Plan) Cost in calcite.

Calcite applies a Cost based optimizer by default that is called the Volcano planner.

The cost is provided by the relational expression (relNode). See calculation

  • Cost is represented by RelOptCost
  • Cost typically includes
    • rowcount
    • IO
    • CPU cost
  • Cost estimates are relative
  • Statistics are used to improve accuracy of cost estimations

Management

Print

The cost can be seen when you print the relational expression (ie explain) with all attributes.

Example from the getting started guide.

RelNode relNode = relRoot.project();
final RelWriter relWriter = new RelWriterImpl(new PrintWriter(System.out), SqlExplainLevel.ALL_ATTRIBUTES, false);
relNode.explain(relWriter);
LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 150.0, cumulative cost = {3668.75 rows, 11014.762352915506 cpu, 0.0 io}, id = 18
  LogicalProject(NAME=[$1], EXPR$1=[$2]): rowcount = 150.0, cumulative cost = {3518.75 rows, 5002.0 cpu, 0.0 io}, id = 17
    LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()]): rowcount = 150.0, cumulative cost = {3368.75 rows, 4702.0 cpu, 0.0 io}, id = 15
      LogicalProject(deptno0=[$5], NAME=[$6], empid=[$0]): rowcount = 1500.0, cumulative cost = {3200.0 rows, 4702.0 cpu, 0.0 io}, id = 14
        LogicalJoin(condition=[=($1, $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 12
          LogicalTableScan(table=[[HR, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 10
          LogicalTableScan(table=[[HR, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 11

Calculation

The cost is provided by the relational expression (relNode). Example with EnumerableHashJoin

@Override public RelOptCost computeSelfCost(RelOptPlanner planner,
      RelMetadataQuery mq) {
    double rowCount = mq.getRowCount(this);

    // Joins can be flipped, and for many algorithms, both versions are viable
    // and have the same cost. To make the results stable between versions of
    // the planner, make one of the versions slightly more expensive.
    switch (joinType) {
    case SEMI:
    case ANTI:
      // SEMI and ANTI join cannot be flipped
      break;
    case RIGHT:
      rowCount = RelMdUtil.addEpsilon(rowCount);
      break;
    default:
      if (RelNodes.COMPARATOR.compare(left, right) > 0) {
        rowCount = RelMdUtil.addEpsilon(rowCount);
      }
    }

    // Cheaper if the smaller number of rows is coming from the LHS.
    // Model this by adding L log L to the cost.
    final double rightRowCount = right.estimateRowCount(mq);
    final double leftRowCount = left.estimateRowCount(mq);
    if (Double.isInfinite(leftRowCount)) {
      rowCount = leftRowCount;
    } else {
      rowCount += Util.nLogN(leftRowCount);
    }
    if (Double.isInfinite(rightRowCount)) {
      rowCount = rightRowCount;
    } else {
      rowCount += rightRowCount;
    }
    if (isSemiJoin()) {
      return planner.getCostFactory().makeCost(rowCount, 0, 0).multiplyBy(.01d);
    } else {
      return planner.getCostFactory().makeCost(rowCount, 0, 0);
    }
  }

Documentation / Reference





Discover More
Card Puncher Data Processing
Calcite - Planner (RelOptPlanner)

org/apache/calcite/plan/package-infoPlan provides an optimizer interface (ie Defines interfaces for constructing rule-based optimizers of relational expressions) Frameworks...
Card Puncher Data Processing
Calcite - Relational Expression (RelNode, Algebra)

Relational Algebra in Calcite A relational expression is represented by a tree of RelNode. A RelNode can be considered as the same logic than the Spark dataframe. TableScan Project Filter...
Card Puncher Data Processing
Calcite - Statistics

Statistics are used to improve accuracy of cost estimations Calcite provides utilities for computing resource related statistics for use in cost



Share this page:
Follow us:
Task Runner