200410 Introduction to databases (lecture 11)

  • 课件:lecture 11 Query Processing and Optimization
  • Reference material Database System Concepts, Seventh Edition - Chapter 16

200410 Introduction to databases (lecture 11)

Query Compilation

  • a) Parsing. A parse tree for the query is constructed.
  • b) Query Rewrite. The parse tree is converted to an initial query plan, which is usually an algebraic representation of the query. This initial plan is then transformed into an equivalent plan that is expected to require less time to execute.
  • c) Physical Plan Generation. The abstract query plan from b), often called a logical query plan, is turned into a physical query plan by selecting algorithms to implement each of the operators of the logical plan, and by selecting an order of execution for these operators. The physical plan, like the result of parsing and the logical plan, is represented by an expression tree. The physical plan also includes details such as how the queried relations are accessed, and when and if a relation should be sorted.

200410 Introduction to databases (lecture 11)

Grammar/Syntax Diagrams

200410 Introduction to databases (lecture 11)

BNF

描述编程语言的BNF

  • Grammar rules define valid SQL statements.
  • Query parser
    -Validates syntax.
    -Produces a logical parse tree/plan.
  • The optimizer and execution engine execute the plan.

SQL Parse Tree

200410 Introduction to databases (lecture 11)

Simple Parsing Example

200410 Introduction to databases (lecture 11)

Query Plan/Logical Plan

Simple Execution/Plan Tree

200410 Introduction to databases (lecture 11)

Relational Operations

200410 Introduction to databases (lecture 11)

Logical Plan

200410 Introduction to databases (lecture 11)

Optimization/Execution Overview

  • Two main issues in query optimization:
    -For a given query, what plans are considered?
    [Algorithm to search plan space for cheapest (estimated) plan.]
    -How is the cost of a plan estimated?
  • Ideally: Want to find best plan. Practically: Avoid worst plans!

Three Core Optimization Techniques

200410 Introduction to databases (lecture 11)

Query Execution Cost

200410 Introduction to databases (lecture 11)

Statistics and Catalogs

  • Need information about the relations and indexes involved. Catalogs typically contain at least:
    -# tuples (NTuples) and # pages (NPages) for each relation.
    -#distinct key values (NKeys) and NPages for each index.
    -Index height, low/high key values (Low/High) for each tree index.
  • Catalogs updated periodically.
    Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
  • More detailed information (e.g., histograms of the values in some field) are sometimes stored.

MySQL Example

200410 Introduction to databases (lecture 11)

The Catalog Contains

  • For each table
    -Table name, storage information.
    -Attribute name and type for each column.
    -Index name, type and indexed attributes
    -Constraints
  • Statistical information
    -Cardinality of each table
    -Size: No. of blocks.
    -Index cardinality: Number of distinct key values for each index
    -Index size: Number of blocks for each index.
    -Index tree height
    -Index ranges

Access Path

200410 Introduction to databases (lecture 11)

  • An access path is a method of retrieving tuples:
    File scan, or index that matches a selection (in the query)
  • A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
    E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3.
  • A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index.
    E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.

Query Rewrite

Join : Index Nested Loops

200410 Introduction to databases (lecture 11)

Join : Sort - Merge

200410 Introduction to databases (lecture 11)

Example

200410 Introduction to databases (lecture 11)

Materialization

200410 Introduction to databases (lecture 11)

Pipelining

200410 Introduction to databases (lecture 11)


  • 课件:lecture 11 transactions and Recover
  • Reference material Database System Concepts, Seventh Edition - Chapter 17

Transactions and Recover

Core Transaction Concept - ACID Properties

200410 Introduction to databases (lecture 11)

Atomic

A transaction is a logical unit of work that must be either entirely completed or entirely undone. (All writes happen or none of them happen)

Simplistic Approach

200410 Introduction to databases (lecture 11)
200410 Introduction to databases (lecture 11)

DBMS ACID Implementation

200410 Introduction to databases (lecture 11)

Write Ahead Logging

200410 Introduction to databases (lecture 11)

200410 Introduction to databases (lecture 11)

NOTE:

  • force : force the block to the disk
  • steal : If I need memory, I will write an uncommitted update to the disk because I need to use this for some other transaction
  • No force requires you to do REDO processes during recovery; Steal requires you to do UNDO processes during recovery
  • summary : I’m going to REDO all the updates that committed, but we’re not written to disk and I’m going to UNDO all the updates that were written to disk but not committed.

ARIES recovery involves three passes

  • Analysis pass:
    -Determine which transactions to undo
    -Determine which pages were dirty (disk version not up to date) at time of
    -RedoLSN: LSN from which redo should start
  • Redo pass:
    -Repeats history, redoing all actions from RedoLSN(updated committed but not written changes to pages)
    -RecLSN and PageLSNs are used to avoid redoing actions already reflected on page
  • Undo pass:
    -Rolls back all incomplete transactions (with uncommitted pages written to disk).
    -Transactions whose abort was complete earlier are not undone

Durability

200410 Introduction to databases (lecture 11)

Availability and Replication

200410 Introduction to databases (lecture 11)

NOTE:

  • active/passive
    No transactions are going to region #2. All transactions run region #1, but when transactions commite on site 1, you ship the changes over the network.

  • active/active
    Both of them are processing transactions at the same time. Ship it’s logs and changes to each other.

Isolation

Locking and Concurrency

200410 Introduction to databases (lecture 11)

Locking

200410 Introduction to databases (lecture 11)

Schedule

200410 Introduction to databases (lecture 11)

Serializable

200410 Introduction to databases (lecture 11)

Lock-Based Concurrency Control

200410 Introduction to databases (lecture 11)

Aborting a Transaction

200410 Introduction to databases (lecture 11)

MySQL (Locking) Isolation

200410 Introduction to databases (lecture 11)