200410 Introduction to databases (lecture 11)
- 课件:lecture 11 Query Processing and Optimization
- Reference material Database System Concepts, Seventh Edition - Chapter 16
Query Compilation
- a) Parsing. A parse tree for the query is constructed.
- b) Query Rewrite. The parse tree is converted to an initial query plan, which is usually an algebraic representation of the query. This initial plan is then transformed into an equivalent plan that is expected to require less time to execute.
- c) Physical Plan Generation. The abstract query plan from b), often called a logical query plan, is turned into a physical query plan by selecting algorithms to implement each of the operators of the logical plan, and by selecting an order of execution for these operators. The physical plan, like the result of parsing and the logical plan, is represented by an expression tree. The physical plan also includes details such as how the queried relations are accessed, and when and if a relation should be sorted.
Grammar/Syntax Diagrams
BNF
- Grammar rules define valid SQL statements.
- Query parser
-Validates syntax.
-Produces a logical parse tree/plan. - The optimizer and execution engine execute the plan.
SQL Parse Tree
Simple Parsing Example
Query Plan/Logical Plan
Simple Execution/Plan Tree
Relational Operations
Logical Plan
Optimization/Execution Overview
- Two main issues in query optimization:
-For a given query, what plans are considered?
[Algorithm to search plan space for cheapest (estimated) plan.]
-How is the cost of a plan estimated? - Ideally: Want to find best plan. Practically: Avoid worst plans!
Three Core Optimization Techniques
Query Execution Cost
Statistics and Catalogs
- Need information about the relations and indexes involved. Catalogs typically contain at least:
-# tuples (NTuples) and # pages (NPages) for each relation.
-#distinct key values (NKeys) and NPages for each index.
-Index height, low/high key values (Low/High) for each tree index. - Catalogs updated periodically.
Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok. - More detailed information (e.g., histograms of the values in some field) are sometimes stored.
MySQL Example
The Catalog Contains
- For each table
-Table name, storage information.
-Attribute name and type for each column.
-Index name, type and indexed attributes
-Constraints - Statistical information
-Cardinality of each table
-Size: No. of blocks.
-Index cardinality: Number of distinct key values for each index
-Index size: Number of blocks for each index.
-Index tree height
-Index ranges
Access Path
- An access path is a method of retrieving tuples:
File scan, or index that matches a selection (in the query) - A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
E.g., Tree index on <a, b, c> matches the selection a=5 AND b=3, and a=5 AND b>6, but not b=3. - A hash index matches (a conjunction of) terms that has a term attribute = value for every attribute in the search key of the index.
E.g., Hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
Query Rewrite
Join : Index Nested Loops
Join : Sort - Merge
Example
Materialization
Pipelining
- 课件:lecture 11 transactions and Recover
- Reference material Database System Concepts, Seventh Edition - Chapter 17
Transactions and Recover
Core Transaction Concept - ACID Properties
Atomic
A transaction is a logical unit of work that must be either entirely completed or entirely undone. (All writes happen or none of them happen)
Simplistic Approach
DBMS ACID Implementation
Write Ahead Logging
NOTE:
- force : force the block to the disk
- steal : If I need memory, I will write an uncommitted update to the disk because I need to use this for some other transaction
- No force requires you to do REDO processes during recovery; Steal requires you to do UNDO processes during recovery
- summary : I’m going to REDO all the updates that committed, but we’re not written to disk and I’m going to UNDO all the updates that were written to disk but not committed.
ARIES recovery involves three passes
- Analysis pass:
-Determine which transactions to undo
-Determine which pages were dirty (disk version not up to date) at time of
-RedoLSN: LSN from which redo should start - Redo pass:
-Repeats history, redoing all actions from RedoLSN(updated committed but not written changes to pages)
-RecLSN and PageLSNs are used to avoid redoing actions already reflected on page - Undo pass:
-Rolls back all incomplete transactions (with uncommitted pages written to disk).
-Transactions whose abort was complete earlier are not undone
Durability
Availability and Replication
NOTE:
-
active/passive
No transactions are going to region #2. All transactions run region #1, but when transactions commite on site 1, you ship the changes over the network. -
active/active
Both of them are processing transactions at the same time. Ship it’s logs and changes to each other.