Notes of SQL performance & optimization.
Query Life Cycle
The user enters a query and passes it to the parser.
The parser checks the query for:
- syntax errors (e.g. a comma out of place)
- semantic errors (e.g. a table doesn’t exist)
If the checks fail, the error is returned. If the checks pass, the query is turned into an algebraic expression & passed on.
- Performs simple optimizations (e.g. simplifying 5*10 into 50).
- Considers different ‘query plans’, estimating CPU usage & duration for each plan based on number of rows in the query’s specified table(s).
- Picks an optimal plan & passes it on.
The engine turns the plan into operations for the database.
The result is returned to the user.
When you start dealing with bigger data sets, you'll start to care more about how quickly your queries execute. SQL is a declarative language - each query declares what we want the SQL engine to do, not how. The how, known as the query plan, is 1 thing that affects the efficiency of a query. Imagine we had the following data:
We want to find all the starships who
class = 'fighter'. SQL could find the result using 2 different plans; Full table scan or Create an index.
Full Table Scan
SQL will look at every single row in the fleet table & return the rows where class = "fighter". This would be fast if the table only had 4 rows. If the table has 1 million rows however, then it would require looking at 1 million rows, which could be slow.
Create an Index
Creating an index of a table involves picking a column(s) that you want to sort your table by. SQL will then generate a new sorted table, made up of the column(s) you selected & the id column (to link each row back to the original table).
Now when you do the query:
- do a binary search on the index table to find the rows with class = "fighter",
- get the IDs of these rows &
- for each ID, do another binary search on the original table, returning the row that matches.
If the table contains 1 millions rows, creating the index would require ~60 million operations (depending on the engine) which could table some time.
Each proceeding binary search on the original table however, would only require
log(1 000 000) = 6 lookups to find a value.
The process of optimizing queries.
- Identify what query(s) you want to tune by looking at which of your database calls are taking the longest / using the most resources using tools like SQL Profiler.
- Understand how the SQL engine is executing the query. All SQL systems come with a way to ask the engine (e.g. In SQLite, put
EXPLAIN QUERY PLANin front of a query to see the query plan).
- Change your query or data structures based on the SQL engine you're using & your data.
fleet table had a significant number of rows, users were regularly querying that table to find all rows with a specific class & it was taking a significant amount of time.
We could create an index based on the
class column in our schema.
This would take some time to be created but now all the queries would be faster.