Left Arrow.Back

SQL - Performance

3 rectangles stacked on top of 1 another behind a tachometer.

Notes of SQL performance & optimization.

Last Tended-
PlantedNov 2021
StatusSeed
TagsData
The different stages of a xenomoprh life cycle

Query Life Cycle

A flowchart.

Parse

When a query is entered, it is 1st passed to the parser. It checks for:

  • syntax errors (e.g. a comma out of place)
  • semantic errors (e.g. a table doesn’t exist)

If the checks fail, the error is returned. If the checks pass, the query is turned into an algebraic expression & passed on.

Optimize

The engine:

  1. Performs simple optimizations (e.g. simplifying 5*10 into 50).
  2. Considers different ‘query plans’, estimating CPU usage & duration for each plan based on number of rows in the query’s specified table(s).
  3. Picks an optimal plan & passes it on.

Execute

The engine turns the plan into operations for the database & returns the result.

Source
Blueprint

Query Plan

When you start dealing with bigger data sets, you'll start to care more about how quickly your queries execute. SQL is a declarative language - each query declares what we want the SQL engine to do, not how. The how, known as the query plan, is 1 thing that affects the efficiency of a query. Imagine we had the following data:

schema.sql

output

schema.sql

fleet

idnameclasslength
1a-wingfighter7
2cruisercorvette127
3x-wingfighter13
4y-wingbomber23

We want to find all the starships who class = "fighter". SQL could find the result using 2 different plans; Full table scan or Create an index.

Full Table Scan

SQL will look at every single row in the fleet table & return the rows where class = "fighter". This would be fast if the table only had 4 rows. If the table has 1 million rows however, then it would require looking at 1 million rows, which could be slow.

query.sql

output

query.sql

fleet

idnameclasslength
1a-wingfighter7
2cruisercorvette127
3x-wingfighter13
4y-wingbomber23

Create an Index

Creating an index of a table involves picking a column(s) that you want to sort your table by. SQL will then generate a new sorted table, made up of the column(s) you selected & the id column (to link each row back to the original table).

schema.sql

output

schema.sql

fleet_class_idx

classid
bomber4
corvette2
fighter1
fighter3

Now when you do the query:

SELECT *
FROM fleet
WHERE class = "fighter";

SQL will:

  1. do a binary search on the index table to find the rows with class = "fighter",
  2. get the IDs of these rows &
  3. for each ID, do another binary search on the original table, returning the row that matches.

If the table contains 1 millions rows, creating the index would require ~60 million operations (depending on the engine) which could table some time. Each proceeding binary search on the original table however, would only require log(1 000 000) = 6 lookups to find a value.

Source
Science tubes with alien specimens

Query Tuning

The process of optimizing queries.

  1. Identify what query(s) you want to tune by looking at which of your database calls are taking the longest / using the most resources using tools like SQL Profiler.
  2. Understand how the SQL engine is executing the query. All SQL systems come with a way to ask the engine (e.g. In SQLite, put EXPLAIN QUERY PLAN in front of a query to see the query plan).
  3. Change your query or data structures based on the SQL engine you're using & your data.

Example

Imagine our fleet table had a significant number of rows, users were regularly querying that table to find all rows with a specific class & it was taking a significant amount of time. We could create an index based on the class column in our schema. This would take some time to be created but now all the queries would be faster.

Source
A space ship's docking bay with various equipment.

Tools