notes/db/sql/postgres/docs/courses/linkedin/dan-sullivan.txt
Ihar Hancharenka 5dff80e88e first
2023-03-27 16:52:17 +03:00

247 строки
11 KiB
Plaintext

https://www.linkedin.com/learning/instructors/dan-sullivan
https://www.linkedin.com/learning/advanced-sql-for-application-development
! 2h7m
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/
! 1h44m !!! execution plans, types of indices, partitioning, materialized views, hints to query optimizer, parallel query execution
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/reduce-query-reponse-time-with-query-tuning
???
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/scanning-tables-and-indexes
types of indexes
* b-tree (balanced trees), for equality and range queris
* hash, for equality
* bitmap, for set operations (inclusion)
* special (geospatial, user-defined indexing strategies)
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/joining-tables
3 way of joining tables
* nested loop join (comare all rows in both tables to each other)
loop through one table
for each row, loop through the other table,
at each step, compare keys
* hash join (calculate hash value of key and join based on match value)
compute hash values of key values in smaller table
store in hash table, which has hash value and row attributes
scan larger table; find rows from smaller hash table
* sort merge join (sort both tables and then join rows while taking advantage of order)
sort both tables
compare rows like nested loop join, but ...
stop when it is not possible to find a match later in the table because of the sort order
scan the driving table only once
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/partitioning-data
parition key
* it is common to base them on time
global indexes
... all the partitions ...
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/explain-and-analyze
explain select * from staff;
query plan (text)
Seq Scan on stuff (cons=0.00..24.00 rows=10000 width=75)
explain select * from staff;
query plan (text)
Seq Scan on stuff (cons=0.00..24.00 rows=10000 width=75) (actual time=0.018..0.158 r...)
Planning Time: 0.361 ms
Execution Time: 0.248 ms
explain select last_name from staff;
... width=7 ...
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/example-plan-selecting-with-a-where-clause
explain select * from staff where salary > 75000
query plan (text)
Seq Scan on stuff (cons=0.00..26.50 rows=715 width=75)
Filter: (salary > 75000)
explain analyze select * from staff where salary > 75000
query plan (text)
Seq Scan on stuff (cons=0.00..26.50 rows=715 width=75) (actual time 0.077..0.611)
Filter: (salary > 75000)
Rows Removed by Filter: 283
Planning Time: 0.107 ms
Execution Time: 0.960 ms
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/indexes
create index idx_staff_salary on staff(salary);
explain select * from staff
no usage of index
explain analyze select * from staff where salary > 75000
again, index is not used !
why ??? because there are so many rows with salary > 75000
explain analyze select * from staff where salary > 150000
Index Scan using idx_staff_salary on staff (cost 0.28..8.29 rows 1 width 75) (actual )
Index Cond: (salary > 150000)
Planning Time: 4.252 ms (reduces 2nd, 3rd, ... other time)
Execution Time: 0.246 ms
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/indexing
types of indexes
* b-tree
* bitmap (on low-cardinality data)
* hash (in a k-v form)
* special
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/b-tree-indexes
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/b-tree-index-example-plan
we have 3 tables:
* company_divisions
* company_regions
* staff
explain select * from staff where email = 'bphillips5@time.com'
Seq Scan on staff (cost=0.00..26.50 rows=1 width=75)
Filter: ((email)::text = 'bphillips5@time.com'::text)
B-Tree is a default index type
create index idx_staff_email on staff(email)
explain select * from staff where email = 'bphillips5@time.com'
Index Scan using idx_staff_email on staff (const=0.28..8.29 rows 1 width=75)
Index Cond: ((email)::text = 'bphillips5@time.com'::text)
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/bitmap-indexes
we can perform boolean operations (and, or, not) quickly on bitmap indexes
updating indexes can be more time-consuming than b-tree
postgres creates them on the fly
https://www.linkedin.com/learning/learn-apache-kafka-for-beginners/delivery-semantics-for-consumers
select distinct job_tile from staf order by job_title;
create index idx_staf_job_title on staf(job_title);
explain select * from staf where job_title = 'Operator';
Bitmap Heap Scan on staf (cost=4.36..18.36 rows=11 width=75)
Recheck Cond: ((job_title)::text = 'Operator'::text)
Bitmap Index Scan on idx_staf_job_title (cost=0.00..4.36 rows=11 width=0)
Index Cond: ((job_title)::text = 'Operator'::text)
Bitmap indexes are created on the fly when PG thinks they can be useful
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-indexes
Used only for equality operations (=), but not for range queries
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-index-example-plan
create index idx_staff_email on staff USING HASH (email);
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-index-example-plan
explain select * from staff where email = 'bphillips5@time.com'
Index Scan using idx_staff_email on staff (cost=0.00..8.02 rows=...)
Index Cond: ((email)::text = 'bphillips5@tim.com'::text)
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/postgresql-specific-indexes?
4 special types of indexes
GIST
generalized search tree
SP-GIST
space-partitioned GIST (supports partitioned search trees, used for non-ballanced DSs)
GIN
used for text indexing
lookup is faster than GIST
but build time is slower
size is 2-3 times bigger than GIST
BRIN
block range indexing
used for large data sets
divide data into ordered blocks
keeps min and max values
search only blocks...
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/what-affects-joins-performance
INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
inner join
where
from_table.some_field = other_table.some_other_field
select * from company_region cr,
inner join
staff s
on
cr.region_id = s.region_id
left outer join
returns all rows from left table
and rows from the rigth table
that have matching key
select * from company_region cr,
left outer join
staff s
on
cr.region_id = s.region_id
right outer join
returns all rows from right table
and rows from the left table
that have matching key
select * from company_region cr,
right outer join
staff s
on
cr.region_id = s.region_id
full outer join
returns all rows from both tables
nulls will be returned when there is no match
select * from company_region cr,
full outer join
staff s
on
cr.region_id = s.region_id
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/nested-loops
nested loop joins
* two loops
for row in table 1 (called the "driver" table):
for row in table 2 (called the "join" table):
customer table - is a driver-table
status table - is a join-table
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/nested-loop-example-plan
set enable_nestedloop=true;
set enable_hashjoin=false;
set enable_mergejoin=false
explain select
s.id, s_last_name, s.job_table, cr.country
from
staff s
inner join
company_region cr
on
s.region_id = cr.region_id
Nested Loop (cost=0.15..239.37 rows=1000 width=88)
-> Seq Scan on staff c (cost=0.00..24.00 rows=1000 width=34)
-> Index Scan using company_regions_pkey on company_regions...
Index Cond: (region_id = s.region_id)
PG create index for all PK columns
after
delete company_regions_pkey
Nested Loop (cost=0.15..8290.88 rows=1000 width=88)
Join Filter: (s.region_id = cr.region_id)
-> Seq Scan on staff c (cost=0.00..24.00 rows=1000 width=34)
-> Materialize (const=0.00..24.00 rows=1000 width=34)
Seq Scan on company_regions cr (cost=0.00..15.00 rows=5...)
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-joins
Build Hash Table
Use the samller of the two tables
Compute the value of primary key value
Store in table
Probe phase
Step through large table
Compute hash value of primary or foreign key
Lookup corresponding value in hash table
https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-join-example-plan
set enable_nestloop=false;