https://www.linkedin.com/learning/instructors/dan-sullivan https://www.linkedin.com/learning/advanced-sql-for-application-development ! 2h7m https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/ ! 1h44m !!! execution plans, types of indices, partitioning, materialized views, hints to query optimizer, parallel query execution https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/reduce-query-reponse-time-with-query-tuning ??? https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/scanning-tables-and-indexes types of indexes * b-tree (balanced trees), for equality and range queris * hash, for equality * bitmap, for set operations (inclusion) * special (geospatial, user-defined indexing strategies) https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/joining-tables 3 way of joining tables * nested loop join (comare all rows in both tables to each other) loop through one table for each row, loop through the other table, at each step, compare keys * hash join (calculate hash value of key and join based on match value) compute hash values of key values in smaller table store in hash table, which has hash value and row attributes scan larger table; find rows from smaller hash table * sort merge join (sort both tables and then join rows while taking advantage of order) sort both tables compare rows like nested loop join, but ... stop when it is not possible to find a match later in the table because of the sort order scan the driving table only once https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/partitioning-data parition key * it is common to base them on time global indexes ... all the partitions ... https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/explain-and-analyze explain select * from staff; query plan (text) Seq Scan on stuff (cons=0.00..24.00 rows=10000 width=75) explain select * from staff; query plan (text) Seq Scan on stuff (cons=0.00..24.00 rows=10000 width=75) (actual time=0.018..0.158 r...) Planning Time: 0.361 ms Execution Time: 0.248 ms explain select last_name from staff; ... width=7 ... https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/example-plan-selecting-with-a-where-clause explain select * from staff where salary > 75000 query plan (text) Seq Scan on stuff (cons=0.00..26.50 rows=715 width=75) Filter: (salary > 75000) explain analyze select * from staff where salary > 75000 query plan (text) Seq Scan on stuff (cons=0.00..26.50 rows=715 width=75) (actual time 0.077..0.611) Filter: (salary > 75000) Rows Removed by Filter: 283 Planning Time: 0.107 ms Execution Time: 0.960 ms https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/indexes create index idx_staff_salary on staff(salary); explain select * from staff no usage of index explain analyze select * from staff where salary > 75000 again, index is not used ! why ??? because there are so many rows with salary > 75000 explain analyze select * from staff where salary > 150000 Index Scan using idx_staff_salary on staff (cost 0.28..8.29 rows 1 width 75) (actual ) Index Cond: (salary > 150000) Planning Time: 4.252 ms (reduces 2nd, 3rd, ... other time) Execution Time: 0.246 ms https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/indexing types of indexes * b-tree * bitmap (on low-cardinality data) * hash (in a k-v form) * special https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/b-tree-indexes https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/b-tree-index-example-plan we have 3 tables: * company_divisions * company_regions * staff explain select * from staff where email = 'bphillips5@time.com' Seq Scan on staff (cost=0.00..26.50 rows=1 width=75) Filter: ((email)::text = 'bphillips5@time.com'::text) B-Tree is a default index type create index idx_staff_email on staff(email) explain select * from staff where email = 'bphillips5@time.com' Index Scan using idx_staff_email on staff (const=0.28..8.29 rows 1 width=75) Index Cond: ((email)::text = 'bphillips5@time.com'::text) https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/bitmap-indexes we can perform boolean operations (and, or, not) quickly on bitmap indexes updating indexes can be more time-consuming than b-tree postgres creates them on the fly https://www.linkedin.com/learning/learn-apache-kafka-for-beginners/delivery-semantics-for-consumers select distinct job_tile from staf order by job_title; create index idx_staf_job_title on staf(job_title); explain select * from staf where job_title = 'Operator'; Bitmap Heap Scan on staf (cost=4.36..18.36 rows=11 width=75) Recheck Cond: ((job_title)::text = 'Operator'::text) Bitmap Index Scan on idx_staf_job_title (cost=0.00..4.36 rows=11 width=0) Index Cond: ((job_title)::text = 'Operator'::text) Bitmap indexes are created on the fly when PG thinks they can be useful https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-indexes Used only for equality operations (=), but not for range queries https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-index-example-plan create index idx_staff_email on staff USING HASH (email); https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-index-example-plan explain select * from staff where email = 'bphillips5@time.com' Index Scan using idx_staff_email on staff (cost=0.00..8.02 rows=...) Index Cond: ((email)::text = 'bphillips5@tim.com'::text) https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/postgresql-specific-indexes? 4 special types of indexes GIST generalized search tree SP-GIST space-partitioned GIST (supports partitioned search trees, used for non-ballanced DSs) GIN used for text indexing lookup is faster than GIST but build time is slower size is 2-3 times bigger than GIST BRIN block range indexing used for large data sets divide data into ordered blocks keeps min and max values search only blocks... https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/what-affects-joins-performance INNER JOIN LEFT OUTER JOIN RIGHT OUTER JOIN FULL OUTER JOIN inner join where from_table.some_field = other_table.some_other_field select * from company_region cr, inner join staff s on cr.region_id = s.region_id left outer join returns all rows from left table and rows from the rigth table that have matching key select * from company_region cr, left outer join staff s on cr.region_id = s.region_id right outer join returns all rows from right table and rows from the left table that have matching key select * from company_region cr, right outer join staff s on cr.region_id = s.region_id full outer join returns all rows from both tables nulls will be returned when there is no match select * from company_region cr, full outer join staff s on cr.region_id = s.region_id https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/nested-loops nested loop joins * two loops for row in table 1 (called the "driver" table): for row in table 2 (called the "join" table): customer table - is a driver-table status table - is a join-table https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/nested-loop-example-plan set enable_nestedloop=true; set enable_hashjoin=false; set enable_mergejoin=false explain select s.id, s_last_name, s.job_table, cr.country from staff s inner join company_region cr on s.region_id = cr.region_id Nested Loop (cost=0.15..239.37 rows=1000 width=88) -> Seq Scan on staff c (cost=0.00..24.00 rows=1000 width=34) -> Index Scan using company_regions_pkey on company_regions... Index Cond: (region_id = s.region_id) PG create index for all PK columns after delete company_regions_pkey Nested Loop (cost=0.15..8290.88 rows=1000 width=88) Join Filter: (s.region_id = cr.region_id) -> Seq Scan on staff c (cost=0.00..24.00 rows=1000 width=34) -> Materialize (const=0.00..24.00 rows=1000 width=34) Seq Scan on company_regions cr (cost=0.00..15.00 rows=5...) https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-joins Build Hash Table Use the samller of the two tables Compute the value of primary key value Store in table Probe phase Step through large table Compute hash value of primary or foreign key Lookup corresponding value in hash table https://www.linkedin.com/learning/advanced-sql-for-query-tuning-and-performance-optimization/hash-join-example-plan set enable_nestloop=false;