I always do my query work on PostgreSQL as it is the most strictjmpoure wrote:I am in charge of handling the various databases that phpBB is supported on.
SQL optimisation
Forum rules
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Re: SQL optimisation
Freedom from fear
Re: SQL optimisation
Look at that. We discussed about this Zebra thing before:
Query plan:
The correct SQL query would be:
Now run EXPLAIN ANALYSE:
At least 100 millisecond faster.
LEFT JOINS are always faster than inner joins.
Code: Select all
SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 1 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('421984', '421983', '421982', '421981', '421980', '421979', '421978', '421977', '421976', '421975', '421974', '421973', '421971', '421970', '421969', '421968', '421967', '421966', '421965', '421964', '421963', '421962', '421961', '421960', '421958', '421957', '421956', '421955', '421954', '421953', '421952', '421951', '421950', '421949', '421948', '421947', '421946', '421944', '421943', '421942', '421941', '421939', '421938', '421937', '421936', '421935', '421934', '421932', '421931', '421930')
AND u.user_id = p.poster_id
2008-01-07 23:25:27 CET LOG: durée : 138.546 ms
Code: Select all
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=210.60..742.86 rows=50 width=1050) (actual time=0.265..0.990 rows=50 loops=1)
Hash Cond: (p.poster_id = z.zebra_id)
-> Nested Loop (cost=209.00..741.06 rows=50 width=1046) (actual time=0.202..0.883 rows=50 loops=1)
-> Bitmap Heap Scan on phpbb_posts p (cost=209.00..406.99 rows=50 width=642) (actual time=0.169..0.386 rows=50 loops=1)
Recheck Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
-> Bitmap Index Scan on phpbb_posts_pkey (cost=0.00..208.98 rows=50 width=0) (actual time=0.153..0.153 rows=50 loops=1)
Index Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
-> Index Scan using phpbb_users_pkey on phpbb_users u (cost=0.00..6.67 rows=1 width=404) (actual time=0.007..0.007 rows=1 loops=50)
Index Cond: (u.user_id = p.poster_id)
-> Hash (cost=1.59..1.59 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
-> Seq Scan on phpbb_zebra z (cost=0.00..1.59 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
Filter: (user_id = 1)
Total runtime: 1.216 ms
(13 lignes)
Code: Select all
SELECT u.*, z.friend, z.foe, p.*
FROM phpbb_posts p
LEFT JOIN phpbb_zebra z ON (z.user_id = 1 AND z.zebra_id = p.poster_id)
LEFT JOIN phpbb_users u ON u.user_id = p.poster_id
WHERE p.post_id IN ('421984', '421983', '421982', '421981', '421980', '421979', '421978', '421977', '421976', '421975', '421974', '421973', '421971', '421970', '421969', '421968', '421967', '421966', '421965', '421964', '421963', '421962', '421961', '421960', '421958', '421957', '421956', '421955', '421954', '421953', '421952', '421951', '421950', '421949', '421948', '421947', '421946', '421944', '421943', '421942', '421941', '421939', '421938', '421937', '421936', '421935', '421934', '421932', '421931', '421930')
Code: Select all
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=210.60..742.86 rows=50 width=1050) (actual time=0.156..0.537 rows=50 loops=1)
-> Hash Left Join (cost=210.60..408.79 rows=50 width=646) (actual time=0.146..0.251 rows=50 loops=1)
Hash Cond: (p.poster_id = z.zebra_id)
-> Bitmap Heap Scan on phpbb_posts p (cost=209.00..406.99 rows=50 width=642) (actual time=0.126..0.183 rows=50 loops=1)
Recheck Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
-> Bitmap Index Scan on phpbb_posts_pkey (cost=0.00..208.98 rows=50 width=0) (actual time=0.119..0.119 rows=50 loops=1)
Index Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
-> Hash (cost=1.59..1.59 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
-> Seq Scan on phpbb_zebra z (cost=0.00..1.59 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
Filter: (user_id = 1)
-> Index Scan using phpbb_users_pkey on phpbb_users u (cost=0.00..6.67 rows=1 width=404) (actual time=0.003..0.003 rows=1 loops=50)
Index Cond: (u.user_id = p.poster_id)
Total runtime: 0.642 ms
(13 lignes)
LEFT JOINS are always faster than inner joins.
Re: SQL optimisation
Cool.DavidMJ wrote:I always do my query work on PostgreSQL as it is the most strictjmpoure wrote:I am in charge of handling the various databases that phpBB is supported on.
Re: SQL optimisation
Feel free to choose your prefered optimisations, and drop the others.
When you are finished with modifications, I am interested in creating a set of 5 to 10 materialized view in PL.
We could be using http://www.jonathangardner.net/PostgreS ... views.html
Together with tseach2, it could drive all queries, including LEFT JOINS below 2 or 3 milliseconds.
Then a simple bi-core computer could host 1000 users ... if SQL caching does not slow the server.
When you are finished with modifications, I am interested in creating a set of 5 to 10 materialized view in PL.
We could be using http://www.jonathangardner.net/PostgreS ... views.html
Together with tseach2, it could drive all queries, including LEFT JOINS below 2 or 3 milliseconds.
Then a simple bi-core computer could host 1000 users ... if SQL caching does not slow the server.
Re: SQL optimisation
According to that query plan, it is using none of your new indexes.jmpoure wrote:That one is quite slow:Indexes are missing:Code: Select all
2008-01-04 19:31:15 CET LOG: instruction : SELECT g.group_id, g.group_name, g.group_colour, g.group_type FROM phpbb_groups g LEFT JOIN phpbb_user_group ug ON ( g.group_id = ug.group_id AND ug.user_id = 1 AND ug.user_pending = 0 ) WHERE g.group_legend = 1 AND (g.group_type <> 2 OR ug.user_id = 1) ORDER BY g.group_name ASC 2008-01-04 19:31:15 CET LOG: durée : 7.861 ms
There should be an increase.Code: Select all
CREATE INDEX phpbb_user_group_user_pending in a LEFT JOIN ON phpbb_user_group USING btree (user_pending); CREATE INDEX phpbb_groups_group_type ON phpbb_groups USING btree (group_type); CREATE INDEX phpbb_groups_group_name ON phpbb_groups USING btree (group_name);
Let's display the query plan:Code: Select all
EXPLAIN ANALYSE SELECT g.group_id, g.group_name, g.group_colour, g.group_type FROM phpbb_groups g LEFT JOIN phpbb_user_group ug ON ( g.group_id = ug.group_id AND ug.user_id = 1 AND ug.user_pending = 0 ) WHERE g.group_legend = 1 AND (g.group_type <> 2 OR ug.user_id = 1) ORDER BY g.group_name ASC;
Should be faster, thanks to user_pending index.Code: Select all
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=9.43..9.43 rows=2 width=28) (actual time=0.105..0.106 rows=2 loops=1) Sort Key: g.group_name -> Hash Left Join (cost=8.28..9.42 rows=2 width=28) (actual time=0.048..0.051 rows=2 loops=1) Hash Cond: (g.group_id = ug.group_id) Filter: ((g.group_type <> 2) OR (ug.user_id = 1)) -> Seq Scan on phpbb_groups g (cost=0.00..1.11 rows=2 width=28) (actual time=0.009..0.011 rows=2 loops=1) Filter: (group_legend = 1) -> Hash (cost=8.27..8.27 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=1) -> Index Scan using phpbb_user_group_user_id on phpbb_user_group ug (cost=0.00..8.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1) Index Cond: (user_id = 1) Filter: (user_pending = 0) Total runtime: 0.142 ms (12 lignes)
Freedom from fear
Re: SQL optimisation
The differences in those query plans come down to differences in the sequential scan, which largely has nothing to do with the query itself...jmpoure wrote:This is too slow:
Quey plan :Code: Select all
2008-01-04 21:53:58 CET LOG: instruction : SELECT u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour, s.session_ip, s.session_viewonline FROM phpbb_users u, phpbb_sessions s WHERE s.session_time >= 1199479738 AND u.user_id = s.session_user_id ORDER BY u.username_clean ASC, s.session_ip ASC 2008-01-04 21:53:58 CET LOG: durée : 8.137 ms
In fact, this query is a LEFT JOIN.Code: Select all
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Sort (cost=712.17..713.04 rows=347 width=52) (actual time=16.027..16.133 rows=551 loops=1) Sort Key: u.username_clean, s.session_ip -> Hash Join (cost=548.12..697.53 rows=347 width=52) (actual time=8.597..10.044 rows=551 loops=1) Hash Cond: (s.session_user_id = u.user_id) -> Seq Scan on phpbb_sessions s (cost=0.00..143.34 rows=347 width=22) (actual time=0.008..1.006 rows=551 loops=1) Filter: (session_time >= 1199479738) -> Hash (cost=493.61..493.61 rows=4361 width=34) (actual time=8.578..8.578 rows=4363 loops=1) -> Seq Scan on phpbb_users u (cost=0.00..493.61 rows=4361 width=34) (actual time=0.007..6.355 rows=4363 loops=1) Total runtime: 16.303 ms (9 lignes)
A left join is always fast than an inner join.
It should be written:Query plan :Code: Select all
SELECT u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour, s.session_ip, s.session_viewonline FROM phpbb_sessions s LEFT JOIN phpbb_users u ON u.user_id = s.session_user_id WHERE s.session_time >= 1199479738 ORDER BY u.username_clean ASC, s.session_ip ASC
Gain: 20% time at least ...Code: Select all
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Sort (cost=712.17..713.04 rows=347 width=52) (actual time=14.096..14.202 rows=558 loops=1) Sort Key: u.username_clean, s.session_ip -> Hash Left Join (cost=548.12..697.53 rows=347 width=52) (actual time=7.102..8.086 rows=558 loops=1) Hash Cond: (s.session_user_id = u.user_id) -> Seq Scan on phpbb_sessions s (cost=0.00..143.34 rows=347 width=22) (actual time=0.006..0.544 rows=558 loops=1) Filter: (session_time >= 1199479738) -> Hash (cost=493.61..493.61 rows=4361 width=34) (actual time=7.084..7.084 rows=4363 loops=1) -> Seq Scan on phpbb_users u (cost=0.00..493.61 rows=4361 width=34) (actual time=0.005..4.863 rows=4363 loops=1) Total runtime: 14.336 ms (9 lignes)
Freedom from fear
Re: SQL optimisation
I do not agree. During an INNER JOIN, the parser will consider LEFT and RIGHT possibilities. On the converse, a LEFT JOIN is always faster, because the query planner and optimiser knows the main table (clause FROM). LEFT JOINS are the fastest joins. Whenever you can replace an INNER JOIN with a LEFT JOIN, you will gain a minimum of 20%. You can clearly see this in the result plan. The LEFT JOIN is 20% faster: 14 microseconds instead of 16 microseconds. But the gab would be larger on deeper queries or larger tables.DavidMJ wrote:The differences in those query plans come down to differences in the sequential scan, which largely has nothing to do with the query itself...
For speed, the next step would be to use MARERIALIZED VIEWS. In this case,
Code: Select all
SELECT u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour, s.session_ip, s.session_viewonline
FROM phpbb_sessions s
LEFT JOIN phpbb_users u ON u.user_id = s.session_user_id
WHERE s.session_time >= 1199479738
ORDER BY u.username_clean ASC, s.session_ip ASC
Last edited by jmpoure on Tue Jan 08, 2008 7:56 am, edited 4 times in total.
Re: SQL optimisation
In this case, it looks like the query planner chose the same join order on the tables...
Freedom from fear
Re: SQL optimisation
There is a real difference: the first query does a simple JOIN (Hash Join (cost=548.12..697.53 rows=347 width=52) (actual time=8.597..10.044 rows=551 loops=1)), the second a LEFT JOIN (Hash Left Join (cost=548.12..697.53 rows=347 width=52) (actual time=7.102..8.086 rows=558 loops=1)).DavidMJ wrote:In this case, it looks like the query planner chose the same join order on the tables...
When possible, every simple JOIN should be replaced with the equivalent LEFT JOIN. 20% or more speed/CPU usage/memory footprint matters on my server which handles from 20 to 50 queries every second. In the case of large installations or hosting services, this really matters.
Again, real optimisation would be reducing this query to 1 millisecond or less using materialized views.
Re: SQL optimisation
To summarise my opinion:
* tsearch2 should come as a standard choice for users. When tsearch2 is present, use it.
* INNER JOINS or JOINS should be replaced with LEFT JOINS whenever possible.
* Some tables carry indexes on 2 or more columns. These kind of indexes cannot be used in JOINS. Therefore, indexes on a single column are the preferred solution.
* Logging all PostgreSQL queries is usefull. A short query should run in 1-3 millisecond (or even less). A deep query in less than 10 millisecond ortherwise, we should use Materielized VIEWS. I am running my production server with query logging. I will try to anonymise the logs and publish them.
* PhpBB 3.0 deep queries (such as paging) should be hard-wired using Materialized VIEWS in PL. In this case, main queries could go very deep to fetch all needed data in one run. This could be important for 5 to 10 queries and would make PhpBB the fastest solution for internet hosting services.
* This should reduce the need for data caching and allow to reduce PhpBB memory footprint.
Without joking, I think it is possible to multiply speed by a factor or 3 or more on large installations.
* tsearch2 should come as a standard choice for users. When tsearch2 is present, use it.
* INNER JOINS or JOINS should be replaced with LEFT JOINS whenever possible.
* Some tables carry indexes on 2 or more columns. These kind of indexes cannot be used in JOINS. Therefore, indexes on a single column are the preferred solution.
* Logging all PostgreSQL queries is usefull. A short query should run in 1-3 millisecond (or even less). A deep query in less than 10 millisecond ortherwise, we should use Materielized VIEWS. I am running my production server with query logging. I will try to anonymise the logs and publish them.
* PhpBB 3.0 deep queries (such as paging) should be hard-wired using Materialized VIEWS in PL. In this case, main queries could go very deep to fetch all needed data in one run. This could be important for 5 to 10 queries and would make PhpBB the fastest solution for internet hosting services.
* This should reduce the need for data caching and allow to reduce PhpBB memory footprint.
Without joking, I think it is possible to multiply speed by a factor or 3 or more on large installations.