SQL optimisation

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Forum rules
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Post Reply
User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

jmpoure wrote:I am in charge of handling the various databases that phpBB is supported on.
I always do my query work on PostgreSQL as it is the most strict :)
Freedom from fear

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Look at that. We discussed about this Zebra thing before:

Code: Select all

SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 1 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('421984', '421983', '421982', '421981', '421980', '421979', '421978', '421977', '421976', '421975', '421974', '421973', '421971', '421970', '421969', '421968', '421967', '421966', '421965', '421964', '421963', '421962', '421961', '421960', '421958', '421957', '421956', '421955', '421954', '421953', '421952', '421951', '421950', '421949', '421948', '421947', '421946', '421944', '421943', '421942', '421941', '421939', '421938', '421937', '421936', '421935', '421934', '421932', '421931', '421930')
                        AND u.user_id = p.poster_id
2008-01-07 23:25:27 CET LOG:  durée : 138.546 ms
Query plan:

Code: Select all

                                                  QUERY PLAN                                                                                                                                                                                                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=210.60..742.86 rows=50 width=1050) (actual time=0.265..0.990 rows=50 loops=1)
   Hash Cond: (p.poster_id = z.zebra_id)
   ->  Nested Loop  (cost=209.00..741.06 rows=50 width=1046) (actual time=0.202..0.883 rows=50 loops=1)
         ->  Bitmap Heap Scan on phpbb_posts p  (cost=209.00..406.99 rows=50 width=642) (actual time=0.169..0.386 rows=50 loops=1)
               Recheck Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
               ->  Bitmap Index Scan on phpbb_posts_pkey  (cost=0.00..208.98 rows=50 width=0) (actual time=0.153..0.153 rows=50 loops=1)
                     Index Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
         ->  Index Scan using phpbb_users_pkey on phpbb_users u  (cost=0.00..6.67 rows=1 width=404) (actual time=0.007..0.007 rows=1 loops=50)
               Index Cond: (u.user_id = p.poster_id)
   ->  Hash  (cost=1.59..1.59 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
         ->  Seq Scan on phpbb_zebra z  (cost=0.00..1.59 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
               Filter: (user_id = 1)
 Total runtime: 1.216 ms
(13 lignes)
The correct SQL query would be:

Code: Select all

SELECT u.*, z.friend, z.foe, p.* 
FROM phpbb_posts p 
LEFT JOIN phpbb_zebra z ON (z.user_id = 1 AND z.zebra_id = p.poster_id) 
LEFT JOIN phpbb_users u ON u.user_id = p.poster_id
WHERE p.post_id IN ('421984', '421983', '421982', '421981', '421980', '421979', '421978', '421977', '421976', '421975', '421974', '421973', '421971', '421970', '421969', '421968', '421967', '421966', '421965', '421964', '421963', '421962', '421961', '421960', '421958', '421957', '421956', '421955', '421954', '421953', '421952', '421951', '421950', '421949', '421948', '421947', '421946', '421944', '421943', '421942', '421941', '421939', '421938', '421937', '421936', '421935', '421934', '421932', '421931', '421930')
Now run EXPLAIN ANALYSE:

Code: Select all

                                                  QUERY PLAN                                                                                                                                                                                                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=210.60..742.86 rows=50 width=1050) (actual time=0.156..0.537 rows=50 loops=1)
   ->  Hash Left Join  (cost=210.60..408.79 rows=50 width=646) (actual time=0.146..0.251 rows=50 loops=1)
         Hash Cond: (p.poster_id = z.zebra_id)
         ->  Bitmap Heap Scan on phpbb_posts p  (cost=209.00..406.99 rows=50 width=642) (actual time=0.126..0.183 rows=50 loops=1)
               Recheck Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
               ->  Bitmap Index Scan on phpbb_posts_pkey  (cost=0.00..208.98 rows=50 width=0) (actual time=0.119..0.119 rows=50 loops=1)
                     Index Cond: (post_id = ANY ('{421984,421983,421982,421981,421980,421979,421978,421977,421976,421975,421974,421973,421971,421970,421969,421968,421967,421966,421965,421964,421963,421962,421961,421960,421958,421957,421956,421955,421954,421953,421952,421951,421950,421949,421948,421947,421946,421944,421943,421942,421941,421939,421938,421937,421936,421935,421934,421932,421931,421930}'::integer[]))
         ->  Hash  (cost=1.59..1.59 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
               ->  Seq Scan on phpbb_zebra z  (cost=0.00..1.59 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
                     Filter: (user_id = 1)
   ->  Index Scan using phpbb_users_pkey on phpbb_users u  (cost=0.00..6.67 rows=1 width=404) (actual time=0.003..0.003 rows=1 loops=50)
         Index Cond: (u.user_id = p.poster_id)
 Total runtime: 0.642 ms
(13 lignes)

At least 100 millisecond faster.
LEFT JOINS are always faster than inner joins.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

DavidMJ wrote:
jmpoure wrote:I am in charge of handling the various databases that phpBB is supported on.
I always do my query work on PostgreSQL as it is the most strict :)
Cool.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Feel free to choose your prefered optimisations, and drop the others.

When you are finished with modifications, I am interested in creating a set of 5 to 10 materialized view in PL.
We could be using http://www.jonathangardner.net/PostgreS ... views.html

Together with tseach2, it could drive all queries, including LEFT JOINS below 2 or 3 milliseconds.
Then a simple bi-core computer could host 1000 users ... if SQL caching does not slow the server.

User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

jmpoure wrote:That one is quite slow:

Code: Select all

2008-01-04 19:31:15 CET LOG:  instruction : SELECT g.group_id, g.group_name, g.group_colour, g.group_type
			FROM phpbb_groups g
			LEFT JOIN phpbb_user_group ug
				ON (
					g.group_id = ug.group_id
					AND ug.user_id = 1
					AND ug.user_pending = 0
				)
			WHERE g.group_legend = 1
				AND (g.group_type <> 2 OR ug.user_id = 1)
			ORDER BY g.group_name ASC
2008-01-04 19:31:15 CET LOG:  durée : 7.861 ms
Indexes are missing:

Code: Select all

CREATE INDEX phpbb_user_group_user_pending in a LEFT JOIN

  ON phpbb_user_group
  USING btree
  (user_pending);

CREATE INDEX phpbb_groups_group_type
  ON phpbb_groups
  USING btree
  (group_type);

CREATE INDEX phpbb_groups_group_name
  ON phpbb_groups
  USING btree
  (group_name);
There should be an increase.

Let's display the query plan:

Code: Select all

EXPLAIN ANALYSE

SELECT g.group_id, g.group_name, g.group_colour, g.group_type
			FROM phpbb_groups g
			LEFT JOIN phpbb_user_group ug
				ON (
					g.group_id = ug.group_id
					AND ug.user_id = 1
					AND ug.user_pending = 0
				)
			WHERE g.group_legend = 1
				AND (g.group_type <> 2 OR ug.user_id = 1)
			ORDER BY g.group_name ASC;

Code: Select all

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9.43..9.43 rows=2 width=28) (actual time=0.105..0.106 rows=2 loops=1)
   Sort Key: g.group_name
   ->  Hash Left Join  (cost=8.28..9.42 rows=2 width=28) (actual time=0.048..0.051 rows=2 loops=1)
         Hash Cond: (g.group_id = ug.group_id)
         Filter: ((g.group_type <> 2) OR (ug.user_id = 1))
         ->  Seq Scan on phpbb_groups g  (cost=0.00..1.11 rows=2 width=28) (actual time=0.009..0.011 rows=2 loops=1)
               Filter: (group_legend = 1)
         ->  Hash  (cost=8.27..8.27 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=1)
               ->  Index Scan using phpbb_user_group_user_id on phpbb_user_group ug  (cost=0.00..8.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)
                     Index Cond: (user_id = 1)
                     Filter: (user_pending = 0)
 Total runtime: 0.142 ms
(12 lignes)
Should be faster, thanks to user_pending index.
According to that query plan, it is using none of your new indexes.
Freedom from fear

User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

jmpoure wrote:This is too slow:

Code: Select all

2008-01-04 21:53:58 CET LOG:  instruction : SELECT u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour, s.session_ip, s.session_viewonline
				FROM phpbb_users u, phpbb_sessions s
				WHERE s.session_time >= 1199479738
					AND u.user_id = s.session_user_id
				ORDER BY u.username_clean ASC, s.session_ip ASC
2008-01-04 21:53:58 CET LOG:  durée : 8.137 ms
Quey plan :

Code: Select all

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=712.17..713.04 rows=347 width=52) (actual time=16.027..16.133 rows=551 loops=1)
   Sort Key: u.username_clean, s.session_ip
   ->  Hash Join  (cost=548.12..697.53 rows=347 width=52) (actual time=8.597..10.044 rows=551 loops=1)
         Hash Cond: (s.session_user_id = u.user_id)
         ->  Seq Scan on phpbb_sessions s  (cost=0.00..143.34 rows=347 width=22) (actual time=0.008..1.006 rows=551 loops=1)
               Filter: (session_time >= 1199479738)
         ->  Hash  (cost=493.61..493.61 rows=4361 width=34) (actual time=8.578..8.578 rows=4363 loops=1)
               ->  Seq Scan on phpbb_users u  (cost=0.00..493.61 rows=4361 width=34) (actual time=0.007..6.355 rows=4363 loops=1)
 Total runtime: 16.303 ms
(9 lignes)
In fact, this query is a LEFT JOIN.
A left join is always fast than an inner join.

It should be written:

Code: Select all

SELECT u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour, s.session_ip, s.session_viewonline
				FROM phpbb_sessions s
LEFT JOIN phpbb_users u ON u.user_id = s.session_user_id
				WHERE s.session_time >= 1199479738
				ORDER BY u.username_clean ASC, s.session_ip ASC
Query plan :

Code: Select all

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=712.17..713.04 rows=347 width=52) (actual time=14.096..14.202 rows=558 loops=1)
   Sort Key: u.username_clean, s.session_ip
   ->  Hash Left Join  (cost=548.12..697.53 rows=347 width=52) (actual time=7.102..8.086 rows=558 loops=1)
         Hash Cond: (s.session_user_id = u.user_id)
         ->  Seq Scan on phpbb_sessions s  (cost=0.00..143.34 rows=347 width=22) (actual time=0.006..0.544 rows=558 loops=1)
               Filter: (session_time >= 1199479738)
         ->  Hash  (cost=493.61..493.61 rows=4361 width=34) (actual time=7.084..7.084 rows=4363 loops=1)
               ->  Seq Scan on phpbb_users u  (cost=0.00..493.61 rows=4361 width=34) (actual time=0.005..4.863 rows=4363 loops=1)
 Total runtime: 14.336 ms
(9 lignes)
Gain: 20% time at least ...
The differences in those query plans come down to differences in the sequential scan, which largely has nothing to do with the query itself...
Freedom from fear

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

DavidMJ wrote:The differences in those query plans come down to differences in the sequential scan, which largely has nothing to do with the query itself...
I do not agree. During an INNER JOIN, the parser will consider LEFT and RIGHT possibilities. On the converse, a LEFT JOIN is always faster, because the query planner and optimiser knows the main table (clause FROM). LEFT JOINS are the fastest joins. Whenever you can replace an INNER JOIN with a LEFT JOIN, you will gain a minimum of 20%. You can clearly see this in the result plan. The LEFT JOIN is 20% faster: 14 microseconds instead of 16 microseconds. But the gab would be larger on deeper queries or larger tables.

For speed, the next step would be to use MARERIALIZED VIEWS. In this case,

Code: Select all

SELECT u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour, s.session_ip, s.session_viewonline
            FROM phpbb_sessions s
LEFT JOIN phpbb_users u ON u.user_id = s.session_user_id
            WHERE s.session_time >= 1199479738
            ORDER BY u.username_clean ASC, s.session_ip ASC
where the view would strore the content of u.username, u.username_clean, u.user_id, u.user_type, u.user_allow_viewonline, u.user_colour in the session table (and manage updates). Using PL, this would be transparent on PostgreSQL. This query would run transparently in 1 millisecond or less. In my database, there is ONE update for 100 SELECTs. Using Materialized views could boost speed by 2 or 3 and reduce the corresponding memory footprint of PHP, because less caching would be required.
Last edited by jmpoure on Tue Jan 08, 2008 7:56 am, edited 4 times in total.

User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

In this case, it looks like the query planner chose the same join order on the tables...
Freedom from fear

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

DavidMJ wrote:In this case, it looks like the query planner chose the same join order on the tables...
There is a real difference: the first query does a simple JOIN (Hash Join (cost=548.12..697.53 rows=347 width=52) (actual time=8.597..10.044 rows=551 loops=1)), the second a LEFT JOIN (Hash Left Join (cost=548.12..697.53 rows=347 width=52) (actual time=7.102..8.086 rows=558 loops=1)).

When possible, every simple JOIN should be replaced with the equivalent LEFT JOIN. 20% or more speed/CPU usage/memory footprint matters on my server which handles from 20 to 50 queries every second. In the case of large installations or hosting services, this really matters.

Again, real optimisation would be reducing this query to 1 millisecond or less using materialized views.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

To summarise my opinion:
* tsearch2 should come as a standard choice for users. When tsearch2 is present, use it.
* INNER JOINS or JOINS should be replaced with LEFT JOINS whenever possible.
* Some tables carry indexes on 2 or more columns. These kind of indexes cannot be used in JOINS. Therefore, indexes on a single column are the preferred solution.
* Logging all PostgreSQL queries is usefull. A short query should run in 1-3 millisecond (or even less). A deep query in less than 10 millisecond ortherwise, we should use Materielized VIEWS. I am running my production server with query logging. I will try to anonymise the logs and publish them.
* PhpBB 3.0 deep queries (such as paging) should be hard-wired using Materialized VIEWS in PL. In this case, main queries could go very deep to fetch all needed data in one run. This could be important for 5 to 10 queries and would make PhpBB the fastest solution for internet hosting services.
* This should reduce the need for data caching and allow to reduce PhpBB memory footprint.

Without joking, I think it is possible to multiply speed by a factor or 3 or more on large installations.

Post Reply