SQL optimisation

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Forum rules
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Post Reply
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

SQL optimisation

Post by jmpoure »

Dear Friends,

I am using PhpBB 3.0 on a 400.000 messages bulletin board.
I turned SQL query debugging on in PostgreSQL.

Then I used a load testing tool to simulate 200 simultaneous queries.
This resulted in the board becoming very slow.

Then I looked a the log.
Some remarks :

Code: Select all

2008-01-03 23:47:39 CET LOG:  instruction : SELECT t.* FROM phpbb_topics t WHERE t.forum_id = 3
                        AND t.topic_type IN (0, 1)
                        AND t.topic_approved = 1
                         ORDER BY t.topic_type DESC, t.topic_last_post_time DESC
         LIMIT 50 OFFSET 0
2008-01-03 23:47:39 CET LOG:  durée : 96.478 ms
This is far too much.

The database will sort all data BEFORE doing a sequential scan.
You should select only the needed rows.

I will report other issues later on.

User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

It looks like you are correct, this query can be optimized. If you find others, don't hesitate to post them.
Freedom from fear

User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

Try the following:
viewforum.patch
(2.06 KiB) Downloaded 906 times
Freedom from fear

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Thanks. I plan to continue debugging queries in postgresql debug mode.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

After installing the patch:
2008-01-04 10:21:52 CET LOG: instruction : SELECT t.topic_id
FROM phpbb_topics t
WHERE t.forum_id = 26
AND t.topic_type IN (0, 1)
AND t.topic_approved = 1

ORDER BY t.topic_type DESC, t.topic_last_post_time DESC
LIMIT 50 OFFSET 0
2008-01-04 10:21:52 CET LOG: durée : 6.120 ms
phpbb3=# explain analyse SELECT t.topic_id
phpbb3-# FROM phpbb_topics t
phpbb3-# WHERE t.forum_id = 26
phpbb3-# AND t.topic_type IN (0, 1)
phpbb3-# AND t.topic_approved = 1
phpbb3-#
phpbb3-# ORDER BY t.topic_type DESC, t.topic_last_post_time DESC
phpbb3-# LIMIT 50 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1205.66..1205.78 rows=50 width=10) (actual time=6.210..6.236 rows=50 loops=1)
-> Sort (cost=1205.66..1210.01 rows=1742 width=10) (actual time=6.208..6.217 rows=50 loops=1)
Sort Key: topic_type, topic_last_post_time
-> Bitmap Heap Scan on phpbb_topics t (cost=97.80..1111.88 rows=1742 width=10) (actual time=0.572..4.459 rows=1777 loops=1)
Recheck Cond: (forum_id = 26)
Filter: ((topic_type = ANY ('{0,1}'::integer[])) AND (topic_approved = 1))
-> Bitmap Index Scan on phpbb_topics_fid_time_moved (cost=0.00..97.37 rows=1747 width=0) (actual time=0.427..0.427 rows=2068 loops=1)
Index Cond: (forum_id = 26)
Total runtime: 6.310 ms
(9 lignes)
Great ...
I hope that this kind of patch is stored in SVN.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Another one, ISSUE #2
2008-01-04 10:38:48 CET LOG: instruction : SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 1 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('428795', '428276', '419494', '419492', '416989', '415533', '414555', '412933', '411866', '411610', '411444', '410843', '406278', '393871', '393636', '393598', '393589', '393584', '393146', '393078', '393055')
AND u.user_id = p.poster_id
2008-01-04 10:38:48 CET LOG: durée : 10.740 ms
phpbb_zebra has a combined index on user_id and zebra_id.
But this kind of index is useless in JOINS.
It can only be used in case of search when both values are present.

Let's create indexes to fasten joins:

Code: Select all

CREATE INDEX phpbb_zebra_user_id
  ON phpbb_zebra
  USING btree
  (user_id);

CREATE INDEX phpbb_zebra_zebra_id
  ON phpbb_zebra
  USING btree
  (zebra_id);
Result:
2008-01-04 11:04:44 CET LOG: instruction : SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 104617 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('433356', '433364', '433401', '433408', '433418', '433422', '433425', '433428', '433434', '433460', '433464')
AND u.user_id = p.poster_id
2008-01-04 11:04:44 CET LOG: durée : 2.448 ms
Searches are done on indexes:
phpbb3=# EXPLAIN ANALYSE SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 1 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('432652', '433287', '433346', '433380')
phpbb3-# AND u.user_id = p.poster_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=18.63..67.71 rows=4 width=1095) (actual time=0.172..0.225 rows=4 loops=1)
Hash Cond: (p.poster_id = z.zebra_id)
-> Nested Loop (cost=17.08..66.14 rows=4 width=1091) (actual time=0.097..0.148 rows=4 loops=1)
-> Bitmap Heap Scan on phpbb_posts p (cost=17.08..33.01 rows=4 width=651) (actual time=0.061..0.079 rows=4 loops=1)
Recheck Cond: (post_id = ANY ('{432652,433287,433346,433380}'::integer[]))
-> Bitmap Index Scan on phpbb_posts_pkey (cost=0.00..17.08 rows=4 width=0) (actual time=0.050..0.050 rows=4 loops=1)
Index Cond: (post_id = ANY ('{432652,433287,433346,433380}'::integer[]))
-> Index Scan using phpbb_users_pkey on phpbb_users u (cost=0.00..8.27 rows=1 width=440) (actual time=0.012..0.013 rows=1 loops=4)
Index Cond: (u.user_id = p.poster_id)
-> Hash (cost=1.54..1.54 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=1)
-> Seq Scan on phpbb_zebra z (cost=0.00..1.54 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
Filter: (user_id = 1)
Total runtime: 0.447 ms
(13 lignes)

Also, you should avoid using '*' in queries.
It can drive one such query from 1.9 ms to 0.x milisecond.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

ISSUE #3:
2008-01-04 11:29:04 CET LOG: instruction : SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 104595 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('320002', '320006', '320014', '320019', '320022', '320025', '320050', '320088', '320090', '320096', '320100', '320101', '320104', '320108', '320124', '320143', '320148', '320198', '320199', '320235', '320241', '320269', '320276', '320282', '320299', '320306', '320315', '320402', '320408', '320435', '320494', '320533', '320639', '320742', '320950', '321065', '321258', '321318', '321319', '321476', '321487', '321834')
AND u.user_id = p.poster_id
2008-01-04 11:29:04 CET LOG: durée : 29.609 ms
In PostgreSQL, each query should execute in less than 0.5 microsecond.
Let's analyse the query:
explain analyse SELECT u.*, z.friend, z.foe, p.* FROM phpbb_users u, phpbb_posts p LEFT JOIN phpbb_zebra z ON (z.user_id = 104595 AND z.zebra_id = p.poster_id) WHERE p.post_id IN ('320002', '320006', '320014', '320019', '320022', '320025', '320050', '320088', '320090', '320096', '320100', '320101', '320104', '320108', '320124', '320143', '320148', '320198', '320199', '320235', '320241', '320269', '320276', '320282', '320299', '320306', '320315', '320402', '320408', '320435', '320494', '320533', '320639', '320742', '320950', '321065', '321258', '321318', '321319', '321476', '321487', '321834')
phpbb3-# AND u.user_id = p.poster_id
phpbb3-# ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=180.47..634.74 rows=42 width=1095) (actual time=0.265..0.754 rows=42 loops=1)
Hash Cond: (p.poster_id = z.zebra_id)
-> Nested Loop (cost=178.92..633.02 rows=42 width=1091) (actual time=0.194..0.646 rows=42 loops=1)
-> Bitmap Heap Scan on phpbb_posts p (cost=178.92..345.21 rows=42 width=651) (actual time=0.167..0.277 rows=42 loops=1)
Recheck Cond: (post_id = ANY ('{320002,320006,320014,320019,320022,320025,320050,320088,320090,320096,320100,320101,320104,320108,320124,320143,320148,320198,320199,320235,320241,320269,320276,320282,320299,320306,320315,320402,320408,320435,320494,320533,320639,320742,320950,321065,321258,321318,321319,321476,321487,321834}'::integer[]))
-> Bitmap Index Scan on phpbb_posts_pkey (cost=0.00..178.91 rows=42 width=0) (actual time=0.154..0.154 rows=42 loops=1)
Index Cond: (post_id = ANY ('{320002,320006,320014,320019,320022,320025,320050,320088,320090,320096,320100,320101,320104,320108,320124,320143,320148,320198,320199,320235,320241,320269,320276,320282,320299,320306,320315,320402,320408,320435,320494,320533,320639,320742,320950,321065,321258,321318,321319,321476,321487,321834}'::integer[]))
-> Index Scan using phpbb_users_pkey on phpbb_users u (cost=0.00..6.84 rows=1 width=440) (actual time=0.005..0.006 rows=1 loops=42)
Index Cond: (u.user_id = p.poster_id)
-> Hash (cost=1.54..1.54 rows=1 width=8) (actual time=0.011..0.011 rows=0 loops=1)
-> Seq Scan on phpbb_zebra z (cost=0.00..1.54 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
Filter: (user_id = 104595)
Total runtime: 0.976 ms
(13 lignes)
There is a sequential scan due to a mixed OLD join with modern LEFT JOIN.

The query should be (I am not sure, please check):
SELECT u.*, z.friend, z.foe, p.* FROM phpbb_posts p
INNER JOIN phpbb_users u ON u.user_id = p.poster_id
LEFT JOIN phpbb_zebra z ON (z.user_id = 104595 AND z.zebra_id = p.poster_id)
WHERE p.post_id IN ('320002', '320006', '320014', '320019', '320022', '320025', '320050', '320088', '320090', '320096', '320100', '320101', '320104', '320108', '320124', '320143', '320148', '320198', '320199', '320235', '320241', '320269', '320276', '320282', '320299', '320306', '320315', '320402', '320408', '320435', '320494', '320533', '320639', '320742', '320950', '321065', '321258', '321318', '321319', '321476', '321487', '321834')
Also, avoid '*' in queries. It eats up you CPU time in both PostgreSQL and PHP.

Let's analyse the new query:
phpbb3=# explain analyse
phpbb3-# SELECT u.*, z.friend, z.foe, p.* FROM phpbb_posts p
phpbb3-# INNER JOIN phpbb_users u ON u.user_id = p.poster_id
phpbb3-# LEFT JOIN phpbb_zebra z ON (z.user_id = 104595 AND z.zebra_id = p.poster_id)
phpbb3-# WHERE p.post_id IN ('320002', '320006', '320014', '320019', '320022', '320025', '320050', '320088', '320090', '320096', '320100', '320101', '320104', '320108', '320124', '320143', '320148', '320198', '320199', '320235', '320241', '320269', '320276', '320282', '320299', '320306', '320315', '320402', '320408', '320435', '320494', '320533', '320639', '320742', '320950', '321065', '321258', '321318', '321319', '321476', '321487', '321834');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=180.47..634.74 rows=42 width=1099) (actual time=0.277..0.788 rows=42 loops=1)
-> Hash Left Join (cost=180.47..346.93 rows=42 width=655) (actual time=0.251..0.441 rows=42 loops=1)
Hash Cond: (p.poster_id = z.zebra_id)
-> Bitmap Heap Scan on phpbb_posts p (cost=178.92..345.21 rows=42 width=651) (actual time=0.179..0.316 rows=42 loops=1)
Recheck Cond: (post_id = ANY ('{320002,320006,320014,320019,320022,320025,320050,320088,320090,320096,320100,320101,320104,320108,320124,320143,320148,320198,320199,320235,320241,320269,320276,320282,320299,320306,320315,320402,320408,320435,320494,320533,320639,320742,320950,321065,321258,321318,321319,321476,321487,321834}'::integer[]))
-> Bitmap Index Scan on phpbb_posts_pkey (cost=0.00..178.91 rows=42 width=0) (actual time=0.164..0.164 rows=42 loops=1)
Index Cond: (post_id = ANY ('{320002,320006,320014,320019,320022,320025,320050,320088,320090,320096,320100,320101,320104,320108,320124,320143,320148,320198,320199,320235,320241,320269,320276,320282,320299,320306,320315,320402,320408,320435,320494,320533,320639,320742,320950,321065,321258,321318,321319,321476,321487,321834}'::integer[]))
-> Hash (cost=1.54..1.54 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
-> Seq Scan on phpbb_zebra z (cost=0.00..1.54 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
Filter: (user_id = 104595)
-> Index Scan using phpbb_users_pkey on phpbb_users u (cost=0.00..6.84 rows=1 width=444) (actual time=0.005..0.006 rows=1 loops=42)
Index Cond: (u.user_id = p.poster_id)
Total runtime: 0.991 ms
(13 lignes)
As you can read, the query is executed in memory on indexes.

RESULT:
SELECT u.*, z.friend, z.foe, p.* FROM phpbb_posts p
INNER JOIN phpbb_users u ON u.user_id = p.poster_id
LEFT JOIN phpbb_zebra z ON (z.user_id = 104595 AND z.zebra_id = p.poster_id)
WHERE p.post_id IN ('320002', '320006', '320014', '320019', '320022', '320025', '320050', '320088', '320090', '320096', '320100', '320101', '320104', '320108', '320124', '320143', '320148', '320198', '320199', '320235', '320241', '320269', '320276', '320282', '320299', '320306', '320315', '320402', '320408', '320435', '320494', '320533', '320639', '320742', '320950', '321065', '321258', '321318', '321319', '321476', '321487', '321834')
2008-01-04 11:44:24 CET LOG: durée : 6.882 ms
Execution time divided by five.
Last edited by jmpoure on Fri Jan 04, 2008 10:55 am, edited 1 time in total.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Just for information, MySQL is crap. Developers wanting to investigate queries should run PostgreSQL. Just my 2 cents...

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Here is a collection of SQL queries which run with *
008-01-04 14:44:43 CET LOG: instruction : SELECT u.*, s.*
FROM phpbb_sessions s, phpbb_users u
WHERE s.session_id = 'XXXXXXXXXXXXXXXXXXXXXXXX'
AND u.user_id = s.session_user_id
2008-01-04 14:44:43 CET LOG: durée : 2.099 ms

User avatar
Kellanved
Former Team Member
Posts: 407
Joined: Sun Jul 30, 2006 4:59 pm
Location: Berlin

Re: SQL optimisation

Post by Kellanved »

jmpoure wrote:

Also, you should avoid using '*' in queries.
It can drive one such query from 1.9 ms to 0.x milisecond.
Well, we can't do that everywhere, as we have to consider things like keeping the code MODable.
No support via PM.
Trust me, I'm a doctor.

Post Reply