SQL optimisation

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Forum rules
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Post Reply
User avatar
Acyd Burn
Posts: 1838
Joined: Tue Oct 08, 2002 5:18 pm
Location: Behind You
Contact:

Re: SQL optimisation

Post by Acyd Burn »

You need to understand that we support many dbs and therefore will not implement db-specific optimizations - this includes postgres specific views.

Image
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

Dear Acyd Burn,

This is your choice, but you should add indexing and replace INNER JOINS with LEFT JOINS and rewrite some queries.

Then why not create VIEWs on some queries.
* Most databases support VIEWS, including MySQL.
* DB2, Oracle and PostgreSQL (with PL) support materialised views.

The advantage of a VIEW is that it stores its own query plan.
We can add normal views first.
The materialized VIEW can be an addon.
User avatar
Highway of Life
Registered User
Posts: 1399
Joined: Tue Feb 08, 2005 10:18 pm
Location: I'd love to change the World, but they won't give me the Source Code
Contact:

Re: SQL optimisation

Post by Highway of Life »

jmpoure wrote:This is your choice, but you should add indexing and replace INNER JOINS with LEFT JOINS and rewrite some queries.
This is a blanket statement that is simply not true. Left Joins are not *always* better than inner joins. You would use the query that best suits the situation and information you are trying to pull. Sometimes this requires Inner joins, sometimes a Left Join is better.
Image
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

In most situations, including deep queries, LEFT JOINS are faster because they require less overhead and are hard-wired more quickly. When you run an INNER JOIN, the parser will ask itself LEFT, RIGHT, INNER ? Then it will process data and remove duplicates. You don't have this kind of overhead in LEFT JOINs. The parser will look at the main table and process a LEFT JOIN on indexes if possible. Small tables may reside in memory. And it will not remove duplicate as there are no duplicates. This is lightening fast.

If you don't trust this point of view, ask a question on PostgreSQL mailing list. In 90% situations when an INNER JOIN can be replaced by a LEFT JOIN, the LEFT JOIN is faster. When looking at my logs I can see a 20% time increase and I have no idea about memory and CPU gain, but I believe it is substantial.

I agree that when an INNER JOIN is really needed, it may be fast too. I don't indent to say that INNER JOINS are slow.
User avatar
Highway of Life
Registered User
Posts: 1399
Joined: Tue Feb 08, 2005 10:18 pm
Location: I'd love to change the World, but they won't give me the Source Code
Contact:

Re: SQL optimisation

Post by Highway of Life »

First, a left join will only return the same result as an inner join if no rows from the left table are eliminated when running the inner join.
Second, using inner join gives the optimiser more options to process the query, which typically leads to better performance than the left join equivalent.

The best method must be used depending on the data desired and the method of a query used. -- There are always variables and each query should be tested for optimisation. In some cases you will end up with different results depending on the method.

For the queries return identical results for left and inner joins, they can be compared.
Show me a large (deep) query that uses inner joins that should use left joins and I will benchmark the two for you.

Speed also depends on if there are indexes on the columns that are being joined -- Obviously, there should be indexes on columns that are joined -- which index is better, and on access method used. As I said before, the optimiser will have more options for executing inner join than for left join, so inner join has a chance of being executed faster.

Each one needs to be addresses on a query-by-query basis. You cannot issue a blanket x is better than y statement.
If you find a query that you believe could be optimised better, we will test it.
But remember that while phpBB3 supports over 8 DBMS types, PostgreSQL is not the only DBMS used, rules must be applied for cross-DBMS compatibility. One rule for a PostgreSQL query will not necessarily apply to a MySQL query.
Image
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

In my post, there are examples where using a LEFT JOIN gives 20% speedup on an INNER JOIN.
The advantage may be better given context change, memory use, CPU time, etc ..

You can look at this post which is the perfect example : viewtopic.php?f=3&t=29260&start=40#p191366
It was benchmarked on a production serveur with 3000 users and 400.000 posts.
HASH JOIN is 8.5 millisecond, LEFT HASH JOIN is 7 millisecond.
The cost is the same, but execution time of LEFT JOINS is always better in real time.

This SQL query could be rewritten using a VIEW, which would have again a 20% speedup effect on every database.

The next step after moving some important queries to VIEWs would be to install custom server-side code in PostgreSQL, we can run these views in 1 millisecond.
This would have no impact on PhpBB code, as this is server-side coding.
And this would only impact PostrgeSQL server-side coding, not other databases.

You assume that when issuing and INNER JOIN the parser may execute a LEFT JOIN !
And conclude that INNER JOINS give more choices. But choices are overhead in CPU, memory and more.

Why are LEFT JOINs faster than INNER JOINs?

When executing an INNER JOIN, the parser will build a 2D table and look for duplicates.
In a LEFTJOIN, the parser will use a 1D table and will not look for duplicates.
This makes a huge diffence when preparing and executing the query, context change, CPU overhead, memory usage, etc ..
You may not notice on ONE query, but on 100 queries a second, it matters.

I think we agree that a responsible developper shoud look at queries, and given internals, choose
LEFT JOINs in most case whenever possible, and more rarely let the computer choose for you.

Of course there are cases where an INNER JOIN is better, but then we should read the code
and write more explicit LEFT JOINs or RIGHT JOINs or whaterver is faster to execute.

Also, there is a question of how DEEP we should run SQL queries but I would like to discuss that in the private forum.
User avatar
Highway of Life
Registered User
Posts: 1399
Joined: Tue Feb 08, 2005 10:18 pm
Location: I'd love to change the World, but they won't give me the Source Code
Contact:

Re: SQL optimisation

Post by Highway of Life »

You bring up some excellent points, although my intention was not to debate it, but to rather point out that blanket statements are not always true and everything should be examined on a case-by-case basis, and certainly not the same rules do not apply to all DBMSes supported in phpBB3.
If you ask a developer for either MySQL or MSSQL, they will tell you that Inner Join is faster and less resource intensive than an equivalent Left Join with identical results. While this may not be the case for postgreSQL, I do not know for certain, as I have not heard from a postgreSQL developer on the difference.

I decided to take the query in question that you pointed out, and benchmark it on MySQL, since MySQL is the most common database used for phpBB3, pushing a Market Share over 95%.
I performed an identical Left Join for the query and ran it up against the inner join, loaded the page about 10 times on a large topic to determine the result.
In my initial benchmarks, the Inner Join had the benefit of 7% faster time than the left join, based on query benchmark alone.
The second benchmark

Code: Select all

CVS1 | Time : 0.0324s | 7 Queries | GZIP : Off | Memory Usage: 645.63 KB | Query : [Inner Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0296s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0299s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00030 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0297s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0310s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00032 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0330s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00030 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0308s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00033 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0327s | 7 Queries | GZIP : Off | Memory Usage: 645.71 KB | Query : [Inner Join] 0.00033 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0368s | 7 Queries | GZIP : Off | Memory Usage: 645.86 KB | Query : [Inner Join] 0.00052 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0375s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00041 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0327s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0336s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00030 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0321s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0324s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0333s | 8 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0319s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0328s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00035 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.0340s | 7 Queries | GZIP : Off | Memory Usage: 1.11 MB | Query : [Left Join] 0.00031 | Page : viewtopic.php?f=2&t=1&_f_=2x
(EDIT: See two posts below, Queries would not by default affect PHP memory consumption if the result set is identical, disregard Memory Usage).
A Left join on the second set of benchmarks posted a 0.9% faster query time at 32.4 microseconds, compared to 34 microseconds for the Inner Join. -- Though the Left Join turned out to be consistently slower on the initial load page-load, which is not shown in the above results.

The conclusion? For this particular Query, an Inner Join is more efficient, just slightly with regards to query time.

Edit: The higher memory consumption seems highly irregular, I’m attempting to determine what is causing that much higher memory consumption.
I will also create a script to test a high-load server for more accurate Benchmark situations.
Last edited by Highway of Life on Thu Jan 10, 2008 7:19 am, edited 1 time in total.
Image
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: SQL optimisation

Post by jmpoure »

MySQL is not a standard database and has many inconsistencies.

Consider for example the existance of InnoDB tables and non-InnoDB tables.
You cannot really garantee transactions in MySQL!

In your exmaple, a 0,9% difference in speed does not matter.
It probably means that MySQL internal code for INNER JOINS and LEFT JOINs is the same.
The difference comes from Php code, probably.

In most databases, LEFT JOINs and INNER JOINS are wired differently.
Just consider that INNER JOINS need to sort and find duplicates and this makes a lot of difference.
LEFT JOINs do not have this kind of overhead.

I wrote a generic OPTIMIZATION HOWTO: http://forum.bulle-immobiliere.org/view ... 59&t=34657
I hope you will find it usefull.
Last edited by jmpoure on Fri Jan 11, 2008 9:14 am, edited 1 time in total.
User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: SQL optimisation

Post by DavidMJ »

jmpoure wrote:In my post, there are examples where using a LEFT JOIN gives 20% speedup on an INNER JOIN.
The advantage may be better given context change, memory use, CPU time, etc ..

You can look at this post which is the perfect example : viewtopic.php?f=3&t=29260&start=40#p191366
It was benchmarked on a production serveur with 3000 users and 400.000 posts.
HASH JOIN is 8.5 millisecond, LEFT HASH JOIN is 7 millisecond.
The cost is the same, but execution time of LEFT JOINS is always better in real time.

This SQL query could be rewritten using a VIEW, which would have again a 20% speedup effect on every database.

The next step after moving some important queries to VIEWs would be to install custom server-side code in PostgreSQL, we can run these views in 1 millisecond.
This would have no impact on PhpBB code, as this is server-side coding.
And this would only impact PostrgeSQL server-side coding, not other databases.

You assume that when issuing and INNER JOIN the parser may execute a LEFT JOIN !
And conclude that INNER JOINS give more choices. But choices are overhead in CPU, memory and more.

Why are LEFT JOINs faster than INNER JOINs?

When executing an INNER JOIN, the parser will build a 2D table and look for duplicates.
In a LEFTJOIN, the parser will use a 1D table and will not look for duplicates.
This makes a huge diffence when preparing and executing the query, context change, CPU overhead, memory usage, etc ..
You may not notice on ONE query, but on 100 queries a second, it matters.

I think we agree that a responsible developper shoud look at queries, and given internals, choose
LEFT JOINs in most case whenever possible, and more rarely let the computer choose for you.

Of course there are cases where an INNER JOIN is better, but then we should read the code
and write more explicit LEFT JOINs or RIGHT JOINs or whaterver is faster to execute.

Also, there is a question of how DEEP we should run SQL queries but I would like to discuss that in the private forum.
What you say seems to be very true for PostgreSQL, it does seem to value the additional constraint by changing the way the join is executed on the table does seem to make it execute faster. DB2 does not seem to require this optimization, it generates a query plan in an INNER JOIN case as a LEFT JOIN case for a few queries that I tested. I will investigate which queries are safe to optimize regardless of DBMS :)
Freedom from fear
User avatar
Highway of Life
Registered User
Posts: 1399
Joined: Tue Feb 08, 2005 10:18 pm
Location: I'd love to change the World, but they won't give me the Source Code
Contact:

Re: SQL optimisation

Post by Highway of Life »

Update: memory consumption has nothing to do with the queries, X-Cache determined the functions.php file was changing constantly and stopped caching the file when I switched the query to use Left Join, so it printed a higher memory consumption results after it was reloaded.
To get a more accurate load result, I would also need to test on a db with a high number of sessions, which I intent to do.

EDIT: And with 5300+ results (users viewing a single topic at a single moment)
The Benchmarks:

Code: Select all

CVS1 | Time : 0.5889s | 9 Queries | GZIP : Off | Memory Usage: 1.53 MB | Query : [Left Join] 0.25011 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5855s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.25254 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5607s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.23234 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5865s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.25617 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5592s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.23478 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5678s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.23654 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5774s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.25119 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5690s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Left Join] 0.24082 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.6259s | 7 Queries | GZIP : Off | Memory Usage: 2.21 MB | Query : [Inner Join] 0.26811 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.8217s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.43092 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5997s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.24785 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5700s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.23712 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5675s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.24160 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5912s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.26679 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 1.3926s | 8 Queries | GZIP : Off | Memory Usage: 1.53 MB | Query : [Inner Join] 1.05743 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5826s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.25367 | Page : viewtopic.php?f=2&t=1&_f_=2x
CVS1 | Time : 0.5635s | 7 Queries | GZIP : Off | Memory Usage: 1.52 MB | Query : [Inner Join] 0.23836 | Page : viewtopic.php?f=2&t=1&_f_=2x
It appears that the Left Join can be as much as a dozen milliseconds faster.
Image
Post Reply