I am glad I finaly found the main forum for PhpBB 3.x development.
I am quite impressed by PhpBB 3.x and I would like to thank you all for your time and efforts.
Someone wrote me about a Full-text plug-in for MySQL.
I would like to write the same for PostgreSQL full-text engine.
In my opinion, full-text engine should not be a MOD in PhpBB 3.x.
It should be a standard. Why ? Because PhpBB 2.x / 3.x are likely to break in certain conditions.
When using PhpBB 2.x in production with 200.000 messages and more than simultaneous 200 users,
I experienced severe slow-down of my bi-xeon dedicated server.
After running audit tools, I am now sure that these severe slow-downs are not due to a lack of database caching ... but design problems in Php 2.x. After reading part of Php 3.x code, I am not sure that the new code will solve the problems of power users OR hosting companies, that first and almost need a forum solution that does not eat too much CPU or memory. A solution that does not break-down under heavy load.
In short, I believe that PhpBB 2.x can run 100 simultaneous users, but is easily stuck afterwards. The problem is that modern needs sometimes includes 1000 simultaneous users.
Okay, now let us study a tipical break-down, when more that 200 simultaneous users are connected on a PhpBB 2.x board:
Here is what can happen:
* Several users write messages, including sometimes long text.
* PhpBB PHP code eats-up CPU-usage during text parsing. PHP is an interpreted language. It is pretty slow during text parsing.
* Furthermore, PHP code runs multiple SQL queries to verify the uniqueness of words entered. When a message has 1.000 words, PhpBB 2.x will run 1.000 SQL queries. Of course, these queries will run on a single pConnect, but it is a loss of time. When a server is busy, you can sometimes wait 1 minute when posting a message.
* Because the server is stuck running PHP parsing code AND running multiple SQL queries, Apache and PostgreSQL database fork new processes to answer http queries.
* This creates new memory needs, both for Apache/PHP and PostgreSQL. Which in turn slow down PhpBB 2.x dramatically.
* After a few hours of heavy usage, you end-up with a very slow server, even when it is a dedicated Bi-Xeon, with 1Gb memory and SATA drives.
The solution I found was to implement TSeach2 full-text engine.
The advantages are as follows:
* No need to parse text using PHP. PostgreSQL TSearch2 code parses the text server-side. This results in only 1 (one) SQL query, instead of thousands.
* Immediate response time during full-text search. Usually 1 millisecond. I will post example logs shortly.
* Limited footprint of indexes, using GIN technology.
Example:
explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@ 'jmp'::tsquery limit 100;
Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824 rows=100 loops=1)
-> Index Scan using idxfti_idx on phpbb_posts_text (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418 rows=100 loops=1)
Index Cond: (idxfti @@ '''jmp'''::tsquery)
Total runtime: 1.068 ms
(319 lignes)
Looking for the word "JMP" into 200.000 messages only takes 1 millisecond.
TSearch2 Full-text engine for PostgreSQL is THE answer to PhpBB 3.x needs. It was chosen by Wikimedia beta release to answer the needs of power-users.
Before I start development, I would like to hear your reaction.
My plans are:
PhpBB 2.x
I would like to port PhpBB 2.x to TSearch2 PostgreSQL engine. This could solve downtime problems of PhpBB 2.x power users. The home page of TSeach2 : http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
PhpBB 3.x
Create a MOD for PostgreSQL 8.3 cvs version. PostgreSQL 8.3 cvs version includes a full-text search engine bases on TSearch2. I my opinion, it would be better to leverage directly on that new solution. The homepage of FTS for PostgreSQL 8.3+ : http://mira.sai.msu.su/~megera/pgsql/ftsdoc
Kind regards,
Jean-Michel