Full text search engine for PhpBB 3.x / PostgreSQL

Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here.
Forum rules
Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here. Feature requests are closed.
Post Reply
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Full text search engine for PhpBB 3.x / PostgreSQL

Post by jmpoure »

Dear friends,

I am glad I finaly found the main forum for PhpBB 3.x development.
I am quite impressed by PhpBB 3.x and I would like to thank you all for your time and efforts.

Someone wrote me about a Full-text plug-in for MySQL.
I would like to write the same for PostgreSQL full-text engine.

In my opinion, full-text engine should not be a MOD in PhpBB 3.x.
It should be a standard. Why ? Because PhpBB 2.x / 3.x are likely to break in certain conditions.

When using PhpBB 2.x in production with 200.000 messages and more than simultaneous 200 users,
I experienced severe slow-down of my bi-xeon dedicated server.

After running audit tools, I am now sure that these severe slow-downs are not due to a lack of database caching ... but design problems in Php 2.x. After reading part of Php 3.x code, I am not sure that the new code will solve the problems of power users OR hosting companies, that first and almost need a forum solution that does not eat too much CPU or memory. A solution that does not break-down under heavy load.

In short, I believe that PhpBB 2.x can run 100 simultaneous users, but is easily stuck afterwards. The problem is that modern needs sometimes includes 1000 simultaneous users.

Okay, now let us study a tipical break-down, when more that 200 simultaneous users are connected on a PhpBB 2.x board:

Here is what can happen:
* Several users write messages, including sometimes long text.
* PhpBB PHP code eats-up CPU-usage during text parsing. PHP is an interpreted language. It is pretty slow during text parsing.
* Furthermore, PHP code runs multiple SQL queries to verify the uniqueness of words entered. When a message has 1.000 words, PhpBB 2.x will run 1.000 SQL queries. Of course, these queries will run on a single pConnect, but it is a loss of time. When a server is busy, you can sometimes wait 1 minute when posting a message.
* Because the server is stuck running PHP parsing code AND running multiple SQL queries, Apache and PostgreSQL database fork new processes to answer http queries.
* This creates new memory needs, both for Apache/PHP and PostgreSQL. Which in turn slow down PhpBB 2.x dramatically.
* After a few hours of heavy usage, you end-up with a very slow server, even when it is a dedicated Bi-Xeon, with 1Gb memory and SATA drives.

The solution I found was to implement TSeach2 full-text engine.
The advantages are as follows:
* No need to parse text using PHP. PostgreSQL TSearch2 code parses the text server-side. This results in only 1 (one) SQL query, instead of thousands.
* Immediate response time during full-text search. Usually 1 millisecond. I will post example logs shortly.
* Limited footprint of indexes, using GIN technology.

Example:
explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@ 'jmp'::tsquery limit 100;

Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824 rows=100 loops=1)
-> Index Scan using idxfti_idx on phpbb_posts_text (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418 rows=100 loops=1)
Index Cond: (idxfti @@ '''jmp'''::tsquery)
Total runtime: 1.068 ms
(319 lignes)


Looking for the word "JMP" into 200.000 messages only takes 1 millisecond.

TSearch2 Full-text engine for PostgreSQL is THE answer to PhpBB 3.x needs. It was chosen by Wikimedia beta release to answer the needs of power-users.

Before I start development, I would like to hear your reaction.

My plans are:

:arrow: PhpBB 2.x
I would like to port PhpBB 2.x to TSearch2 PostgreSQL engine. This could solve downtime problems of PhpBB 2.x power users. The home page of TSeach2 : http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

:arrow: PhpBB 3.x
Create a MOD for PostgreSQL 8.3 cvs version. PostgreSQL 8.3 cvs version includes a full-text search engine bases on TSearch2. I my opinion, it would be better to leverage directly on that new solution. The homepage of FTS for PostgreSQL 8.3+ : http://mira.sai.msu.su/~megera/pgsql/ftsdoc

Kind regards,
Jean-Michel
Last edited by jmpoure on Sun Mar 11, 2007 11:30 am, edited 1 time in total.

User avatar
karlsemple
Registered User
Posts: 480
Joined: Mon Jan 23, 2006 8:49 am
Location: Hereford
Contact:

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by karlsemple »

As Acyd Burn told you on phpbb.com, you are free to write this as a mod for the software :)

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by jmpoure »

karlsemple wrote: As Acyd Burn told you on phpbb.com, you are free to write this as a mod for the software :)


Thanks. I will.
But in my opinion, such a MOD is not a plug-in, it is an absolute requirement.
Especially when hosting PhpBB on shared server.

Where can I find MySQL full-text mod?
I may modify this MOD, contact the author, to add PostgreSQL support.
Better and more simple for inclusion into PhpBB 3.x

Paul
Infrastructure Team Leader
Infrastructure Team Leader
Posts: 373
Joined: Thu Sep 16, 2004 9:02 am
Contact:

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by Paul »

Mysql fulltext is included in phpBB3 package :)

User avatar
A_Jelly_Doughnut
Registered User
Posts: 1780
Joined: Wed Jun 04, 2003 4:23 pm

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by A_Jelly_Doughnut »

MySQL fulltext was developed by naderman - check out the Developers usergroup here.
A_Jelly_Doughnut

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by jmpoure »

paulus wrote: Mysql fulltext is included in phpBB3 package :)


So it is only a matter of providing a patch to support PostgreSQL TSeach2 engine.
Does PHPBB 3.x still parses the text of messages to fill-up a word table, even under MySQL?

User avatar
VinDuv
Registered User
Posts: 73
Joined: Wed May 03, 2006 8:10 pm
Location: France

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by VinDuv »

jmpoure wrote:
paulus wrote: Mysql fulltext is included in phpBB3 package :)


So it is only a matter of providing a patch to support PostgreSQL TSeach2 engine.
Does PHPBB 3.x still parses the text of messages to fill-up a word table, even under MySQL?

If you use MySQL, you can choose using MySQL fulltext or native fulltext for search. If you use MySQL fulltext, the native index is not updated.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by jmpoure »

Wonderful. So adding PostgreSQL full-text engine should be easy.

User avatar
VinDuv
Registered User
Posts: 73
Joined: Wed May 03, 2006 8:10 pm
Location: France

Re: Full text search engine for PhpBB 3.x / PostgreSQL

Post by VinDuv »

jmpoure wrote: Wonderful. So adding PostgreSQL full-text engine should be easy.

Yes. You just have to copy and modify the file includes/search/fulltext_mysql.php, and add some language vars to language/*/acp/search.php :)

Post Reply