Good patch as usual. Works on our 500.000 posts forum.
Could you comment on your technical choices?
Why do you let Php parse messages instead of using PostgreSQL full-text parsing function ?
Basically,
PostgreSQL full-text search documentation explains how parsing queries work:
SELECT plainto_tsquery('english', 'The Fat Rats');
plainto_tsquery
-----------------
'fat' & 'rat'
Using PostgreSQL parsing functions seem more straigthforward :
This would eliminate empty words like "the" in english, offer better support in future PostgreSQL updates.
Also, I suspect a better support on PostgreSQL side for accentuated languages.
Let me give an example:
In French, real-estate is "immobilier". Immobilier is a noun, but it can also be an adjective.
Real estate bubble translates "Bulle Immobili
ère" because bulle (bubble) is feminin gender.
Real estate crash translates "Krach immobili
er" because krach (crash) is masculin gender.
So how does PostgreSQL handle search ?
SELECT plainto_tsquery('french', 'krach immobilier');
'krach' & 'immobili'
SELECT plainto_tsquery('french', 'bulle immobilière');
'bull' & 'immobili'
So in the case of 'immobilier', PostgreSQL will seach for all occurences (immobilier, immobilière, immobilières, immobiliers), in masculin, feminin, plurar forms using "immobili". For this PostgreSQL has some grammar knowledge and is able to use dictionaries.
I tried to look for "immobilier" on our forum. It only returns matches for "immobilier" and not "immobilière".
In the case of my forum,
I probably loose 50% search matches, which is enough to decide to let PostgreSQL do the parsing work.
In my mind, the very minimum would be to pass the initial seach query to PostgreSQL using
plainto_tsquery and then trim the result removing ' and &.
Then you can remove a lot of your code on the Php parsing side and let PostgreSQL do the job.
The most simple the patch, the better
What is your point of view?
Otherwize, this patch should be integrated into PhpBB next release.
I really recommend this patch to any PostgreSQL 8.3 user with medium to very large forum.
It really cuts down CPU usage as PostgreSQL does most processing itself.
Thanks for your work.
Kind regards,
Jean-Michel