PostgreSQL Fulltext Search

Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here.
Forum rules
Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here. Feature requests are closed.
wagnerch
Registered User
Posts: 8
Joined: Sun Jul 22, 2007 7:16 pm

PostgreSQL Fulltext Search

Post by wagnerch »

I know there has been a few discussions about implementing a fulltext module for PostgreSQL. I guess my first question is has anyone actually done it? I should probably mention that I have done it, and it appears to work. It uses tsearch2, I am interested in what others have done.

If you are upgrading from 8.2 to 8.3, you will need to go into the search settings and change the tsearch configuration to english (or whatever language you prefer) and then go into maintenance and drop/create the search index.


Updated archive file: 2/9/2008 -- adds support for PostgreSQL 8.3
Attachments
fulltext_postgres.zip
(8.95 KiB) Downloaded 2374 times
Last edited by wagnerch on Sat Feb 09, 2008 4:30 pm, edited 2 times in total.

User avatar
DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: PostgreSQL Fulltext Search

Post by DavidMJ »

Interesting stuff, it is good to see things like this to start to pop up. Your PCRE check is nice enough to inspire such a change in Olympus ;)
Freedom from fear

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: PostgreSQL Fulltext Search

Post by code reader »

very nice, but i have one comment:
from reading the code, it seems that the search-result cache is hugely counter-productive.
every new message posted causes a huge amount of work dedicated solely for handling the cache.
also, every new search spends a lot of effort checking if the search results are already in the cache.
imo, with a reasonable search engine (and, ttbomk, postgress tsearch is significantly better than just "reasonable"), the cache overhead, both in code quantity and processing time is way more expensive than a new search would have been.
besides, there is little evidence to show that consecutive searches are likely to be identical, so there is much doubt that search caching is ever beneficial, regardless of the overhead.

imo, it would be better to throw search-result-caching to the recycle bin. shorter code and better performance all over.

User avatar
naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany
Contact:

Re: PostgreSQL Fulltext Search

Post by naderman »

Search result caching is not only for consecutive identical searches but it makes it unecessary to execute the search again when the user goes to the next page of results ;-) Which however makes me agree even more with your point about "reasonable" search engines (at least for the sphinx one which we use on phpbb.com this is true)

wagnerch
Registered User
Posts: 8
Joined: Sun Jul 22, 2007 7:16 pm

Re: PostgreSQL Fulltext Search

Post by wagnerch »

Attached is a version for RC4, the author_search method has changed.

As for the caching, it would be interesting to see how the caching performs compared against a huge board. Perhaps I should make this an option for fulltext_postgres? I haven't looked at that code too deeply, it is a possibility I broke it anyways :) I know I had to throw away some MySQL specific stuff that was used to determine whether a result was saved or not.
Attachments
fulltext_postgres.zip
(8.76 KiB) Downloaded 2393 times

Chairman Kaga
Registered User
Posts: 1
Joined: Tue Aug 21, 2007 8:01 am

Re: PostgreSQL Fulltext Search

Post by Chairman Kaga »

This is awesome.

I snagged this because I was converting a 2.x board with about 250K posts, and gave up on fulltext_native ever finishing its indexing -- it slowed to 0.3 posts per second and showed no signs of speeding back up.

Well, tsearch2 indexes the whole board in about two minutes, and the searches are blazing fast now.

I hope this gets worked into the core as soon as possible. Not fair for MySQL users to get all the good stuff :)

wagnerch
Registered User
Posts: 8
Joined: Sun Jul 22, 2007 7:16 pm

Re: PostgreSQL Fulltext Search

Post by wagnerch »

I don't think it will get pulled in for the 3.0 release, as far as I know it is in feature freeze. There are a few things that are less than desirable in the current code, such as I am using function based indexes for to_tsvector (because I hate the idea of adding a column for a tsvector, and then indexing the same crap again) and it is assuming that the config map doesn't change -- so if you change the config map then you should delete and recreate the index. I think ideally it should probably permit an index per config map, and I should include the config map in the function based index.

Perhaps I will tweak it a bit more to accommodate PostgreSQL 8.4, since Tsearch2 will be in the core with 8.4.

DrewJensen
Registered User
Posts: 2
Joined: Wed Sep 19, 2007 7:35 pm

Re: PostgreSQL Fulltext Search

Post by DrewJensen »

Hi,

Just thought I'd let you know that in setting up a new board from scratch we decided to go with PostgreSQL and your fulltext_postgres.php mod. ( I emailed you earlier )

After seeing to it that the person doing the install got his head out of his...well..( that was me by the way ) it went without a hitch. You did a fine job of integrating this in the ACP. Well be playing with this on a sandbox XP machine for a few days and then installing on a SUN server running Solaris 10 later this week.

Thanks for you efforts, and well try to give feed back where we can

Drew Jensen
http://user.services.openoffice.org

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

Dear friends,

My PhpBB 2.x website has 400.000 messages with PostrgeSQL database.
Can you confirm that I can use full-text support.

Just three questions :
* tsearch2 is now fully supported by PhpBB3.x installer, right ?
* Does your MOD work with PostgreSQL 8.3 beta ?
It seems even more easy to use a PostgreSQL beta because tseach2 is integrated.
PostgreSQL betas are usually very stable (as strange as it may seem).
* I hope that the page cache is over. Can you confirm?

If yes, I will migrate to PhpBB 3.x immediately.
I had to tweak PhpBB 2.x code to make it work on large databases.

I hope that this nightmare is over.
Thank you to the PhpBB developers ...

Kind regards,
Jean-Michel

wagnerch
Registered User
Posts: 8
Joined: Sun Jul 22, 2007 7:16 pm

Re: PostgreSQL Fulltext Search

Post by wagnerch »

jmpoure wrote:* tsearch2 is now fully supported by PhpBB3.x installer, right ?
No, this is a customization.
jmpoure wrote:* Does your MOD work with PostgreSQL 8.3 beta ?
It seems even more easy to use a PostgreSQL beta because tseach2 is integrated.
PostgreSQL betas are usually very stable (as strange as it may seem).
Only if 8.3 still maintains compatibility with older versions of tsearch2. I don't think it does, I think some of the table names used by tsearch2 have changed. I planned on testing it, just haven't gotten around to it.
jmpoure wrote:* I hope that the page cache is over. Can you confirm?
What is the page cache? I don't think this has to do with the search.

The only thing I must raise here is that this mod doesn't deal with phrases (words that are grouped by using quotes), in fact the version posted right now blows up and requires a minor mod (on or about line 117):

FROM
// Filter out as above
$split_keywords = preg_replace("#[\n\r\t]+#", ' ', trim(htmlspecialchars_decode($keywords)));

TO

// Filter out as above
$split_keywords = preg_replace("#[\"\n\r\t]+#", ' ', trim(htmlspecialchars_decode($keywords)));

The mod here is to remove the double quotes from the input stream, the longer term effort inside the split_keywords method is to improve the "parser". Because a phrase for tsearch2 needs to be passed in parenthesis with AND operators, such as the phrase "find waldo" would be (find&waldo) for tsearch2, and then I think a post-parser probably needs to be written to toss results that do not match the exact phrase.

Post Reply