I know there has been a few discussions about implementing a fulltext module for PostgreSQL. I guess my first question is has anyone actually done it? I should probably mention that I have done it, and it appears to work. It uses tsearch2, I am interested in what others have done.
If you are upgrading from 8.2 to 8.3, you will need to go into the search settings and change the tsearch configuration to english (or whatever language you prefer) and then go into maintenance and drop/create the search index.
Updated archive file: 2/9/2008 -- adds support for PostgreSQL 8.3
PostgreSQL Fulltext Search
Forum rules
Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here. Feature requests are closed.
Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here. Feature requests are closed.
PostgreSQL Fulltext Search
- Attachments
-
- fulltext_postgres.zip
- (8.95 KiB) Downloaded 2454 times
Last edited by wagnerch on Sat Feb 09, 2008 4:30 pm, edited 2 times in total.
Re: PostgreSQL Fulltext Search
Interesting stuff, it is good to see things like this to start to pop up. Your PCRE check is nice enough to inspire such a change in Olympus
Freedom from fear
-
- Registered User
- Posts: 653
- Joined: Wed Sep 21, 2005 3:01 pm
Re: PostgreSQL Fulltext Search
very nice, but i have one comment:
from reading the code, it seems that the search-result cache is hugely counter-productive.
every new message posted causes a huge amount of work dedicated solely for handling the cache.
also, every new search spends a lot of effort checking if the search results are already in the cache.
imo, with a reasonable search engine (and, ttbomk, postgress tsearch is significantly better than just "reasonable"), the cache overhead, both in code quantity and processing time is way more expensive than a new search would have been.
besides, there is little evidence to show that consecutive searches are likely to be identical, so there is much doubt that search caching is ever beneficial, regardless of the overhead.
imo, it would be better to throw search-result-caching to the recycle bin. shorter code and better performance all over.
from reading the code, it seems that the search-result cache is hugely counter-productive.
every new message posted causes a huge amount of work dedicated solely for handling the cache.
also, every new search spends a lot of effort checking if the search results are already in the cache.
imo, with a reasonable search engine (and, ttbomk, postgress tsearch is significantly better than just "reasonable"), the cache overhead, both in code quantity and processing time is way more expensive than a new search would have been.
besides, there is little evidence to show that consecutive searches are likely to be identical, so there is much doubt that search caching is ever beneficial, regardless of the overhead.
imo, it would be better to throw search-result-caching to the recycle bin. shorter code and better performance all over.
Re: PostgreSQL Fulltext Search
Search result caching is not only for consecutive identical searches but it makes it unecessary to execute the search again when the user goes to the next page of results Which however makes me agree even more with your point about "reasonable" search engines (at least for the sphinx one which we use on phpbb.com this is true)
Re: PostgreSQL Fulltext Search
Attached is a version for RC4, the author_search method has changed.
As for the caching, it would be interesting to see how the caching performs compared against a huge board. Perhaps I should make this an option for fulltext_postgres? I haven't looked at that code too deeply, it is a possibility I broke it anyways I know I had to throw away some MySQL specific stuff that was used to determine whether a result was saved or not.
As for the caching, it would be interesting to see how the caching performs compared against a huge board. Perhaps I should make this an option for fulltext_postgres? I haven't looked at that code too deeply, it is a possibility I broke it anyways I know I had to throw away some MySQL specific stuff that was used to determine whether a result was saved or not.
- Attachments
-
- fulltext_postgres.zip
- (8.76 KiB) Downloaded 2480 times
-
- Registered User
- Posts: 1
- Joined: Tue Aug 21, 2007 8:01 am
Re: PostgreSQL Fulltext Search
This is awesome.
I snagged this because I was converting a 2.x board with about 250K posts, and gave up on fulltext_native ever finishing its indexing -- it slowed to 0.3 posts per second and showed no signs of speeding back up.
Well, tsearch2 indexes the whole board in about two minutes, and the searches are blazing fast now.
I hope this gets worked into the core as soon as possible. Not fair for MySQL users to get all the good stuff
I snagged this because I was converting a 2.x board with about 250K posts, and gave up on fulltext_native ever finishing its indexing -- it slowed to 0.3 posts per second and showed no signs of speeding back up.
Well, tsearch2 indexes the whole board in about two minutes, and the searches are blazing fast now.
I hope this gets worked into the core as soon as possible. Not fair for MySQL users to get all the good stuff
Re: PostgreSQL Fulltext Search
I don't think it will get pulled in for the 3.0 release, as far as I know it is in feature freeze. There are a few things that are less than desirable in the current code, such as I am using function based indexes for to_tsvector (because I hate the idea of adding a column for a tsvector, and then indexing the same crap again) and it is assuming that the config map doesn't change -- so if you change the config map then you should delete and recreate the index. I think ideally it should probably permit an index per config map, and I should include the config map in the function based index.
Perhaps I will tweak it a bit more to accommodate PostgreSQL 8.4, since Tsearch2 will be in the core with 8.4.
Perhaps I will tweak it a bit more to accommodate PostgreSQL 8.4, since Tsearch2 will be in the core with 8.4.
-
- Registered User
- Posts: 2
- Joined: Wed Sep 19, 2007 7:35 pm
Re: PostgreSQL Fulltext Search
Hi,
Just thought I'd let you know that in setting up a new board from scratch we decided to go with PostgreSQL and your fulltext_postgres.php mod. ( I emailed you earlier )
After seeing to it that the person doing the install got his head out of his...well..( that was me by the way ) it went without a hitch. You did a fine job of integrating this in the ACP. Well be playing with this on a sandbox XP machine for a few days and then installing on a SUN server running Solaris 10 later this week.
Thanks for you efforts, and well try to give feed back where we can
Drew Jensen
http://user.services.openoffice.org
Just thought I'd let you know that in setting up a new board from scratch we decided to go with PostgreSQL and your fulltext_postgres.php mod. ( I emailed you earlier )
After seeing to it that the person doing the install got his head out of his...well..( that was me by the way ) it went without a hitch. You did a fine job of integrating this in the ACP. Well be playing with this on a sandbox XP machine for a few days and then installing on a SUN server running Solaris 10 later this week.
Thanks for you efforts, and well try to give feed back where we can
Drew Jensen
http://user.services.openoffice.org
Re: PostgreSQL Fulltext Search
Dear friends,
My PhpBB 2.x website has 400.000 messages with PostrgeSQL database.
Can you confirm that I can use full-text support.
Just three questions :
* tsearch2 is now fully supported by PhpBB3.x installer, right ?
* Does your MOD work with PostgreSQL 8.3 beta ?
It seems even more easy to use a PostgreSQL beta because tseach2 is integrated.
PostgreSQL betas are usually very stable (as strange as it may seem).
* I hope that the page cache is over. Can you confirm?
If yes, I will migrate to PhpBB 3.x immediately.
I had to tweak PhpBB 2.x code to make it work on large databases.
I hope that this nightmare is over.
Thank you to the PhpBB developers ...
Kind regards,
Jean-Michel
My PhpBB 2.x website has 400.000 messages with PostrgeSQL database.
Can you confirm that I can use full-text support.
Just three questions :
* tsearch2 is now fully supported by PhpBB3.x installer, right ?
* Does your MOD work with PostgreSQL 8.3 beta ?
It seems even more easy to use a PostgreSQL beta because tseach2 is integrated.
PostgreSQL betas are usually very stable (as strange as it may seem).
* I hope that the page cache is over. Can you confirm?
If yes, I will migrate to PhpBB 3.x immediately.
I had to tweak PhpBB 2.x code to make it work on large databases.
I hope that this nightmare is over.
Thank you to the PhpBB developers ...
Kind regards,
Jean-Michel
Re: PostgreSQL Fulltext Search
No, this is a customization.jmpoure wrote:* tsearch2 is now fully supported by PhpBB3.x installer, right ?
Only if 8.3 still maintains compatibility with older versions of tsearch2. I don't think it does, I think some of the table names used by tsearch2 have changed. I planned on testing it, just haven't gotten around to it.jmpoure wrote:* Does your MOD work with PostgreSQL 8.3 beta ?
It seems even more easy to use a PostgreSQL beta because tseach2 is integrated.
PostgreSQL betas are usually very stable (as strange as it may seem).
What is the page cache? I don't think this has to do with the search.jmpoure wrote:* I hope that the page cache is over. Can you confirm?
The only thing I must raise here is that this mod doesn't deal with phrases (words that are grouped by using quotes), in fact the version posted right now blows up and requires a minor mod (on or about line 117):
FROM
// Filter out as above
$split_keywords = preg_replace("#[\n\r\t]+#", ' ', trim(htmlspecialchars_decode($keywords)));
TO
// Filter out as above
$split_keywords = preg_replace("#[\"\n\r\t]+#", ' ', trim(htmlspecialchars_decode($keywords)));
The mod here is to remove the double quotes from the input stream, the longer term effort inside the split_keywords method is to improve the "parser". Because a phrase for tsearch2 needs to be passed in parenthesis with AND operators, such as the phrase "find waldo" would be (find&waldo) for tsearch2, and then I think a post-parser probably needs to be written to toss results that do not match the exact phrase.