PostgreSQL Fulltext Search

Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here.
Forum rules
Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here. Feature requests are closed.
jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

Thank you wagnerch for your detailed answer.

This is a very clean patch.
Only a class and a few SQL entries.

I will try to port your patch to PostgreSQL 8.3 integrated full-text engine.
Will report back soon. I hope that we can stay in touch.

Very kind regards,
Jean-Michel
Last edited by jmpoure on Fri Nov 16, 2007 2:50 pm, edited 3 times in total.

wagnerch
Registered User
Posts: 8
Joined: Sun Jul 22, 2007 7:16 pm

Re: PostgreSQL Fulltext Search

Post by wagnerch »

Sounds good, if you have an ideas for the "exact phrase" search then I think that would be handy as well.

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: PostgreSQL Fulltext Search

Post by code reader »

my idea regarding the "exact phrase":

-- get the search string from the user
-- sanitize it by escaping single quotes only
-- pass it as is to the search engine through a query, using single quotes.

in other words: completely skip the "split keywords" step. this step only makes sense for the "native" search mode, which requires single-word search. mysql and postgres fulltext do the search expression parsing themselves, and it makes the most sense to let the user build the search expression using the rules of the method used (mysql or postgres), instead of some arbitrary and incomplete set of rules which lacks, for instance, phrases...

if you use this logic, you might also want to edit the language files and change the explanation shown to the user describing how to build the search expression.
this will make your code shorter (the "split keywords" function is long, ugly, and cumbersome), faster, and better.

wagnerch
Registered User
Posts: 8
Joined: Sun Jul 22, 2007 7:16 pm

Re: PostgreSQL Fulltext Search

Post by wagnerch »

code reader wrote:my idea regarding the "exact phrase":

-- get the search string from the user
-- sanitize it by escaping single quotes only
-- pass it as is to the search engine through a query, using single quotes.

in other words: completely skip the "split keywords" step. this step only makes sense for the "native" search mode, which requires single-word search. mysql and postgres fulltext do the search expression parsing themselves, and it makes the most sense to let the user build the search expression using the rules of the method used (mysql or postgres), instead of some arbitrary and incomplete set of rules which lacks, for instance, phrases...
tsearch2 doesn't appear to support exact phrases as you suggest. See http://archives.postgresql.org/pgsql-sq ... g00165.php, where the author states it has not been implemented yet.

I think it would make sense to maintain the same search operators that already exist in phpBB, they are fairly standard and most users are familiar with them. I can't see making a user use "&" for AND, "!" for NOT, etc. And there still needs to be some sort of post-processing to address the exact phrase matching for PostgreSQL. There is a piece of code out there that does all of this for me, I just didn't want to pull that relatively large chunk of code into the mix. See http://www.digitalstratum.com/oss/fts_parser.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

wagnerch wrote:Attached is a version for RC4, the author_search method has changed.

As for the caching, it would be interesting to see how the caching performs compared against a huge board. Perhaps I should make this an option for fulltext_postgres? I haven't looked at that code too deeply, it is a possibility I broke it anyways :) I know I had to throw away some MySQL specific stuff that was used to determine whether a result was saved or not.
Dear Wagnerch,

I am currently testing this code.
Any news or updates?

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

I could basically install your patch, thanks.
But some problems exist on non-C databases, when using locales:

Code: Select all

SQL ERROR [ postgres ]
ERREUR: could not find tsearch config by locale []
I am trying to install the French dictionary and will get back.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

Okay, problems resolved.

Phpbb 3.0 database is UTF-8
So there should be a dictionnary installed.
In phpBB 3.0 code, there should be a test whether tsearch2 dictionary is installed for the current locale.

I downloaded dictionnaries on tseach2 homepage.
You should only install UTF-8 dictionnaries.

Adapt the script to dictionnary name and location.

Code: Select all

/usr/share/postgresql/8.2/contrib/french_utf8.*
Then I installed them using this code:

Code: Select all

INSERT INTO pg_ts_cfg VALUES ('default_french', 'default', 'fr_FR.UTF-8');

INSERT INTO pg_ts_dict VALUES ('default_french', 'spell_init(internal)', 'DictFile="/usr/share/postgresql/8.2/contrib/french_utf8.dict",AffFile="/usr/share/postgresql/8.2/contrib/french_utf8.aff",StopFile="/usr/share/postgresql/8.2/contrib/french_utf8.stop"', 'spell_lexize(internal,internal,integer)', NULL);


INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'email', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'file', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'float', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'host', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'hword', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'int', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'lhword', '{default_french}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'lpart_hword', '{default_french}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'nlhword', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'nlpart_hword', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'part_hword', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'sfloat', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'uint', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'uri', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'url', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'version', '{simple}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'word', '{default_french}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'nlword', '{default_french}');
INSERT INTO pg_ts_cfgmap VALUES ('default_french', 'lword', '{default_french,simple}');
Now, the only limitation is that I cannot seach values with accents (é).
is this a problem with the lexerize code in search?

In next PostgreSQL 8.3 version, tseach2 will be integrated in core and there will be no need to install dictionaries.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

Small bug : when clicking on "View your messages":

Code: Select all

Erreur Générale
SQL ERROR [ postgres ]

ERREUR: La colonne « t.topic_last_post_time » doit apparaître dans la clause GROUP BY ou doit être utilisé dans une fonction d'aggrégat []

I will report later on.

jmpoure
Registered User
Posts: 81
Joined: Sun Mar 11, 2007 9:42 am

Re: PostgreSQL Fulltext Search

Post by jmpoure »

fulltext_postgressearch.php should be, line 590 and below

Code: Select all

		      $sql = "SELECT t.topic_id
				FROM " . $sql_sort_table . TOPICS_TABLE . ' t '. "
                                LEFT JOIN " . POSTS_TABLE . " p 
                                ON t.topic_id = p.topic_id 
				WHERE $sql_author
					$sql_topic_id
					$sql_firstpost
					$m_approve_fid_sql
					$sql_fora
					$sql_sort_join
					$sql_time
				ORDER BY $sql_sort";
			$field = 'topic_id';

DrewJensen
Registered User
Posts: 2
Joined: Wed Sep 19, 2007 7:35 pm

Re: PostgreSQL Fulltext Search

Post by DrewJensen »

Howdy,

Couple of questions and maybe some help from anyone.

First - currently running a phpBB 3.0 board against Postgres 8.2.4 - using the Native-Full text search ( not this patch ) - as I figured 8.3 would be out soon.

OK - installed 8.3 RC1 lastnight on a test server and before i could get to testing this patch I received an error report from one of our board users - specifically using an exclusion search string ( ie "form -close") generates the following SQL statment and error:
SELECT COUNT(DISTINCT p.post_id) AS total_results FROM phpbb_en_search_wordmatch m0, phpbb_en_posts p LEFT JOIN phpbb_en_search_wordmatch m1 ON (m1.word_id = 17255 AND m1.post_id = m0.post_id) WHERE m0.post_id = p.post_id AND m0.word_id = 18017 AND m1.word_id IS NULL

Error message
ERROR: invalid reference to FROM-clause entry for table "m0" at character 182 HINT: There is an entry for table "m0", but it cannot be referenced from this part of the query. []
I get the same error against ps 8.3 RC1 with phpBB native search.

So - Tried a similar search on a forum I know uses this patch ( postgresqlforums.com ) with a search ( "vacuum -thresholds" )

Ok , no errors - but the exclusion part of the search is ignored

I take it this is by design - the exclusions function is just not part of the patch at the moment?

Drew

Sorry for being a little OT - now off to open a defect for native full text search.

Post Reply