use of mysql Fulltext search: problems with indexes.

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Forum rules
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Post Reply
code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

use of mysql Fulltext search: problems with indexes.

Post by code reader »

[WARNING: long, and rather technical post]

1)
when you select fulltext_mysql as the search option for your forum, olympus creates 3 fulltext indexes:
  • post_subject
  • post_text
  • post_content, which combines post_subject and post_text.
it has to do that, because of the way the fulltext mechanism works in mysql: it can only be used if the there is an index which is exactly identical to the search domain.

let's ignore the first index for now, and look only on the last two: (post_text) and (post_subject, post_text)
having these two indexes means is that the main content of your board, ie. the text of all the posts is indexed twice.
this has some adverse effects:
  • when posting, or when deleting, mysql has to update 3 fulltext indexes (there are many other indexes that get updated, but those are "simple" indexes. the expensive ones are the "fulltext" indexes). once your board passes a certain size, updating those indexes is the only meaningful factor determining posting performance
  • situation is even worse for editing and deleting posts. again, having 2 indexes doubles the time it takes to update the posts table (which, btw, stays locked while the update is taking place). it also almost doubles "restore from backup" times.
  • typically, the posts table dominates the database in terms of size consumed. each of these two indexes is roughly the size of the table itself.
    this means that the posts table index, when you select to use fulltext_mysql, weighs more than twice the weight of the table. globally speaking, this increase the whole database size by close to 50%. (this does not affect backup time or backup file size, because the indexes are not backed up)
what is the benefit you gain for these very significant adverse effects?
the ability to search on "message text only".
well, i don't think it worth it.

What can be done?
there are 2 simple solutions to this problem.
in both of them, the first step is to remove the "post_text" index, and leave only (post_subject, post_text), and (post_subject) indexes.

in the first, and simplest solution, we simply remove the option to search on "message text only" for boards which chose fulltext_mysql for their search mechanism. small price to pay: you can still search your posts text, but the search will always retrieve posts which have the search terms in their subject, in addition to the posts which has it in the post body. (and those who chose fulltext_mysql receive something very nice in return: the ability to search for phrase).
the 2nd solution is to use a slightly more complex query when the user selects "message text only":
replace
where match (post_text) against ($search_query in boolean mode)
with
where match (post_subject, post_text) against ('$search_query' in boolean mode) AND match (post_text) against ('$search_query' in boolean mode)

this way, mysql uses the index to find all posts where the search query appears either in the subject or text, using the index, and then refines the find list with the second match, applied to each post found in the first part. slightly more costly in search performance (only when the "post text only" option is used), but absolutely worth it, by halving post time.

personally, i go with the first option.
[this part was based on information posted by pppjunk in phpbb.com]


2)
the post_subject index is only used for searches with the option "topic titles only". however, the index contains the post_subject from all posts, not just topics, so when this option is selected, the where part of the query becomes something like:
where match (post_subject) against ('$search_query' in boolean mode) AND p.post_id = t.first_post_id

this makes little sense. the topic_title is stored in the topics table, and it would be much simpler and much more straightforward to drop the posts.post_subject fulltext index, and instead create topic.topic_title fulltext index, and then do the search with:
where match (topic_title) against ('$search_query' in boolean mode
both the existing index and the one i suggest exist solely for this one type of query ("topic title only"). the one i suggest will be smaller, will save index updates on replies and edits (unless the user actually changes the topic title), and the search query is simpler and faster. there is no down-side to this change.

to summarize: instead of the 3 fulltext indexes in use today (listed at the beginning of this long post), i claim that the following fulltext indexes should be used:
in posts table: (post_subject, post_text)
in topics table: (topic_title).

User avatar
Balint
Registered User
Posts: 16
Joined: Thu Jan 01, 2004 1:37 am

Re: use of mysql Fulltext search: problems with indexes.

Post by Balint »

Hi!

After reading a lot of threads and examining my current database situation I'm wondering why noone has payed any attention to this thread. If I understand correctly the devs don't want to drop their view of things, but I hope even they admit that this measure is useful for large boards. So what I'm asking after this long time is simply a small MOD which summarizes the changes e.g. in fulltext_mysql.php, search_body.html and maybe search.php itself.

Thank you for your efforts, code reader.


Greets,
Bálint
I am free because I know that I alone am morally responsible for everything I do.
Robert A. Heinlein

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: use of mysql Fulltext search: problems with indexes.

Post by code reader »

sorry for the late response.
i actually had a mod around the time the original post was written, but, iirc, the code to the core changed enough so that my mod was not viable any more. i don't have it any more... :(
if you are willing to deal with the DB changes manually, the change tio the code is small enough.
i am posting an UNTESTED version here: this is a quick hack, and may even contain syntax errors. 100% untested.

Code: Select all

Index: fulltext_mysql.php
===================================================================
--- fulltext_mysql.php    (revision 8553)
+++ fulltext_mysql.php    (working copy)
@@ -392,6 +392,7 @@
         }
 
         // Build some display specific sql strings
+        $extra_match = "";
         switch ($fields)
         {
             case 'titleonly':
@@ -400,17 +401,14 @@
                 $join_topic = true;
             break;
 
-            case 'msgonly':
-                $sql_match = 'p.post_text';
-                $sql_match_where = '';
-            break;
-
             case 'firstpost':
                 $sql_match = 'p.post_subject, p.post_text';
                 $sql_match_where = ' AND p.post_id = t.topic_first_post_id';
                 $join_topic = true;
             break;
 
+            case 'msgonly':
+                $extra_match = "AND MATCH (p.post_text) AGAINST ('" . $db->sql_escape(htmlspecialchars_decode($this->search_query)) . "' IN BOOLEAN MODE)";
             default:
                 $sql_match = 'p.post_subject, p.post_text';
                 $sql_match_where = '';
@@ -448,7 +446,7 @@
         $sql = "SELECT $sql_select
             FROM $sql_from$sql_sort_table" . POSTS_TABLE . " p
             WHERE MATCH ($sql_match) AGAINST ('" . $db->sql_escape(htmlspecialchars_decode($this->search_query)) . "' IN BOOLEAN MODE)
-                $sql_where_options
+                $extra_match $sql_where_options
             ORDER BY $sql_sort";
         $result = $db->sql_query_limit($sql, $config['search_block_size'], $start);
 
(this is a patch, and can be applied to the code using the "patch" utility. if you do it by hand, be careful... :) )

then, of course, you'll have to get into the db and remove the post_text fulltext index
(alter table TABLE_NAME delete index INDEX_NAME)

again, this is completely untested. i don't know if it works, and if so how well.
i agree with you that the dev's response (or lack thereof) to this issue show somewhat less than completely serious attitude towards the "fulltext mysql" search mode.

Post Reply