differences between fulltext phpbb vs fulltext mysql?

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Forum rules
Discussion of general topics related to the new release and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: differences between fulltext phpbb vs fulltext mysql?

Post by code reader »

R45 wrote: It depends on the size of your board. If you have a very large board, you might be inclined to use phpbb fulltext and scale down the system, as the mysql fulltext doesn't scale well.
this is exactly opposite to reality.
hoever badly mysql fulltext scales, it is still hugely superior to the way phpbb built-in search mechanism scales.
with a large board (and i dont mean really large. several tens of thousands of messages are enough), with phpbb built-in mechanism, most of the database space will go to the search tables and indexes, and since every post will have to insert dozens of new rows to those tables, the simple operation of posting becomes unbearably slow.
ttbomk, all the large phpbb boards in existence had to either tweak or completely disable the search function (the best and most popular tweak, ttbomk, utilizes mysql fulltext search...).
even phpbb.com, which is "religiously vanilla", could not continue to operate with the built-in search mechanism and had to modify it somewhat.
this is true for the phpbb2 built-in search, but, from looking in the code i think olympus uses exactly the same mechanism.

in conclusion: if you use mysql, there is not a single advantage to using the built-in search over mysql fulltext search, and there are numerous significant disadvantages, especially in storage space and performance.
though this is true with any size board, it becomes much more apparent with large boards.

R45
Registered User
Posts: 236
Joined: Tue Nov 27, 2001 10:42 pm

Re: differences between fulltext phpbb vs fulltext mysql?

Post by R45 »

code reader wrote: this is exactly opposite to reality.
hoever badly mysql fulltext scales, it is still hugely superior to the way phpbb built-in search mechanism scales.
with a large board (and i dont mean really large. several tens of thousands of messages are enough), with phpbb built-in mechanism, most of the database space will go to the search tables and indexes, and since every post will have to insert dozens of new rows to those tables, the simple operation of posting becomes unbearably slow.
ttbomk, all the large phpbb boards in existence had to either tweak or completely disable the search function (the best and most popular tweak, ttbomk, utilizes mysql fulltext search...).
even phpbb.com, which is "religiously vanilla", could not continue to operate with the built-in search mechanism and had to modify it somewhat.
this is true for the phpbb2 built-in search, but, from looking in the code i think olympus uses exactly the same mechanism.

in conclusion: if you use mysql, there is not a single advantage to using the built-in search over mysql fulltext search, and there are numerous significant disadvantages, especially in storage space and performance.
though this is true with any size board, it becomes much more apparent with large boards.
Did you even read my post? I said severly scale down search, ie restrict indexing to this month's posts, only indexing specific forums, indexing posts from posters within a usergroup, restrict posts with only a certain number of words, restrict posts from topics with less than 1 page, etc. These are phpBB based restrictions that you won't have the flexibility to implement using MySQL's fulltext (the customisation of MySQL's fulltext is limited to word character lenghts min/max and a stopwords list). In real world forum administration, one would opt for a scaled down search rather than wasting server resources on areas of the forum that don't need it.

And on that note, I would like to see a fulltext_phpbb with delayed updates, like what I suggested in the phpBB large tweaks thread. Or perhaps I'll write a seperate plugin which does it.

PS. This post was made from practical experience working on forums with millions of posts, not a soupbox.

User avatar
naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany
Contact:

Re: differences between fulltext phpbb vs fulltext mysql?

Post by naderman »

R45 wrote: And on that note, I would like to see a fulltext_phpbb with delayed updates, like what I suggested in the phpBB large tweaks thread. Or perhaps I'll write a seperate plugin which does it.
This is something that's only of use for big boards. Most boards have a smaller size and would rather like to have live-updating of the index. So delayed updates would need a seperate search module. However it should be easy to implement using the search_tidy function to index posts that weren't previously added to the index (this function is called via cron and won't slow down the page itself).

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: differences between fulltext phpbb vs fulltext mysql?

Post by code reader »

R45 wrote: .......
PS. This post was made from practical experience working on forums with millions of posts, not a soupbox.
no need to get nasty over a typo (unless, of course, one is genetically inclined).

and all those things you mentioned appear nowhere in your post you so nicely asked if i read.
what you described is a very far cry from phpbb "built-in" search.
if one goes to such a length modifying the code, one might look into a real search engine technology (few of those float around in open-code space), rather than mutilating a half-assed one.

R45
Registered User
Posts: 236
Joined: Tue Nov 27, 2001 10:42 pm

Re: differences between fulltext phpbb vs fulltext mysql?

Post by R45 »

code reader wrote: what you described is a very far cry from phpbb "built-in" search.
if one goes to such a length modifying the code, one might look into a real search engine technology (few of those float around in open-code space), rather than mutilating a half-assed one.
It is using the concept of a built-in (wordlist table, wordmatch entries), just scaling down the scope. In web forums, many times it is not essential for the entire forum to be searchable. Most popular boards contain specific forums dedicated to chat styled topics, word games, random spam, etc. These posts are not worthwhile indexing, and actually clutter results from more meaningful content full results in other areas. By restricting your index to the more useful areas in terms of searching, you save $$$ on resources and provide more accurate results for your users.

It's not about "mutilating a half-assed one". It's called serving your audience properly and not wasting resources, forum budgets are usually low to begin with. And "real" (ie dedicated search systems) either cost a lot or take significant amounts of time to integrate and deploy. Ask BartVB how many different search engine technologies he's played with so far.

Hacking phpBB's search is both cost effective and very viable for some larger forums, always keep your options open.
code reader wrote: no need to get nasty over a typo (unless, of course, one is genetically inclined).
It's not being nasty, it's just getting tired of "coders" who sit back and post on various boards about their opinions on concepts, but do so based on no visible practical knowledge (very much IMO, IIRC, ttbomk, AFAIK). In the real world, there's never just one right solution to a problem. Different solutions work for different cases, all depending on the circumstances. I welcome you to post specifics of something you mentioned earlier to prove me wrong. You said people with large forums tweak MySQL's fulltext to work. What do they do exactly... besides increasing the number of stopwords which is done in any search system? And you said that people should just use a real search engine. Well I invite you to show us how easy it is to integrate and display results properly, since it is apparently (implied) easier than scaling down phpBB's search system...

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: differences between fulltext phpbb vs fulltext mysql?

Post by code reader »

R45 wrote: .....(lots of stuff)
You said people with large forums tweak MySQL's fulltext to work.
no i didnt.
i said they either disable seach altogether, or tweak phpbb built-in search.
the best "tweak" i know of is completely disable phpbb search, getting rid of the 2 word-search tables, and instead use mysql fulltext search.
it is less than trivial mod, but it improves space and performance tenfold.
the good people developing phpbb3 have built this option into the code, and all i said is that if you use mysql, there is no one good reason to remain with the phpbb search.

and if you are tired discussing things on discussion boards with "coders", as you so nicely put it, i doubt very much that anyone actually forces you to.

User avatar
Cheater512
Registered User
Posts: 245
Joined: Thu Mar 23, 2006 1:29 am
Location: Brisbane, Australia
Contact:

Re: differences between fulltext phpbb vs fulltext mysql?

Post by Cheater512 »

I dont see why there is a arguement about this. :?

MySQL's full text search capabilities are basically the same as phpBB's. The difference is that MySQL has it built in to the db thus needing less processing while phpBB does it in php requiring more code and SQL queries.

They both dont scale well. Thats just how search is.
I'd imagine that MySQL's search would last a little bit longer than phpBB's search without modifications.

If your running a small forum then stick with phpBB's default search. No real reason to worry about it.
If your running a large forum then you probally know what is the best option for you.

Xore
Registered User
Posts: 80
Joined: Mon Jul 21, 2003 11:44 pm
Location: The desert
Contact:

Re: differences between fulltext phpbb vs fulltext mysql?

Post by Xore »

code reader wrote: and if you are tired discussing things on discussion boards with "coders", as you so nicely put it, i doubt very much that anyone actually forces you to.
Image

Edit: It's also obvious that you either didn't read, or completely failed to understand his post, in it's entirety.

R45
Registered User
Posts: 236
Joined: Tue Nov 27, 2001 10:42 pm

Re: differences between fulltext phpbb vs fulltext mysql?

Post by R45 »

Cheater512 wrote: MySQL's full text search capabilities are basically the same as phpBB's. The difference is that MySQL has it built in to the db thus needing less processing while phpBB does it in php requiring more code and SQL queries.

They both dont scale well. Thats just how search is.
I'd imagine that MySQL's search would last a little bit longer than phpBB's search without modifications.
That's not being argued. The problem is MySQL's fulltext does not allow you to exclude posts from the index, so eventually you hit a performance wall unlike having the flexibility of scaling down the search with phpBB's fulltext. I would say small and medium boards should use MySQL's fulltext.

And as for the scalability of phpBB's fulltext, indeed it is horrible. First though, cronjobing updates to the tables yields great performance benifits by not slowing down post/edit/delete times. I also disagree with some people on the issue of the viability of SQL based searching in the long term. I experimented for one site reversing a concept usually used by forum searches, bloat the index and spread the data out. I made a post about it in the tweaks forum:

http://www.phpbb.com/phpBB/viewtopic.ph ... 94#2002694" target="_blank
R45 wrote: Search - I've reimplemented full searching, using an SQL method. basically expanding on the method phpBB uses but intead spreading the index amongst literally hundreds of tables.

Here's the basic layout of the search tables

Code: Select all

Search Tables
	
		[search_words] - All non alphabetical words 
		[search_words_a] - All a words followed by a non alphabetic character
		...
			word_id
			word_text
			match_table
			match_count


		[search_matches_X]
		[search_matches_a_X]
			[search_matches_aa_X]
			[search_matches_ab_X]
			...
		...
			match_id
			word_id
			forum_id
			post_id
			topic_id
			poster_id
			post_time
			match_type
			num_matches	
		
		[search_pending]
			post_id
			search_tables
		
		[search_results]
			result_id
			user_id
			user_ip
			session_id
			search_data
			search_results
			search_time
So basically, the words table was split into multiple table, one for each word. The search matches make up the bulk of the tables, the X represents dynamically created tables. When tables reach a threshold, a new one is created and any new words are assigned the new table. Basically the words table entries have a link to the match table that words entries will be in. I use a threshold of 50,000 entries each (because words remain assigned to tables, therefore the match tables will keep increasing as new posts are indexed).

Maintenance scripts run weekly which monitor match table that are getting bloated, and move entries to new tables if needed.

I've also stores additional post data with match entries, like the forum/topic/poster ids. This allows a significant amount of logic to be added when pulling matches, and limits the amount of data grabbing done on the results side.

The pending table stores when posts have been edited or deleted, and scripts are run nightly which reprocesses those posts (as well as indexing new posts, which are simply flagged by a search_status column in the posts table).

The results table has some additional flexibility as well. Before a search is processed, the search script builds an MD5 based on the query entered (and permission data which of course affects what posts you cannot see) and checks to see if recent searches were made and pulls the compiled results from cache.

This system has been more viable than I imagined. The main negative of it is the significantly increased disk space (Database size literally trippled). This however is minor as it's merely disk usage and longer backups (which can be remedied by not backing up the search tables or splitting all of them to a new database on a seperate box).
I have this system currently running at http://forum.americasarmy.com" target="_blank . There's a lot of work still to be done on it, the sorting of results needs to be optimised further and I some improvements need to search results caching. But it works in its implementation, at the sacrifice of disk space which is MUCH cheaper than setting up an alternative box.
code reader wrote: said is that if you use mysql, there is no one good reason to remain with the phpbb search.
lol, you still say there is "no good reason", but I outlined several real world scenarios and you've yet to actually answer my post with anything concrete.

User avatar
Cheater512
Registered User
Posts: 245
Joined: Thu Mar 23, 2006 1:29 am
Location: Brisbane, Australia
Contact:

Re: differences between fulltext phpbb vs fulltext mysql?

Post by Cheater512 »

There is one problem with that code. It destroys MySQL's ability to cache the search tables.
There are thousands of tables so it cant store any one in the memory or even just keep the file handle open as it doesnt know which one will be used.

Post Reply