Search System Discussion

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Search System Discussion

Post by EXreaction »

This is something I've been thinking about for a while but never really bothered to bring up.

The current search system is extremely limited in that only post text and titles are searchable. I would really like to see something set up in the future that can search everything: post text, titles, user data, attachment names/descriptions, polls. Basically anything that is a text field. If that were to happen, I'd also really like other mods to be able to hook into it for their own custom pages (the User Blog Mod, for example).

I am not sure how easy it would be to build, but once it exists it should not be too difficult to apply to every field (assuming the same function is used to parse bbcode for every field).

Also, it would be great if you could search your own PMs too. Not sure how easy it would be to do that though. :|

A_Jelly_Doughnut
Registered User
Posts: 1780
Joined: Wed Jun 04, 2003 4:23 pm

Re: Search System Discussion

Post by A_Jelly_Doughnut »

PM search could probably be done with a simple trawl of the table.

First, find all PMs that belong to a user, then run a LIKE query. Not going to be a speed demon, but it shouldn't be too bad -- I don't imagine most users have that many PMs.
A_Jelly_Doughnut
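
The "find the user's PMs, then run a LIKE query" idea can be sketched roughly as below. This is only an illustration using Python's built-in sqlite3 module: the `pms` table and its columns (`msg_id`, `user_id`, `subject`, `body`) are made up for the example and are not the board's real schema.

```python
import sqlite3


def search_pms(conn, user_id, term):
    """Return (msg_id, subject) for the user's PMs whose subject or body contains term."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    return conn.execute(
        """
        SELECT msg_id, subject
        FROM pms
        WHERE user_id = ?
          AND (subject LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\')
        ORDER BY msg_id
        """,
        (user_id, pattern, pattern),
    ).fetchall()


# Hypothetical table and sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pms (msg_id INTEGER PRIMARY KEY, user_id INTEGER, subject TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO pms (user_id, subject, body) VALUES (?, ?, ?)",
    [
        (1, "Search ideas", "What about searching PMs?"),
        (1, "Lunch", "Pizza at noon"),
        (2, "Search backend", "Belongs to another user"),
    ],
)

print(search_pms(conn, 1, "search"))
```

Restricting on `user_id` first keeps the LIKE scan limited to one user's messages, which is why this should stay tolerable as long as users do not hoard huge numbers of PMs.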

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: Search System Discussion

Post by EXreaction »

That is one possibility.

Right now I think the easiest way to do this would be to have separate search systems which can all be called from the search page. That would at least make it appear to search everything from one place, and would stay flexible enough to make it easier to control the data that is output.

Though I would really like to see a single table (in the case of the native search backend) which stores all of the search information. Along with the information it could store something like a mode; once the search does its thing, it calls whatever system handles that mode, which checks whether the user has permission to view the item, etc.

Also, I think when somebody does a search, it should cache all of the results (probably put in a limit of 500/1000 results or something) exactly as required for output and store it for 30 minutes (every time they search the same thing, like when using the next button it would update the time again). This should also generate an ID and use that for the url instead of trying to keep all of the words in the url from page to page (which has caused some issues).
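
A rough sketch of that caching idea, assuming an in-memory store: results are capped, kept for 30 minutes, the timer is reset whenever a page of the same search is fetched, and a short generated ID stands in for the keywords in the URL. The class and method names (`SearchCache`, `store`, `fetch_page`) are invented for this example.

```python
import hashlib
import time

CACHE_TTL = 30 * 60   # keep results for 30 minutes
MAX_RESULTS = 500     # cap on how many results are cached per search


class SearchCache:
    def __init__(self, clock=time.time):
        self._entries = {}   # search_id -> (expires_at, results)
        self._clock = clock  # injectable so expiry can be tested

    @staticmethod
    def make_id(user_id, keywords):
        """Derive a short, stable ID from the user and the keyword set."""
        key = f"{user_id}:{' '.join(sorted(keywords))}"
        return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

    def store(self, user_id, keywords, results):
        """Cache at most MAX_RESULTS results and return the ID for the URL."""
        search_id = self.make_id(user_id, keywords)
        self._entries[search_id] = (self._clock() + CACHE_TTL, results[:MAX_RESULTS])
        return search_id

    def fetch_page(self, search_id, page, per_page=10):
        """Return one page of results, or None if the search is unknown or expired."""
        entry = self._entries.get(search_id)
        if entry is None:
            return None
        expires_at, results = entry
        now = self._clock()
        if now >= expires_at:
            del self._entries[search_id]
            return None
        # Every access (e.g. the "next" button) pushes the expiry out again.
        self._entries[search_id] = (now + CACHE_TTL, results)
        start = page * per_page
        return results[start:start + per_page]
```

With this shape, a URL would carry only something like `search_id=...&page=2`, and a missing or expired ID means the search has to be run again.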

Instead of the radio buttons for "search within" on the advanced search page, I'd like to see a multiple selection box (like the one used for choosing which forums to search in). That way users can select exactly what they want.

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: Search System Discussion

Post by naderman »

Yeah, it sounds like a good idea to abstract search from posts. Maybe we can replace it with a generic search_id / type_id / type mapping to the index, so that type would be something like "post", "profile field", "message title", "pm", ..., and the index would simply search through the relevant search_ids, where type_id provides the reference into any other table like posts, pms, etc.
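
One possible shape for such a generic search_id / type_id / type mapping, sketched with Python's sqlite3 module. The `search_index` table, its `content` column, and the helper names are assumptions made for this example, and a plain LIKE (with wildcards in the term left unescaped for brevity) stands in for a real word index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE search_index (
        search_id INTEGER PRIMARY KEY,  -- one row per indexed item
        type      TEXT NOT NULL,        -- "post", "pm", "profile field", ...
        type_id   INTEGER NOT NULL,     -- key of the item in its own table
        content   TEXT NOT NULL         -- searchable text (stand-in for a word index)
    );
    CREATE INDEX idx_search_type ON search_index (type, type_id);
    """
)


def index_item(conn, item_type, type_id, content):
    """Add one item of any type to the shared index."""
    conn.execute(
        "INSERT INTO search_index (type, type_id, content) VALUES (?, ?, ?)",
        (item_type, type_id, content),
    )


def search(conn, term, types=None):
    """Return (type, type_id) pairs matching term, optionally limited to some types."""
    sql = "SELECT type, type_id FROM search_index WHERE content LIKE ?"
    params = [f"%{term}%"]
    if types:
        placeholders = ", ".join("?" for _ in types)
        sql += f" AND type IN ({placeholders})"
        params.extend(types)
    sql += " ORDER BY search_id"
    return conn.execute(sql, params).fetchall()


index_item(conn, "post", 10, "Native search backend")
index_item(conn, "pm", 3, "Can I search my PMs?")
index_item(conn, "profile field", 7, "I like cooking")

print(search(conn, "search"))
print(search(conn, "search", types=["pm"]))
```

The search itself only returns (type, type_id) pairs; whatever code owns each type is then responsible for loading the real row and checking permissions.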
EXreaction wrote:Also, I think when somebody does a search, it should cache all of the results (probably put in a limit of 500/1000 results or something) exactly as required for output and store it for 30 minutes (every time they search the same thing, like when using the next button it would update the time again). This should also generate an ID and use that for the url instead of trying to keep all of the words in the url from page to page (which has caused some issues).
That's pretty much what currently happens. However I think it's easier for users if they see the proper URL including the words instead of some strange id that they can't understand. That's also why IMHO all search pages should use GET, so users have easy access to the URL.

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: Search System Discussion

Post by EXreaction »

Having the search parameters in the url is certainly nice. The way I mentioned would have a few drawbacks, like being unable to link to searches, but I've had issues in the past with getting them through the GET parameters correctly (not with the base forum for a long time though).

I think the current caching system just caches the ids, right?

I'd rather see it go something like:
Pull entire list of id/types from DB
Send the info to the correct search "module" (PM/Post/Profile Fields/etc for permission checks/parsing)
Put the final output in an array
Cache the entire array
Output the current page

If it just caches the ids/types, it would help a bit, but I think the most intensive part of a search system like this will be the permission checks and parsing. So I think it would be better to cache after that happens.

Perhaps instead of having, say, a 500 limit it could just do this in batches of 50. Cache all of the ids first, then send a batch to the correct modules for parsing/permission checks; if fewer than 10 (or whatever number you need to display) come back, send another batch to the modules, and then cache whatever the result of all that is. That sounds like it would be the most efficient way to do it.
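
A minimal sketch of that batch idea, assuming each search "module" is just a function that takes (user_id, type_id) and returns a rendered row, or None when the user may not see the item. The names `render_batches` and `handlers`, and the sample handler, are hypothetical.

```python
def render_batches(hits, handlers, user_id, needed=10, batch_size=50, start=0):
    """Send hits to their modules one batch at a time until enough rows are visible.

    hits is a list of (type, type_id) pairs. Returns (rendered_rows, next_offset);
    next_offset is where a later call should resume in the hit list.
    """
    rendered = []
    offset = start
    while len(rendered) < needed and offset < len(hits):
        batch = hits[offset:offset + batch_size]
        offset += len(batch)
        for item_type, type_id in batch:
            handler = handlers.get(item_type)
            if handler is None:
                continue  # no module registered for this type
            row = handler(user_id, type_id)
            if row is not None:  # None means "not permitted" for this user
                rendered.append(row)
    return rendered, offset


def demo_post_handler(user_id, type_id):
    """Pretend permission check: only every tenth post is visible."""
    return f"post #{type_id}" if type_id % 10 == 0 else None


hits = [("post", i) for i in range(1, 121)]
rows, next_offset = render_batches(hits, {"post": demo_post_handler}, user_id=1)
print(len(rows), next_offset)
```

Each batch is processed whole, so a call can return more rows than are needed for one page; the surplus is what would be cached for the next page, and `next_offset` marks where to continue.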

Of course the bad thing is that it would be impossible to give an accurate count of how many items were found, but I am not sure that number is important.

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: Search System Discussion

Post by naderman »

Currently it caches ids, that is correct. However, it only caches ids for which the permissions are ok. Caching the processed content of whatever you searched for is very problematic, as lots of options can be changed (word censors, smilies, bbcode, etc.) between the cache miss and the next cache hit. The most resource-intensive part of searching is the actual searching; processing the right ids is fairly fast.

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: Search System Discussion

Post by EXreaction »

Sounds good.

But to add to that a bit, I do not think it would matter if something was changed after everything was cached once. After the first page, the results are likely to be from days if not weeks ago for most boards. Even if something were changed after the results came back from the search, I do not think it would really matter (if it is caching only, say, 50 parsed items at a time). :)

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: Search System Discussion

Post by naderman »

Err, if I change something, I expect it to be changed. Also, serving stale cache items is a no-go. We want to produce a user-friendly application first, and a server-friendly one only second ;-) Also, you would need to generate a different version of the cache depending on a lot of things, like each language and style, which is way too much work.

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: Search System Discussion

Post by EXreaction »

Not that extensive of a cache.

Just the parsed text and everything prepared for outputting to the template.

It would only be a small section cached for the current search and current user which gets wiped after maybe 10 minutes.

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: Search System Discussion

Post by code reader »

if i may add my 2c here.
i think that one of the things that should be changed is that there should be some linkage between the backend and the front end of the search.
some examples are:
  • the sphinx backend does not support "search in topic". however, the UI for "search in topic" is oblivious to the capabilities of the backend, and so, e.g., phpbb.com still has the dysfunctional "search this topic"
  • some backends (mysql fulltext) support exact phrase search, but the front-end is ill equipped to support it.
  • some backends (again, mysql fulltext) have a hard time supporting all the options, but "have to" do it anyway, and hence are less efficient than they could be: for instance, in order to support both "search in text *and* subject" and "search in text only", mysql fulltext has to index everything twice, which is very costly in space and performance (the big performance hit is when adding a new post, not when searching)
in short, i think it would be beneficial if the search capabilities and options offered to the user were those that the backend supports best, and if other backends find it hard to support those options but can easily and efficiently support others, they should be allowed to.

this can be (relatively easily) supported by either defining an exhaustive array of options and allowing the backend to declare which of the total list is supported and which isn't, or by moving the piece that gets the user input for the search to the realm of the "search plugins".
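
a rough sketch of the first variant (one exhaustive option list, each backend declaring what it supports) could look like this. all class names, option names, and the option sets are invented for illustration; the only detail taken from this post is that a sphinx-style backend would leave out "search in topic".

```python
# The exhaustive list of options the search form knows how to render.
ALL_OPTIONS = frozenset(
    {"search_in_topic", "exact_phrase", "text_only", "text_and_subject"}
)


class SearchBackend:
    """Base class: each backend declares which options it supports."""

    name = "base"
    supported = frozenset()


class SphinxLikeBackend(SearchBackend):
    name = "sphinx-like"
    # Illustrative set; deliberately omits "search_in_topic".
    supported = frozenset({"text_and_subject"})


class FulltextLikeBackend(SearchBackend):
    name = "fulltext-like"
    # Illustrative set for a backend that can do phrase search.
    supported = frozenset({"search_in_topic", "exact_phrase", "text_and_subject"})


def visible_options(backend):
    """Return the options the form should show for this backend."""
    unknown = backend.supported - ALL_OPTIONS
    if unknown:
        raise ValueError(f"{backend.name} declares unknown options: {sorted(unknown)}")
    return sorted(backend.supported)


for backend in (SphinxLikeBackend(), FulltextLikeBackend()):
    print(backend.name, visible_options(backend))
```

the form would then only draw the controls returned by `visible_options`, so a "search this topic" box would never appear on a board whose backend cannot honour it.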

i know it's somewhat off-topic, but imho, if any change to the search system is discussed, this is the first order of business.
it is somewhat discouraging that on the official phpbb site one of the features ("search in topic") does not work. this is not how you build confidence in your product. it would be much better if the UI for the dysfunctional feature was removed as well.

just my 2c,
have fun.
