Future support for Sphinx Search

GarageChemist
Registered User
Posts: 2
Joined: Tue Oct 30, 2018 3:39 am

Future support for Sphinx Search

Post by GarageChemist »

Going by the Sphinx release notes, the author plans to phase out the native API he formerly used in favor of an SQL-like query syntax he calls "SphinxQL". A number of new features also seem to be in the works that he only plans to support via the SphinxQL interface. I can't say I disagree with that decision. This interface makes it much easier to interact with the search application: the Sphinx daemon can emulate a MySQL database server and thus accept queries from the mysql client, which means the PHP MySQL API can query it too.
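To make the idea concrete, here is a minimal sketch of what a SphinxQL statement could look like when prepared for a MySQL client library. It is written in Python purely for illustration. The function name, the index name "index_phpbb_main" and the result limit are hypothetical, and the %s placeholder assumes a driver that uses that parameter style. Nothing here is taken from phpBB's code.

```python
# Sketch only: build a SphinxQL statement that a MySQL client could send
# to the Sphinx daemon. All names below are hypothetical.

def build_sphinxql(index_name, keywords, limit=20):
    """Return (statement, parameters) for a full-text SELECT."""
    # Index names cannot be bound as parameters, so validate them strictly.
    if not index_name.replace("_", "").isalnum():
        raise ValueError(f"unsafe index name: {index_name!r}")
    # The keywords stay a bound parameter so the client library quotes them.
    statement = f"SELECT id FROM {index_name} WHERE MATCH(%s) LIMIT {int(limit)}"
    return statement, (keywords,)


statement, params = build_sphinxql("index_phpbb_main", "hello world")
print(statement)
print(params)
```

The pair returned here would be handed to the client's execute call. The Sphinx documentation's examples expose the MySQL-protocol listener on port 9306, but the actual port depends on the searchd configuration.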

The current Sphinx integration is definitely showing its age, and I don't think it will actually work with recent versions without a few changes. It's not easy to tell where the code for generating the Sphinx config file stops and the code for actually running the search begins. It would help if the code were divided into two files, since once the config file is set up, the code for generating it just gets in the way. But if the MySQL API can be used for making queries, any new code would probably be minimal.

Even though they work well enough, the automatically generated source and index identifiers are needlessly long and inconvenient to type during testing. I would suggest trying "source_{PREFIX}_main" or "index_{PREFIX}_delta" first, then trying again with a number on the end if that name is taken. Sphinx looks like it will be adding a number of useful features in the future, and testing them from the command line requires entering that identifier regularly.
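The naming suggestion could be sketched roughly as follows. This is an illustration in Python, not phpBB code: the function name, the starting suffix of 2 and the "taken" set are all assumptions made for the example.

```python
# Sketch only: pick a short identifier such as "index_phpbb_delta",
# adding a numeric suffix when the preferred name is already in use.

def pick_identifier(kind, prefix, role, taken):
    """kind is "source" or "index", role is "main" or "delta"."""
    base = f"{kind}_{prefix}_{role}"
    if base not in taken:
        return base
    suffix = 2  # hypothetical choice: the first fallback is "<base>_2"
    while f"{base}_{suffix}" in taken:
        suffix += 1
    return f"{base}_{suffix}"


print(pick_identifier("source", "phpbb", "main", set()))
print(pick_identifier("index", "phpbb", "delta", {"index_phpbb_delta"}))
```

How "taken" would be detected in practice (for example by reading an existing sphinx.conf) is left open here.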

Anyway, here's the more recent documentation for Sphinx:

http://sphinxsearch.com/docs/sphinx3.html

At the risk of sounding like I'm requesting support: when I query Sphinx through phpBB, the results come back slowly, and they differ from the ones I get when I make the same query on the backend. The queries I enter manually in a console return great results instantly. Both types of queries show up in Sphinx's logs, so I'm not sure what to make of it. The only major difference seems to be the protocol being used, unless my old MySQL fulltext search index is somehow being used when it shouldn't be. If this is a bug, maybe this information is relevant.

ThE KuKa
Style Customisations
Posts: 24
Joined: Sun Dec 14, 2003 2:59 pm
Location: Barcelona - Spain

Re: Future support for Sphinx Search

Post by ThE KuKa »

I thought it was already incorporated...
https://www.phpbb.com/about/features/#sphinxfulltext

GarageChemist
Registered User
Posts: 2
Joined: Tue Oct 30, 2018 3:39 am

Re: Future support for Sphinx Search

Post by GarageChemist »

Yes, it was incorporated, but there are two ways to query the Sphinx search engine: via the original Sphinx API protocol, and via an SQL-like querying system called "SphinxQL". The former is being phased out in favor of the latter. The current implementation appears to derive from a phpBB mod that was written 10 years ago, and needless to say, a lot has changed since then.

Given the trajectories that phpBB and Sphinx have followed in their development, any update to the phpBB Sphinx implementation ought to use the "SphinxQL" querying system rather than the older API used in the current implementation.

Ger
Registered User
Posts: 293
Joined: Mon Jul 26, 2010 1:55 pm
Location: 192.168.1.100

Re: Future support for Sphinx Search

Post by Ger »

Wouldn't that also depend on the version of Sphinx installed on the server? E.g. wouldn't phpBB need to support both "old and new" Sphinx?
Above message may contain errors in grammar, spelling or wrongly chosen words. This is because I'm not a native speaker. My apologies in advance.

JoshyPHP
Registered User
Posts: 368
Joined: Fri Jul 08, 2011 9:43 pm

Re: Future support for Sphinx Search

Post by JoshyPHP »

SphinxQL has been around since 0.9.9-rc2 so it should cover any version of Sphinx released this decade.

Ger
Registered User
Posts: 293
Joined: Mon Jul 26, 2010 1:55 pm
Location: 192.168.1.100

Re: Future support for Sphinx Search

Post by Ger »

OK, so just a simple mention in the release notes would cover that.

Hunchman801
Registered User
Posts: 15
Joined: Fri Sep 11, 2015 12:55 pm

Re: Future support for Sphinx Search

Post by Hunchman801 »

By the way, is there a reason EscapeString is used on all Sphinx queries? It would be quite handy to leverage the features of the extended query syntax.

KYPREO
Registered User
Posts: 6
Joined: Wed Dec 11, 2019 12:29 am

Re: Future support for Sphinx Search

Post by KYPREO »

I have been working on fixing a number of issues in the Sphinx Search implementation, specifically to resolve the following tickets:

PHPBB3-16234 Search syntax broken when using Sphinx Fulltext backend
PHPBB3-16233 Enable exact phrase searching with Sphinx Fulltext
PHPBB3-13958 search phrase interprets operator words

I have posted some thoughts on the main phpBB community board, but I think the discussion belongs here. I was previously unable to post because I could not register on this board, but thanks to the administrators that is now fixed.
Hunchman801 wrote:
Wed Nov 14, 2018 5:29 pm
By the way, is there a reason EscapeString is used on all Sphinx queries? It would be quite handy to leverage the features of the extended query syntax.
The reason for EscapeString is that if you enter a search query with special characters reserved for Sphinx extended query syntax but the characters are used incorrectly, then it will throw an error. The EscapeString function was used to address this ticket: PHPBB3-15367 Sphinx search backend doesn't escape special characters

It is a crude way to fix the problem and it causes most of the issues referred to above, so I have set about fixing it properly, as well as better leveraging the power of the Sphinx backend. Sphinx extended query syntax was actually already in the phpBB code, but the introduction of the EscapeString function in phpBB 3.2.2 broke it.
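To illustrate why blanket escaping disables the extended syntax, here is a small Python sketch. It is not the Sphinx client's EscapeString itself: the character set is an assumption chosen for the example, and the second function is a hypothetical stand-in for the idea of escaping an operator only when it is used incorrectly.

```python
# Sketch only. SPECIAL_CHARS is an assumed set of operator characters,
# not an authoritative list from the Sphinx client library.
SPECIAL_CHARS = '\\()|-!@~"&/^$=<'


def escape_all(query):
    """Backslash-escape every special character (the blanket approach)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in query)


def escape_unbalanced_quotes(query):
    """Hypothetical alternative: escape quotes only when they do not pair up."""
    if query.count('"') % 2 == 0:
        return query  # balanced, so leave the phrase operator intact
    return query.replace('"', '\\"')


print(escape_all('"hello world"~10'))          # quotes and ~ lose their meaning
print(escape_unbalanced_quotes('"hello world"'))  # phrase search survives
print(escape_unbalanced_quotes('say "hello'))     # stray quote is neutralised
```

With the blanket version, a valid phrase or proximity query reaches Sphinx as plain escaped text, which matches the breakage described above.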

I have now made the following changes to the implementation of the Sphinx backend:
  • Phrase searching has been restored
  • Extended query syntax for -, |, ( and ) has also been restored
  • Verbal operators (NOT, OR) are only converted into Boolean operators when not appearing within quotation marks (addressing PHPBB3-13958)
  • Certain special extended query syntax operators are only escaped when used incorrectly. This has extended the functionality of the Sphinx back end, to allow for the following:
    * the proximity search operator ~: eg "hello world"~10 - Finds the words hello and world within 10 words of each other
    * quorum matching operator /: eg "the world is a wonderful place"/3 - Finds 3 words in the phrase list, eg it will return a post with "the world is" or "a wonderful place" etc.
    * exact form match =: eg ="search this exact phrase". This will find the exact phrase. Exact form match is only necessary when using morphology in the Sphinx engine. The default sphinx.conf for phpBB has morphology disabled. I have indexed my board using a lemmatizer so that the search will return all grammatical forms with the same stem as the query keyword. This means that a search for "search this phrase" will also return "searched this phrase", "searching this phrase". A search for run will find "runs", "ran", "running" etc. The = operator ensures an exact match only and can be used for single words or whole phrases. This really elevates phpBB searching to a completely new level and more on par with what you would expect from a search engine.
  • Hyphenated words are now properly parsed and turned into search queries Sphinx can process. Currently, phpBB turns hyphens into minuses (ie the NOT search operator) and therefore processes know-it-all as know -it -all (ie know, but not it and not all). With my code fix, know-it-all now becomes ("know it all"|knowitall), so it will look for all instances of the hyphenated string both as a phrase with spaces and as a single word.
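
Two of the transformations in the list above can be sketched as follows. This is an illustrative Python approximation, not the code from the actual fix: the function names and regular expressions are assumptions, only uppercase NOT and OR are handled, and hyphenated words inside quoted phrases are not treated specially.

```python
import re

# Sketch only: one or more word characters joined by hyphens, e.g. know-it-all.
HYPHENATED = re.compile(r"\w+(?:-\w+)+")


def expand_hyphenated(query):
    """Turn know-it-all into ("know it all"|knowitall)."""
    def replace(match):
        parts = match.group(0).split("-")
        return '("{}"|{})'.format(" ".join(parts), "".join(parts))
    return HYPHENATED.sub(replace, query)


def convert_verbal_operators(query):
    """Convert OR / NOT to | and -, but only outside quoted phrases."""
    # Splitting on a capture group keeps the quoted phrases at odd indexes.
    pieces = re.split(r'("[^"]*")', query)
    for i in range(0, len(pieces), 2):
        pieces[i] = re.sub(r"\bOR\b", "|", pieces[i])
        pieces[i] = re.sub(r"\bNOT\s+", "-", pieces[i])
    return "".join(pieces)


print(expand_hyphenated("know-it-all"))
print(expand_hyphenated("Mazda RX-3"))
print(convert_verbal_operators('cats OR dogs NOT "this OR that"'))
```

A leading minus such as "foo -bar" is left alone by the hyphen pattern, so the NOT operator keeps working.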
The other aspect of functionality currently missing is wildcard searching. Sphinx can do this too. The current phpBB implementation actually supports wildcard searching using Sphinx already, but the search index needs to be built with the correct configuration. The standard phpBB sphinx.conf has the necessary wildcard features turned off by default. I have amended the phpBB Development Wiki to explain what needs to be done to enable wildcard searching: https://wiki.phpbb.com/Sphinx_Fulltext_ ... _searching

The only matter remaining on my to-do list is how to deal with British vs American spelling variations. On my board, the words "equaliser" and "equalizer" are used interchangeably, but refer to the same thing. A good search tool on an international discussion board should treat British and American spelling variations equally. Sphinx can use a custom wordform list, which might address this. This is not a phpBB issue, however, but rather a Sphinx configuration issue. If I find a solution, I think the best option is to add it to the Development Wiki.
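As a rough sketch of the wordform idea: Sphinx wordforms files list one mapping per line in the form "source > destination", and an index refers to the file through its wordforms setting. The Python helper below only generates such lines. The helper name is hypothetical, and the "colour" pair is an added example, not something from this board.

```python
# Sketch only: produce "variant > canonical" lines for a wordforms file.

def wordform_lines(variants):
    """variants maps each spelling variant to the form it should be indexed as."""
    return [f"{variant} > {canonical}"
            for variant, canonical in sorted(variants.items())]


lines = wordform_lines({"equaliser": "equalizer", "colour": "color"})
print("\n".join(lines))
```

Whether this is enough in practice would still need testing, since the index has to be rebuilt after the wordforms file changes.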

If anyone wants to test my new upgraded implementation of the Sphinx engine, I have now deployed it on my live board: www.ausrotary.com/search.php

My search page has an updated explanation of search keywords. I intend to develop this further to explain use of the ~ and / operators.

You are welcome to try out the new search operators described above. I look forward to any feedback before I submit my PR on Github.

I am particularly keen on input on the hyphenation feature and whether this should be standard. I am wondering whether it is overkill and users would be confused by it. I have a specific use case on my board, as it deals with cars that use optional hyphens in the model name eg Mazda RX-3 should find Mazda RX3 and vice versa. I could deal with this through a custom wordform list.

Hunchman801
Registered User
Posts: 15
Joined: Fri Sep 11, 2015 12:55 pm

Re: Future support for Sphinx Search

Post by Hunchman801 »

Sounds great, I can't wait to test your implementation. Please let us know when you've released the code.

On the subject of Sphinx, there's another bug (you can't view the last results of a query when there are more than 20,000 of them) that has a very simple fix. I hope this gets included too one day.

david63
Registered User
Posts: 299
Joined: Mon Feb 07, 2005 7:23 am
Location: Lancashire, UK

Re: Future support for Sphinx Search

Post by david63 »

Hunchman801 wrote:
Mon Dec 23, 2019 2:34 pm
On the subject of Sphinx, there's another bug (you can't view the last results of a query when there are more than 20,000 of them) that has a very simple fix. I hope this gets included too one day.
Have you created a PR for the fix?
David
Remember: You only know what you know -
and you do not know what you do not know!
