[GSoC] Search Backend Refactoring

dhruv.goel92
Registered User
Posts: 22
Joined: Sun Mar 18, 2012 9:30 pm

Post by dhruv.goel92 »

Motivation for Proposal/ Goal
Improving and extending the phpBB search backend will provide faster and more efficient search. It will also make the interface more user-friendly by introducing filters. Open source search backends such as Sphinx and Lucene are very important for forums with huge databases. I chose Sphinx because it integrates easily with SQL and PHP. If time permits (or perhaps after GSoC), I would also like to start integrating another search backend (Lucene or Solr).

What I have
Nils Adermann (IRC: naderman) has already made a pre-alpha implementation of a Sphinx search backend [http://github.com/naderman/sphinx-for-phpbb]. A user (wagnerch) has submitted a patch for PostgreSQL fulltext search, and some of its code can be reused [viewtopic.php?f=4&t=28707&hilit=postgresql].
Oleg has also been working on a branch for PostgreSQL fulltext search [https://github.com/p/phpbb3/tree/featur ... ext-search].
The PostgreSQL search will target PostgreSQL 8.3+.

What I need to do
The abstraction of the current search backends can be improved to reduce code duplication and allow more effective implementations. It would also help if a backend could modify the search interface, so that the interface matches the search backend in use. Filters can be introduced to narrow down the search results and make searching a better experience. The current search backend indexes only the posts, forums and topics tables; extending it to index MODs/extensions as well will be useful in the Moderator Control Panel. The Sphinx search implementation will need to be refined and adapted to the new abstraction scheme. The PostgreSQL fulltext search class will have to be written from scratch, though small snippets of code and some ideas can be taken from the submitted patch.

Benefits to the Community
Better abstraction will make implementing new search backends easier and the search feature as a whole faster and more efficient. PostgreSQL fulltext and Sphinx search will give forum admins additional choices of search backend. Sphinx in particular is a very powerful open source search system with many advantages over native MySQL search. The user interface will include different filters to make searching more flexible and easier to use. In addition, the current search implementations index only the posts, forums and topics tables; an extension to index MOD/extension and private message tables will make for a better phpBB user experience.

Potential Problems
Some of the current search backend methods have too many parameters and may need to be reworked completely during abstraction.
Each search backend may need to be broken into multiple classes for proper implementation.
Adding or removing search interface controls through the backend may prove difficult to implement given the current code.

Initial Thoughts/ Implementations
I would first like to prototype the class structure, with a focus on proper abstraction, so that integrating new search backends becomes fairly easy. Functionality shared by the current backends will be moved into inherited methods (the phpbb_search_base class already does this to an extent, but it can be improved further). The Sphinx and PostgreSQL fulltext alpha implementations will serve as references for writing the new classes. Many of their methods can be reused as a starting point, although their parameters will need to change.
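To make the idea of a shared base concrete, here is a minimal sketch of such a class structure. It is written in Python only for brevity, and every name in it (SearchBackend, InMemoryBackend, split_keywords, index, keyword_search) is hypothetical; none of this is taken from phpbb_search_base or the existing backends.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical common base class; not phpBB's actual API."""

    def split_keywords(self, keywords):
        # Shared helper that every backend inherits instead of duplicating.
        return [word.lower() for word in keywords.split()]

    @abstractmethod
    def index(self, post_id, text):
        """Make the text of one post searchable."""

    @abstractmethod
    def keyword_search(self, keywords):
        """Return the ids of posts that contain every keyword."""


class InMemoryBackend(SearchBackend):
    """Toy backend standing in for a real one such as Sphinx or PostgreSQL."""

    def __init__(self):
        self._words_by_post = {}

    def index(self, post_id, text):
        self._words_by_post[post_id] = set(self.split_keywords(text))

    def keyword_search(self, keywords):
        wanted = set(self.split_keywords(keywords))
        return sorted(
            post_id
            for post_id, words in self._words_by_post.items()
            if wanted <= words  # all wanted words must be present
        )
```

A new backend then only has to implement the two abstract methods; anything common, such as keyword splitting, lives in the base class once.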

Letting a search backend modify the search user interface will require new method definitions that are flexible and at the same time secure. I will try to make a rough model of these classes, methods and the basic search structure, and would like to discuss it with my mentor and the community before the actual coding begins.

Near Term Goals
Install, configure and get the Sphinx search and PostgreSQL search implementations working. Keep a rough draft of the steps to make documentation easier. I will contribute a wiki page describing which configuration parameters these two search backends have and how to configure them.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [GSoC] Search Backend Refactoring

Post by Oleg »

This is an accepted GSoC project that I mentor.

Current todo list:

1. Apply sphinx plugin code to phpbb tree, create a branch on github with results.

Important: the first commit of this branch should have the exact same code as what is in sphinx repository. Subsequent commits can make any modifications needed.

2. Write user documentation for using sphinx search.

a. What prerequisites are required? Sphinx - what versions are supported/known to work? Anything besides sphinx?
b. How can sphinx be configured? Is specific configuration required? If no, link to sphinx configuration docs.
c. If any sphinx configuration *must* be modified, we should include a sample configuration file, like we do for e.g. nginx/lighttpd.
d. What configuration is required on phpbb's side to use sphinx?

3. Update postgresql fulltext search patch for current develop. Get postgresql fulltext search working on a test board.

4. Write user documentation for postgresql text search. Same bullet points as #2 above.

---

With this done we should have an idea of what changes are necessary to phpbb to make sphinx and postgresql search backends properly usable. By "properly usable" I mean that any configuration needed should be doable via acp.

Then the existing code for sphinx/postgresql should be modified to fit phpbb's coding guidelines, if necessary. Any changes between 3.0 and 3.1 should also be applied to the respective search plugins.

The sphinx plugin appears to use a third-party sphinx-php interface. The source of this interface needs to be determined, appropriate attribution added to the AUTHORS file, and if a newer version of it is available we need to consider whether we want to upgrade.

For postgres we probably want to use 8.3+ fulltext search as discussed elsewhere, which requires bumping requirements and checking for this version either in installer or when enabling fulltext search. Maybe in installer as we want to require a modern version of postgres to take advantage of functionality allowing us to improve performance of some queries - there are tickets for that.

Afterwards we can work on test suites and refactoring to make the code better.

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: [GSoC] Search Backend Refactoring

Post by naderman »

I have marked my original [RFC] Search Backend Refactoring as superseded by this one.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [GSoC] Search Backend Refactoring

Post by Oleg »

A somewhat related issue is http://tracker.phpbb.com/browse/PHPBB3-9551. The MySQL fulltext backend does not always change all column types/collations together, resulting in broken boards.

dhruv.goel92
Registered User
Posts: 22
Joined: Sun Mar 18, 2012 9:30 pm

Re: [GSoC] Search Backend Refactoring

Post by dhruv.goel92 »

The PostgreSQL branch is working for new installs as well as when updating to 3.1 [https://github.com/dhruvgoel92/phpbb3/t ... ext-search].
Getting Sphinx search working is still in progress; once that is done we will have an idea of how to proceed further, as Oleg said.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [GSoC] Search Backend Refactoring

Post by Oleg »

You are missing ticket numbers in commit messages. http://wiki.phpbb.com/Git#Commit_Messages

Please create a pull request so that comments can be properly tracked.

dhruv.goel92
Registered User
Posts: 22
Joined: Sun Mar 18, 2012 9:30 pm

Re: [GSoC] Search Backend Refactoring

Post by dhruv.goel92 »

Yes, I have included the ticket number in the commit messages.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [GSoC] Search Backend Refactoring

Post by Oleg »

1. Version detection does not work for 9.0+ (comparison is botched).
1a. Since pre-8.3 is no longer supported, the language for when "tsearch2 is not found" needs to be changed, as if tsearch2 is present it won't be used regardless. This should probably simply become a (correct) version check.
2. Even though we are not going to support pre-8.3 postgreses, we need to correctly handle the situation of people using a pre-8.3 postgres on 3.1 due to upgrades. The code should produce appropriate error messages.
3. When searching, a leading + is added to the query, both the query in the search box and the displayed query in the main block. E.g. http://localqi/boards/t231/search.php?keywords=culpa
4. When no matches are found, the search box is cleared (-culpa +sunt).
5. Add documentation that + and - are supported in postgres?
6. -sunt alone finds sunt.
7. Words are broken on dashes: a-culpa, aa-culpa are equivalent to culpa; aaa-culpa finds nothing.
8. The added postgres readme no longer serves any function and should be deleted.
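For items 1 and 1a, a version check along these lines would treat 9.0 correctly. This is only a sketch in Python: the function names are made up for illustration, and the "naive" variant is a guess at one way such a comparison can go wrong, not a diagnosis of the actual patch.

```python
import re

MINIMUM_VERSION = (8, 3)  # minimum version taken from the discussion above


def parse_version(version_string):
    """Extract (major, minor) from e.g. '9.0.4' or 'PostgreSQL 8.3.1 on x86_64'."""
    match = re.search(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no version number in {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_fulltext(version_string):
    # Tuples of integers compare element by element, so (9, 0) >= (8, 3).
    return parse_version(version_string) >= MINIMUM_VERSION


def naive_check(version_string):
    # Illustrative buggy form: tests major and minor independently,
    # so 9.0 is rejected because its minor part 0 is less than 3.
    major, minor = parse_version(version_string)
    return major >= 8 and minor >= 3
```

With this, supports_fulltext("9.0.4") accepts the server while naive_check("9.0.4") rejects it, and a pre-8.3 server such as "8.2.23" is rejected by both, which is the case that needs a proper error message.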

Pull request should be created for comments.

What seems to happen is that the code takes the search query, massages it and puts the massaged version into the UI. What should happen is that whatever the user entered is kept around and placed in the UI.
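One way to keep the two apart is to carry the raw input alongside the parsed terms. The sketch below is in Python with hypothetical names (ParsedQuery, parse_keywords); it only illustrates the separation and is not how the branch is actually structured.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    raw: str  # exactly what the user typed; this is what goes back into the UI
    must: list = field(default_factory=list)      # words that have to match
    must_not: list = field(default_factory=list)  # words that must be absent


def parse_keywords(raw):
    parsed = ParsedQuery(raw=raw)
    for token in raw.split():
        excluded = token.startswith("-")
        # Strip only leading operators, so a dash inside a word is kept.
        word = token.lstrip("+-").lower()
        if not word:
            continue
        (parsed.must_not if excluded else parsed.must).append(word)
    return parsed
```

The backend builds its query from must and must_not, while the template always shows raw, so searching for culpa displays culpa rather than +culpa, and -sunt alone produces an exclusion instead of a match on sunt.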

dhruv.goel92
Registered User
Posts: 22
Joined: Sun Mar 18, 2012 9:30 pm

Re: [GSoC] Search Backend Refactoring

Post by dhruv.goel92 »

Documentation for + and - support is present in advanced search, much like for MySQL fulltext.
In case of no match, the code currently uses trigger_error, which is why the search box is cleared. Should I modify this to actually show the search term?
The rest of the things mentioned above have either been resolved already or I am working on them.
I also intend to get Sphinx working so that you can review it and suggest how to improve it further and fix bugs once it is in a working state.

dhruv.goel92
Registered User
Posts: 22
Joined: Sun Mar 18, 2012 9:30 pm

Re: [GSoC] Search Backend Refactoring

Post by dhruv.goel92 »

When autoconf starts the Sphinx search server, this issue is encountered: [http://sphinxsearch.com/forum/view.html?id=2967]. A possible solution is to tell PHP not to wait for the output from exec(), or maybe to create a new process in the background. Or can we just skip the autoconf?
The Sphinx documentation does mention that exec() should not be used to start the Sphinx server.
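The general idea of starting a daemon without waiting on its output can be sketched like this. It is in Python rather than PHP, start_detached is a made-up helper, and whether the same approach fixes the hang described in the linked thread is an assumption.

```python
import subprocess


def start_detached(command):
    """Launch command without waiting for it or capturing its output."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # nothing for the caller to block on
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: detach from the caller's session
    )
```

The call returns as soon as the child has been spawned. For Sphinx the command would be something like ["searchd", "--config", "/path/to/sphinx.conf"], where the config path is a placeholder.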
