Motivation:
Forums' current tools for finding information are rather poor. Despite active academic work on post and thread search, forum search tools rely on the back-end database's full-text search engine; hence, the structure and nature of forums are ignored completely. In this RFC, I propose that we implement state-of-the-art methods for two search tasks: post search and thread search. The need for this project is evident from the amount of duplicated content in forums and from forums' emphasis on searching before asking.
I submitted this feature as a project idea for Google Summer of Code 2012, but it was not selected. Nevertheless, naderman suggested posting the idea as an RFC, and here it is.
The Google Summer of Code idea of search backend refactoring proposed by phpBB is complementary to this request. This request deals with implementing the algorithms, while the GSoC idea aims to improve code extensibility and efficiency.
My aim is to implement not only this feature but also other search tasks such as expert finding [7] and thread recommendation [8]. However, for the time being, we should focus on the thread and post search tasks.
For thread search, I am considering the following methods: the pseudo-cluster selection model of [1], the weighted product method of [2], and the title + initial post + reply post representation of [3].
For post search, we can use the best-performing method found in [4].
Note that much of the work above requires some academic background in information retrieval and search engines. However, most of the data needed to implement the state of the art is available in any forum's database.
Please feel free to ask questions and discuss, and pardon my academic writing style.
References
1. Elsas, J.L., Ancestry.com Online Forum Test Collection, in Technical Report CMU-LTI-017. 2011, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.
2. Seo, J., W. Bruce Croft, and D. Smith, Online community search using conversational structures. Information Retrieval, 2011: p. 1-25.
3. Bhatia, S. and P. Mitra, Adopting Inference Networks for Online Thread Retrieval, in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence. 2010: Atlanta, Georgia, USA.
4. Duan, H. and C. Zhai, Exploiting Thread Structure to Improve Smoothing of Language Models for Forum Post Retrieval, in the 33rd European Conference on Information Retrieval (ECIR 2011). 2011: Dublin, Ireland.
5. Wang, H., et al., Learning Online Discussion Structures by Conditional Random Fields, in the 34th Annual International ACM SIGIR Conference (SIGIR 2011). 2011.
6. Xi, W., J. Lind, and E. Brill, Learning effective ranking functions for newsgroup search, in Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. 2004, ACM: Sheffield, United Kingdom. p. 394-401.
7. Seo, J. and W.B. Croft, Thread-based Expert Finding, in SIGIR'09 SSM Workshop. 2009, ACM: Boston, Massachusetts.
8. Zhao, J., et al., Learning a user-thread alignment manifold for thread recommendation in online forum, in Proceedings of the 19th ACM international conference on Information and knowledge management. 2010, ACM: Toronto, ON, Canada. p. 559-568.



