phpBB Session Backend Abstraction Proposal Draft

erangamapa
Registered User
Posts: 13
Joined: Fri Feb 08, 2013 3:07 am

phpBB Session Backend Abstraction Proposal Draft

Post by erangamapa »

I would appreciate feedback on this.


phpBB Session Backend Abstraction GSoC Project Proposal

Current Problem

Having a solid, extensible and responsive session interface is vital for a web application. The session system in phpBB is used to keep track of phpBB system status. Two major problems can be identified in it. First, the session system does not have complete test coverage. Second, it uses the database to handle volatile session data, which can cause load problems at times.

Benefits

Broken functionality in the session system can cause various issues, so a solid test suite that covers the phpBB session system completely will reveal problems early. Apart from that, if we use a caching system like memcache in combination with the current session storage mechanism (the database), we can reduce database calls and optimize the system for periods of high demand. Since memcache is a distributed memory object caching system, we can spread the load on the phpBB system across a cluster of servers.

Major Tasks

For this implementation, three tasks can be identified.

-Separating storage from behaviour.

In the current session codebase, behaviour and storage are coupled together: database queries are written inside session-related methods. A separate storage abstraction layer should be implemented, and all session data access queries should be moved into it. Session data will then be accessed only through this new layer. In the end, the session-related methods will contain only logic and will act as a session controller layer.
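
The split described above could look roughly like the following sketch. It is written in Python only to keep it short and runnable (phpBB itself is PHP), and every name in it (`SessionStorage`, `InMemoryStorage`, `SessionController`) is a hypothetical illustration, not existing phpBB code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class SessionStorage(ABC):
    """Hypothetical storage layer: the only place that touches the backend."""

    @abstractmethod
    def load(self, session_id: str) -> Optional[dict]: ...

    @abstractmethod
    def save(self, session_id: str, data: dict) -> None: ...

    @abstractmethod
    def delete(self, session_id: str) -> None: ...


class InMemoryStorage(SessionStorage):
    """Stand-in for a database-backed implementation."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def load(self, session_id: str) -> Optional[dict]:
        row = self._rows.get(session_id)
        return dict(row) if row is not None else None

    def save(self, session_id: str, data: dict) -> None:
        self._rows[session_id] = dict(data)

    def delete(self, session_id: str) -> None:
        self._rows.pop(session_id, None)


class SessionController:
    """Session logic only; no queries live here."""

    def __init__(self, storage: SessionStorage) -> None:
        self._storage = storage

    def begin(self, session_id: str, user_id: int) -> dict:
        data = {"user_id": user_id}
        self._storage.save(session_id, data)
        return data

    def current_user(self, session_id: str) -> Optional[int]:
        data = self._storage.load(session_id)
        return data["user_id"] if data else None

    def end(self, session_id: str) -> None:
        self._storage.delete(session_id)
```

Because the controller only sees the `SessionStorage` interface, a database, memcache or combined implementation can be swapped in without touching the session logic.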

-Integrating memcache into session storage.

Once the session storage abstraction layer is implemented, we can use memcache within the layer to improve performance. A piece of session data will be cached as soon as it is fetched from the database. If it is requested again, it will be fetched from the cache instead of the database. If it needs to be updated, both the cached and the persistent versions will be updated. For places where heavy writing happens, such as recording anonymous users, we can collect the data in memcache and later write it to the database in one go.
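
The read and update paths described above amount to a read-through, write-through cache. A minimal sketch, assuming plain dictionaries as stand-ins for memcache and the database; `DictBackend` and `CachedSessionStorage` are hypothetical names, not phpBB classes:

```python
from typing import Dict, Optional


class DictBackend:
    """Stand-in for either memcache or the database (both simulated here)."""

    def __init__(self) -> None:
        self.data: Dict[str, dict] = {}
        self.reads = 0  # counts lookups so the effect of caching is visible

    def get(self, key: str) -> Optional[dict]:
        self.reads += 1
        return self.data.get(key)

    def set(self, key: str, value: dict) -> None:
        self.data[key] = value


class CachedSessionStorage:
    """Read-through on load, write-through on save."""

    def __init__(self, cache: DictBackend, database: DictBackend) -> None:
        self.cache = cache
        self.database = database

    def load(self, session_id: str) -> Optional[dict]:
        cached = self.cache.get(session_id)
        if cached is not None:
            return cached  # cache hit: the database is not touched
        row = self.database.get(session_id)
        if row is not None:
            self.cache.set(session_id, row)  # cache as soon as it is fetched
        return row

    def save(self, session_id: str, data: dict) -> None:
        # Update the persistent copy first, then the cached copy.
        self.database.set(session_id, data)
        self.cache.set(session_id, data)
```

Writing to the database before the cache means a failure between the two steps leaves the persistent copy as the newer one, which is the safer direction.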

-Writing tests.

For both of the tasks above, I will follow test-driven development (TDD). Initially, tests will be written for each method in the storage abstraction layer. New methods will then be implemented to pass those tests. Many methods in the session controller layer do not have any tests. For these, I will write tests for the already implemented methods and will use object mocking for the storage abstraction layer.
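
Mocking the storage layer in a controller test could look like the sketch below. It uses Python's standard `unittest.mock` purely for illustration (the real tests would be written for phpBB's PHP test suite), and `SessionController` with its `user_id_for` method is a hypothetical class.

```python
import unittest
from typing import Optional
from unittest.mock import Mock


class SessionController:
    """Hypothetical controller under test; it only talks to `storage`."""

    def __init__(self, storage) -> None:
        self.storage = storage

    def user_id_for(self, session_id: str) -> Optional[int]:
        data = self.storage.load(session_id)
        return data["user_id"] if data is not None else None


class SessionControllerTest(unittest.TestCase):
    def test_known_session_returns_user_id(self):
        storage = Mock()  # no database or cache is involved
        storage.load.return_value = {"user_id": 2}
        self.assertEqual(SessionController(storage).user_id_for("abc"), 2)
        storage.load.assert_called_once_with("abc")

    def test_missing_session_returns_none(self):
        storage = Mock()
        storage.load.return_value = None
        self.assertIsNone(SessionController(storage).user_id_for("abc"))


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SessionControllerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```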

Current Work

Some work has already been carried out on separating session storage from session behaviour. The references are as follows.

PR - https://github.com/phpbb/phpbb3/pull/1322

RFC discussion - viewtopic.php?f=84&t=33435

Tracker - http://tracker.phpbb.com/browse/PHPBB3-9733

I will be using this work as the starting point for my solution.

A memcache implementation already exists in the phpBB acm. Since the acm is a board-wide configuration and we need memcache specifically for sessions, a separate way of configuring memcache for sessions will be implemented. Code from the existing implementation can be reused, however.


Requested changes

There are many session-related methods scattered across the phpBB codebase. They should be identified and refactored accordingly. Some of these methods already have tests. Those test cases will need to change, because the storage abstraction layer that I am implementing should be mocked in them. The storage abstraction implementations will handle both memcache access and database access as appropriate. The following areas can be identified for changes.

-Main session methods

The main session-related methods are implemented in session.php. Some of them have storage access inside them. In the patch referred to under Current Work, storage access has already been separated from behaviour for some of the methods. Tests should be written for those methods and for the main session methods. For the rest, storage abstraction methods will be written following TDD.

-User session methods

The phpBB user is also a session-related entity, so using memcache for user data will be beneficial as well. Many user-related properties, such as user_id, user_ip and username, can be cached. The file user.php contains user-related methods that need to be refactored in the same way as described above.

-Viewing online users

phpBB uses the session to show which users are online. Methods for obtaining details about guest users are implemented in functions.php. All other methods used to view details of online users are implemented in viewonline.php. Most of these methods use data from the phpBB sessions table. I will write tests and move the database queries in those methods to the storage abstraction layer. In some cases, I may be able to reuse storage access methods implemented while refactoring the main session methods.

Memcache configuration

First of all, the memcache extension for PHP must be installed on the server. This requirement will be checked when configuring memcache for phpBB sessions. Memcache will be configured through the ACP, in a similar way to the Sphinx search server. If the admin does not wish to use memcache for sessions, the board configuration will indicate that and the storage abstraction layer will switch to database-only mode. If the admin does want to use it, they will generate configuration settings for their caching servers and configure them in phpBB.
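
The configuration decision above can be sketched as a small selection function. This is only an illustration in Python: the setting names (`use_memcache`, `servers`), the mode strings and the function itself are assumptions made for the example, not existing phpBB options.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionCacheConfig:
    """Hypothetical board settings; the option names are assumptions."""
    use_memcache: bool = False
    servers: List[str] = field(default_factory=list)


def choose_storage_mode(config: SessionCacheConfig, extension_available: bool) -> str:
    """Return 'database' or 'database+memcache' for the storage layer."""
    if not config.use_memcache:
        return "database"  # admin opted out: database-only mode
    if not extension_available:
        # The requirement check: memcache was requested but cannot be used.
        raise RuntimeError("memcache was requested but the extension is missing")
    if not config.servers:
        raise ValueError("memcache was requested but no servers are configured")
    return "database+memcache"
```

For example, `choose_storage_mode(SessionCacheConfig(), True)` selects database-only mode, because memcache is off by default in this sketch.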

Other

Considering the changes above, separating storage from behaviour in the session should be done before writing tests for session methods. Afterwards, the storage abstraction methods should be used inside the other session methods as appropriate. Once that is done, tests should be written for those session methods by mocking the storage abstraction layer. These steps can be followed concurrently for the different sections mentioned above.

Work on the main session methods and the session-related methods in user can start early, since changes to them are essential. After that, changes to the methods for viewing online users can begin. If development takes longer than expected, the changes to the view online users methods can be skipped. If time permits, a Redis implementation will also be integrated into the storage abstraction layer.

asperous
Google Summer of Code Student
Posts: 21
Joined: Mon Apr 22, 2013 3:26 pm
Location: Tigard, Or

Re: phpBB Session Backend Abstraction Proposal Draft

Post by asperous »

erangamapa,

Very interesting proposal! I'm a GSoC aspirant like yourself so I don't really have a say in your proposal, but I would say it's a fine proposal.
"If we use a caching system like memcache in combination with current session storage mechanism (database), we can reduce the database calls and optimize the system for demanding occasions."
I wonder if it would be better to keep the sessions in the database and just cache on reads, or whether it might be better to do both writes and reads on the cache. I imagine in production, cache servers rarely go down, but even when they do, the worst that would happen is everyone would get logged out. The advantage of doing both reads and writes on the cache is, of course, simplicity and performance.

I noticed on some of my websites that PHP does a write on every page request to update the expire time. If you wanted to keep writes on the db, you would have to keep that from happening to make it worth it, but it's doable!

The other thing is that the codebase has an abstraction for multiple cache backends in includes/cache. It might be advantageous to use that to allow all kinds of backends to be used.

I totally agree with separating logic from storage, that's very important imo.

Best wishes!

erangamapa
Registered User
Posts: 13
Joined: Fri Feb 08, 2013 3:07 am

Re: phpBB Session Backend Abstraction Proposal Draft

Post by erangamapa »

asperous,

First of all, thanks for considering my proposal and for your feedback.
"Then, particular piece of session data will be cached as soon as it's fetched from the database. Afterwards, if it is requested, it will be fetched from the cache instead of the database. If it needs to be updated, both cached and persistent versions of it will get updated."
I agree with you. A persistent version of every piece of session data has to be kept before it is cached, so that the data survives a cache server failure.
"For places like recording anonymous users where heavy writing happens, we can collect those data into memcache and later on write them to database at once."

This is where the problem with a write-back cache policy arises during a cache failure. In this case we are not going to cache a large amount of data or collect data over a long interval before writing it back. Compared with the mean time to failure of a cache server, this interval is very small, so data loss during a cache failure will be minimal. As for the value of this anonymous user data, it is not very critical compared with other session data.
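
A write-back buffer of this kind could be sketched as follows. Everything here is a hypothetical illustration in Python: `WriteBackBuffer`, the 30-second default interval and the `flush` callback (which stands in for a single batched database write) are assumptions, not part of phpBB.

```python
import time
from typing import Callable, Dict, List, Tuple


class WriteBackBuffer:
    """Collects guest hits in memory and writes them out in one batch."""

    def __init__(self,
                 flush: Callable[[List[Tuple[str, float]]], None],
                 interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush          # performs the single batched write
        self._interval = interval    # upper bound on how long data sits unflushed
        self._clock = clock
        self._pending: Dict[str, float] = {}
        self._last_flush = clock()

    def record(self, ip: str) -> None:
        now = self._clock()
        self._pending[ip] = now      # repeated hits from one guest overwrite each other
        if now - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._flush(sorted(self._pending.items()))
        self._pending = {}
        self._last_flush = self._clock()
```

If the cache is lost, only the entries recorded since the last flush disappear, so the flush interval bounds the amount of data at risk.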

"The other thing is that the codebase has an abstraction for multiple cache backends in includes/cache, it might be advantageous to use that to allow all kinds of backends to be used."
You are right. There is an existing cache interface, described at https://wiki.phpbb.com/Cache, and we can use it. But the Project Idea specifically mentions memcache. I am thinking of handling it separately for sessions, giving the user a chance to use the distributed memory object caching feature of memcache.

Thanks!!

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: phpBB Session Backend Abstraction Proposal Draft

Post by naderman »

I'm certainly in favour of having some storage backend which does persist sessions, so issues with the caching layer don't cause logouts. The main goal is to get away from having to write to the db on every request. So volatile information, such as the last request time, could usually be kept in cache and written to the db only infrequently to reduce the number of writes.
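
One way to read this suggestion is as a throttled update: every request refreshes the cached last request time, but the db copy is only rewritten once it is older than some threshold. A minimal sketch in Python, assuming dictionaries as stand-ins for the cache and the sessions table; `ThrottledLastSeen` and the 60-second threshold are hypothetical.

```python
from typing import Dict


class ThrottledLastSeen:
    """Keeps the last request time in cache; persists it only occasionally."""

    def __init__(self, min_interval: float = 60.0) -> None:
        self.min_interval = min_interval
        self.cache: Dict[str, Dict[str, float]] = {}  # stand-in for memcache
        self.database: Dict[str, float] = {}          # stand-in for the sessions table
        self.db_writes = 0

    def touch(self, session_id: str, now: float) -> None:
        entry = self.cache.get(session_id)
        if entry is None:
            # Cache miss (first request or cache restart): persist immediately.
            self._persist(session_id, now)
            return
        entry["last_seen"] = now  # cheap in-memory update on every request
        if now - entry["persisted_at"] >= self.min_interval:
            self._persist(session_id, now)

    def _persist(self, session_id: str, now: float) -> None:
        self.database[session_id] = now
        self.cache[session_id] = {"last_seen": now, "persisted_at": now}
        self.db_writes += 1
```

The session row itself still lives in the db, so losing the cache costs at most `min_interval` seconds of last-request accuracy rather than a logout.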

asperous
Google Summer of Code Student
Posts: 21
Joined: Mon Apr 22, 2013 3:26 pm
Location: Tigard, Or

Re: phpBB Session Backend Abstraction Proposal Draft

Post by asperous »

Naderman,

The safety-first approach makes sense, and I don't have the years of experience as a system admin to back up my opinions. What is your opinion on offering the option of where writes and reads come from? It certainly wouldn't be too much more effort, and in my (admittedly small) experience, having sessions in an in-memory data store is highly efficient as well as fairly stable, as these servers shouldn't go down often in production.

("Memcached was designed specifically for sessions. It was originally the brainchild of the lead developer of livejournal.com, and later used to also cache the content of users' posts. The benefit was immediate: most of the action was taking place in memory. Page load times greatly improved."- http://stackoverflow.com/questions/1394 ... n-memcache)

It's also possible to compromise by giving phpBB admins the option of keeping either only the expire time or entire sessions in a memory store. With that, it would be important to give admins the ability to migrate the data in the case of planned downtime. It's also possible to make the module persist the data after a certain amount of time (so only a small percentage of recently logged-in users would be affected by a crash). It also might not be as bad as it seems: memcache servers can be clustered, and redis instances can be configured to perform periodic persistence for reliability.

The biggest issue I can think of with a server crashing and users getting logged out is that someone might be writing a draft, and it wouldn't get saved or posted because they were logged out as they were writing (a problem that should be fixed in its own right by storing the draft in a cookie as the user types). And, of course, being suddenly logged out would be pretty annoying to users and would make the site appear unprofessional.

Even so, I understand the wisdom of the safety-first school of design, which forgoes opportunities for optimization in order to prevent potential interruptions.

Do you think that safety-first should be forced, or do you think there could be value in an option?

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: phpBB Session Backend Abstraction Proposal Draft

Post by Oleg »

Please review https://github.com/phpbb/phpbb3/pull/934 for what was attempted (and given up on).

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: phpBB Session Backend Abstraction Proposal Draft

Post by EXreaction »

Having the backend abstracted would be very nice. I would consider making a backend for PHP sessions as well if you have some extra time at the end. A lot of web systems and frameworks use PHP sessions and support for that would mean someone should be able to make sessions work across multiple systems much more easily.

erangamapa
Registered User
Posts: 13
Joined: Fri Feb 08, 2013 3:07 am

Re: phpBB Session Backend Abstraction Proposal Draft

Post by erangamapa »

That is a good idea. If I have time I will consider abstracting PHP sessions as well. If we are going to implement session access across multiple systems, we need to pay attention to security (session hijacking).

On the other hand, a back-end for PHP sessions also exists in Symfony under HttpFoundation. Since phpBB is using Symfony going forward, maybe we can use that session back-end.

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: phpBB Session Backend Abstraction Proposal Draft

Post by EXreaction »

What do you mean by:
"If we are going to implement session access across multiple systems, need to pay attention on security(session hijack)."
Using an already existing backend might be a good thing to look into.
