Optimisation of download/file.php

General discussion of development ideas and the approaches taken in the 3.x branch of phpBB. The current feature release of phpBB 3 is 3.3/Proteus.
Forum rules
Please do not post support questions regarding installing, updating, or upgrading phpBB 3.3.x. If you need support for phpBB 3.3.x please visit the 3.3.x Support Forum on phpbb.com.

If you have questions regarding writing extensions please post in Extension Writers Discussion to receive proper guidance from our staff and community.
Post Reply
TerryE
Registered User
Posts: 95
Joined: Sat May 23, 2009 12:24 am
Contact:

Optimisation of download/file.php

Post by TerryE »

This was a point discussed on another topic: Improving phpBB performance for small installations, but it's really a separate issue which merits forking into its own topic.
TerryE wrote:Whenever possible, the [phpBB] application should correctly negotiate the revalidation protocols for quasi-static content. The two relevant modules here are download/file.php and style.php, and these modules don't do this (except for avatars).
naderman wrote:That all makes perfect sense to me. As for improving file.php. I think the hurdle to pay attention to there is attachment permissions: making sure that content is not served to users who do not have the appropriate permissions.
From a performance perspective, the three factors which dominate the runtime overhead of download/file.php are
  1. The browser should locally caches any images, to avoid repeated download requests to the server. To enable this, the module should correctly set the "Expires" and "Cache-Control" directives.
  2. When the user "refreshes" the image (or set the option to revalidate on a per session basis, the browser will enter into a revalidation negotiation if the previous download defined "Etag" or "Last-Modified" response headers. When appropriate, the module should process any corresponding HTTP_IF_NONE_MATCH or HTTP_IF_MODIFIED_SINCE request headers and return a 304 status to avoid the overhead of a nugatory download.
  3. The D/B operations needed to process the download request. In the current implementation these are unnecessarily heavy.
At the moment there also seem to be (undocumented) security requirements which constrain the download function:
  1. The true pathname to the actual file location must be private to the application and not exposed in HTML content.
  2. The file should only be visible to the user if the user has read access to the forum which contains the post which contains the file.
  3. There is some other counter-measure code in the module, but the commenting is narrative in style, rather than a simple explanation of the requirement and I'll need to go through it in detail to understand further.
To give you an idea of what I mean by the D/B ops being "unnecessarily heavy", here is a mysql log of those for a single single image embedded in a topic:

Code: Select all

Connect terry@localhost on wf
Query   SET NAMES 'utf8'
Query   SELECT @@session.sql_mode AS sql_mode
Query   SET SESSION sql_mode=',STRICT_ALL_TABLES,STRICT_TRANS_TABLES'
Query   SELECT config_name, config_value FROM forum_config WHERE is_dynamic = 1
Query   SELECT u.*, s.* FROM forum_sessions s, forum_users u
        WHERE s.session_id = 'ec5907a9c111539c89fc78e01fb86791' AND u.user_id = s.session_user_id
Query   UPDATE forum_sessions SET session_time = 1302517632 WHERE session_id = 'ec5907a9c111539c89fc78e01fb86791'
Query   SELECT s.style_id, t.template_storedb, t.template_path, t.template_id, t.bbcode_bitfield, 
                t.template_inherits_id, t.template_inherit_path, c.theme_path, c.theme_name, c.theme_storedb,
                c.theme_id, i.imageset_path, i.imageset_id, i.imageset_name
        FROM forum_styles s, forum_styles_template t, forum_styles_theme c, forum_styles_imageset i
        WHERE s.style_id = 1 AND t.template_id = s.template_id
        AND c.theme_id = s.theme_id AND i.imageset_id = s.imageset_id
Query   SELECT * FROM forum_styles_imageset_data
        WHERE imageset_id = 1 AND image_filename <> '' AND image_lang IN ('en', '')
Query   SELECT attach_id, in_message, post_msg_id, extension, is_orphan, poster_id, filetime
        FROM forum_attachments WHERE attach_id = 18 LIMIT 1
Query   SELECT p.forum_id, f.forum_password, f.parent_id
        FROM forum_posts p, forum_forums f WHERE p.post_id = 606 AND p.forum_id = f.forum_id LIMIT 1
Query   SELECT e.extension, g.* FROM forum_extensions e, forum_extension_groups g
        WHERE e.group_id = g.group_id AND (g.allow_group = 1 OR g.allow_in_pm = 1)
Query   SELECT attach_id, is_orphan, in_message, post_msg_id, extension, physical_filename, real_filename, mimetype, filetime
        FROM forum_attachments WHERE attach_id = 18 LIMIT 1
Quit
Eight selects from half a dozen tables and one update to the session table. Give that probably >99% of download/file requests are to display inline images in viewtopic in guest accessible forums (that's my experience on OOo), I recommend that
  1. We add the "Etag" or "Last-Modified"negotiation for all downloads
  2. We don't update the session table for an image view. (I am not sure why this is happening as $user->session_begin(false) isn't supposed to update this table anyway.
  3. We add (just as we've done with avatars) an extra optimised path for images viewed from a guest accessible forum. I think that this would require only one true SQL statement (the others are ACM cached), or at least review session to work out why a begin(false) is doing updates.
  4. We recomment the existing code to explain what is going on from a security perspective.
Thoughts and comments?

User avatar
naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany
Contact:

Re: Optimisation of download/file.php

Post by naderman »

Sounds good to me. I'm sure Henry (kellanved) can help you out if any of the security related stuff is unclear. He also wrote an article on the mime sniffing problem for a german IT news site a while ago (english version http://www.h-online.com/security/featur ... 46229.html)

Post Reply