[RFC] Attachments in subfolders

Post by canonknipser » Mon Dec 19, 2011 1:59 pm

Based on this discussion: viewtopic.php?f=105&t=33498 started by user knot

Current situation:
When using phpBB's built-in function for uploading attachments, all attachments are stored in a single subfolder under the phpBB root directory, "files" by default.
With a large number of attachments, board owners run into several limitations:
  • FTP servers often handle only a limited number of files per directory (very often 2,000 by default), and the board owner usually cannot change this setting
  • backing up the board is difficult: phpBB assigns random file names, so there is no "natural order" of files, and finding the files uploaded since the last backup means sorting by file time when FTP is the only synchronising/backup tool available
  • modern file systems have no practical limit on the number of files stored in one folder, but a huge number of files in one folder may still slow down browsing, sorting etc.
RFC:
Based on the mod described at http://www.phpbbchina.com/forum/viewtop ... =23&t=3337, I would suggest adding the following features to attachment management:
  • give board owners the ability to organise attachments in subfolders based on their requirements:
    • do nothing (current behaviour)
    • by attachment owner (user id)
    • by upload date (using the PHP date() function)
    • by attachment type (extension groups or file type)
    • by attachment id (e.g. defining a maximum number of files per folder)
    • by file name, e.g. the first two characters of the generated "physical_filename"
    The first two after "do nothing" are already implemented in the mod from phpbbchina.
  • automatic creation of subfolders when necessary, using the PHP function mkdir()
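
A minimal sketch of the automatic folder creation mentioned in the last point. The function name and the 0755 mode are assumptions made for illustration, not existing phpBB code:

```php
<?php
// Hypothetical helper (not phpBB code): make sure the target subfolder exists
// below the upload directory, creating missing levels on demand.
function ensure_attachment_dir($upload_dir, $subfolder)
{
    $target = rtrim($upload_dir, '/') . '/' . trim($subfolder, '/');

    if (is_dir($target))
    {
        return true;
    }

    // The third argument (true) creates nested levels such as 2011/12 in one call.
    // The @ hides the warning on hosts where PHP may not create directories;
    // is_dir() is checked again in case another request created the folder meanwhile.
    return @mkdir($target, 0755, true) || is_dir($target);
}
```

A caller would only move the uploaded file once this returns true; if it returns false, falling back to the base upload folder is one possible way to keep uploads working.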

Benefits:
  • access to attachments is not limited by server settings
  • backup of attachments or migration to a new server can be easier
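Possible problems: automatic folder creation may not work on every host (for example where PHP is not allowed to create directories). To solve this, phpBB can be shipped with a default folder structure like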

Code: Select all

0
  - 0
  - 1
  - 2
  - 3
  - ...
  - D
  - E
  - F
1
  - 0
  - 1
  - 2
  - 3
  - ...
  - D
  - E
  - F
2
3
...
D
E
F
  - 0
  - 1
  - 2
  - 3
  - ...
  - D
  - E
  - F
and use the last option (by file name) to place the files in these subfolders.
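
As a rough illustration of how a file could be mapped onto such a shipped structure: the sketch below assumes the physical file name looks like "<user id>_<32 hex characters>" and takes the first two characters of the hash part (not of the user id) as the two folder levels. The function name is made up.

```php
<?php
// Sketch only: pick one of the 256 pre-shipped folders (0/0 ... F/F) from the
// first two characters of the hash part of the physical file name.
// Assumption: the name looks like "<user id>_<32 hex characters>".
function shipped_subfolder($physical_filename)
{
    // Take the part after the last underscore, i.e. the hash.
    $pos = strrpos($physical_filename, '_');
    $hash = ($pos === false) ? $physical_filename : substr($physical_filename, $pos + 1);

    if (strlen($hash) < 2 || !ctype_xdigit(substr($hash, 0, 2)))
    {
        return false; // not a name this scheme can place
    }

    return strtoupper($hash[0]) . '/' . strtoupper($hash[1]);
}

echo shipped_subfolder('106989_0123456789abcdef0123456789abcdef'), "\n"; // prints "0/1"
```

Because the folder is derived from the name alone, no extra lookup is needed to find a file again, and a "thumb_" prefix does not change the result.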

Location in ACP:
General -> Board configuration -> Attachment settings
The different options can be offered as a dropdown that opens additional input fields via JavaScript when necessary (e.g. an input field for the date format string for date-based folders).

Database changes
The column "physical_filename" in the table "attachments" is currently defined as varchar(255). The file name is generated as a 32-character hash value prepended by the numerical user id and a dash, so the column is too short for longer path names (more than about 200 characters). We can solve this in two ways:
  • Change the field length
    • con: altering field length may not work on all database systems
  • Add an extra column ("path_name" as text)
Ticket: http://tracker.phpbb.com/browse/PHPBB3-10119
Greetings, Frank

english is not my native language

Post by A_Jelly_Doughnut » Mon Dec 19, 2011 11:44 pm

canonknipser wrote: Database changes
the column "physical_filename" in table "attachments" is currently defined as varchar(255), the filename is generated as a 32-character hash value prepended by numerical userid and a dash, so for longer pathnames (more than 200 characters) it is too short. We can solve this in 2 ways:
For performance reasons, you're best off keeping the field as a varchar. Especially because I'm having trouble seeing a use case where strlen(directory name(s) + 32 char hash) > 255 given the filing possibilities below:
canonknipser wrote: * do nothing (current behaviour)
* by attachment owner (user-id)
* by upload date (using php date() function)
* by attachment type (extention groups or file type)
* by attachment id (eg. defining a maximum number of files per folder)
* by file-name, eg. the first two characters of the generated "physical_file_name"
Is the suggestion to implement all, one, or some of these?
A_Jelly_Doughnut

Post by canonknipser » Tue Dec 20, 2011 7:51 am

A_Jelly_Doughnut wrote:For performance reasons, you're best off keeping the field as a varchar. Especially because I'm having trouble seeing a use case where strlen(directory name(s) + 32 char hash) > 255
Yes, you're right, I just wanted to show a possible risk.
But looking at the way thumbnail file names are built, I would prefer to add a separate field for the path name, so there would be no need to split path name and file name again to show the thumbnail:
filename
106989_0123456789abcdef0123456789abcdef
corresponding thumbnail
thumb_106989_0123456789abcdef0123456789abcdef

And with a field "path_name", it's very simple to check which files still need to be moved into subfolders during a migration phase (SELECT ... FROM phpbb_attachments WHERE path_name = '')
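
A small sketch of why a separate path field keeps the thumbnail handling simple. It assumes the proposed (not yet existing) "path_name" column; the function name is hypothetical:

```php
<?php
// Sketch, assuming a separate "path_name" column as proposed: the thumbnail
// prefix can be added to the file name without first splitting a combined
// "folder/file" string.
function attachment_paths($upload_dir, $path_name, $physical_filename)
{
    $dir = rtrim($upload_dir, '/');

    // An empty path_name means the file still sits in the base folder
    // (not yet migrated).
    if ($path_name !== '')
    {
        $dir .= '/' . trim($path_name, '/');
    }

    return array(
        'file'  => $dir . '/' . $physical_filename,
        'thumb' => $dir . '/thumb_' . $physical_filename,
    );
}

$paths = attachment_paths('files', '00/30', '106989_0123456789abcdef0123456789abcdef');
echo $paths['thumb'], "\n"; // files/00/30/thumb_106989_0123456789abcdef0123456789abcdef
```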
A_Jelly_Doughnut wrote:Is the suggestion to implement all, one, or some of these?
I think we can implement all, because the functions for creating and using the path are all the same, only one basic function (building the pathname depending on config-values) will differ slightly

Post by naderman » Tue Dec 20, 2011 9:05 am

I think implementing that many different mechanisms is unnecessary work, and having to configure this makes the ACP more complicated without any real benefit. I can see why people want subfolders, but a single mechanism that is always used should be enough.

Post by canonknipser » Tue Dec 20, 2011 9:25 am

naderman wrote:but a single mechanism that is always used should be enough.
Which one would you prefer?
My personal opinion:
A folder should have a limited "lifetime": after a defined point (a date or a number of files) no more files go into it. That makes it easier to take a complete backup of a folder.
So this leaves folders by
  • date
    date("Y/m/w") (with weekday subfolder) or
    date("Y/m") (without weekday subfolder)
  • attachment_id
    sprintf("%06d", floor($attachment_id / 100)), changing every 100 attachments
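
The two variants could be sketched as follows. The function names are made up, and gmdate() is used instead of date() only so that the example does not depend on the server time zone:

```php
<?php
// Sketch of the two "limited lifetime" folder names; function names are
// hypothetical and not part of phpBB.

// Date-based: one folder per month of the upload time (format "Y/m").
function folder_by_date($upload_time)
{
    return gmdate('Y/m', $upload_time);
}

// Id-based: a new folder every $per_folder attachments.
function folder_by_id($attachment_id, $per_folder = 100)
{
    return sprintf('%06d', floor($attachment_id / $per_folder));
}

echo folder_by_date(gmmktime(13, 59, 0, 12, 19, 2011)), "\n"; // 2011/12
echo folder_by_id(99), "\n";    // 000000
echo folder_by_id(100), "\n";   // 000001
echo folder_by_id(12345), "\n"; // 000123
```

One thing to keep in mind for the "Y/m/w" variant: in date(), "w" is the numeric day of the week (0 to 6), so such a folder is reused by every same weekday of the month and only stops receiving files when the month ends.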

Post by bantu » Fri Dec 23, 2011 11:07 pm

naderman wrote:I think implementing that many different mechanisms is unnecessary work, and having to configure this makes the ACP more complicated without any real benefit. I can see why people want subfolders, but a single mechanism that is always used should be enough.
I agree. It's not as if the path is used for direct downloading and thus "has to look pretty" like in WordPress. Attachments always have to go through PHP for permission checking anyway and are served by download/file.php.
canonknipser wrote:the folder should have a limited "lifetime", so after a defined point (date or number of files) no more files go in that folder. It makes it easier to have complete backups of a folder.
I would guess this is just a problem because you are using ancient technology (i.e. probably FTP) for creating backups. For example rsync would just work fine and wouldn't care about it at all.

Edit: It also has to be checked whether there are mechanisms relying on the user id being in the filename of attachments.

Post by canonknipser » Sat Dec 24, 2011 8:10 am

bantu wrote:I would guess this is just a problem because you are using ancient technology (i.e. probably FTP) for creating backups. For example rsync would just work fine and wouldn't care about it at all.
I think it's a problem for nearly all "normal" website owners on shared hosting: they normally do everything via "ancient" FTP, because their hosting providers don't offer them other methods for personal backups. And, as a matter of fact, a lot of people who run a board (or even a computer) don't know much about tools that are more complicated than Windows Explorer and drag and drop.
According to the Knowledge Base article "Transferring Your Board to a New Host or Domain" and a lot of articles on phpBB.com, FTP should be used to transfer all files to a new host. If you can't read complete folders via FTP, you can't transfer a complete board.
In my eyes, a human-readable logic for building folder names is not only a matter of "pretty looks", but helps board owners who want to know which folders they have to back up (or transfer) because they changed since the last backup.
bantu wrote:Edit: It also has to be checked whether there are mechanism relying on the user id being in the filename of attachments.
It's written at the very end of the first post:
canonknipser wrote:filename is generated as a 32-character hash value prepended by numerical userid and a dash
But this is only used for building file names; all checking for ownership etc. is done via the column "poster_id" in the attachments table.
This RFC does not change the file names at all; it only creates separate subfolders following some logic.

I'm currently working on the code changes based on 3.0.10-RC2; I hope I can complete them by Monday.

Post by bantu » Sat Dec 24, 2011 2:04 pm

Looking forward to your implementation.

Post by Oleg » Sat Dec 24, 2011 6:18 pm

A reasonably straightforward solution is to take the first two characters of the hash used in filename and use that as the subdirectory name. If more than one level of nesting is desired, keep pulling characters in increments of two off the filename and use that as subdirectories.

This has the advantage that given a filename hash, it is easily possible to figure out the path to the filename without prior knowledge of how many levels of subdirectories exist.

Of course this solution does not help with understanding what file is where. I like the approach of using subdirectories for dates, but in corner cases it may not work too well.
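
A sketch of how the directory could be found without knowing the nesting depth, under one reading of the suggestion: the stored file keeps its full name, and the lookup simply descends as long as a directory named after the next two hash characters exists. The function name is hypothetical.

```php
<?php
// Sketch of the idea above: descend while a directory named after the next
// two hash characters exists, so the nesting depth need not be known.
function resolve_hashed_dir($root, $hash)
{
    $dir = rtrim($root, '/');
    $offset = 0;

    while ($offset + 2 <= strlen($hash))
    {
        $next = $dir . '/' . substr($hash, $offset, 2);

        if (!is_dir($next))
        {
            break; // deepest existing level reached
        }

        $dir = $next;
        $offset += 2;
    }

    return $dir;
}
```

With folders "01" and "01/23" present below the root, a hash starting with "0123" resolves to root/01/23, while a hash starting with "ff" stays at the root.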

Post by canonknipser » Sat Dec 24, 2011 7:09 pm

Oleg wrote:This has the advantage that given a filename hash, it is easily possible to figure out the path to the filename without prior knowledge of how many levels of subdirectories exist.
Oleg, thank you for your suggestion.
Following your scheme, we will have a flood of new folders for the first attachments added to the board, with only one or two files in each.

As a default, I will build a function that converts the unique attachment id (which is mediumint) into a hex string (maximum "ffffff" for mediumint unsigned) and uses each pair of hex digits as a subdirectory name, so there is a maximum of two levels (the last two digits are stripped):
dec(1) -> hex(1) -> folder "00/00"
dec(100) -> hex(64) -> folder "00/00"
dec(255) -> hex(ff) -> folder "00/00"
dec(256) -> hex(100) -> folder "00/01"
dec(12345) -> hex(3039) -> folder "00/30"

So, every 256 attachments a new subfolder is built, and a maximum of 512 files (attachment + thumb) + .htaccess + index.htm are in a folder. This makes sure that:
  • normally no changes are needed in ftp-server configuration
  • no extra function is needed to count the files in a subfolder
  • board owners only need to download the contents of the newest folder(s) for frequent backups via FTP, because every 256 attachments a folder is "complete"
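
The id-to-folder mapping described above could look roughly like this; the function name is made up, and the loop reproduces the examples given in this post:

```php
<?php
// Sketch of the default scheme: attachment id -> six hex digits -> the first
// two pairs become the folder levels, the last pair is dropped.
function folder_by_hex_id($attachment_id)
{
    // Pad to six hex digits (mediumint unsigned goes up to ffffff).
    $hex = sprintf('%06x', $attachment_id);

    return substr($hex, 0, 2) . '/' . substr($hex, 2, 2);
}

foreach (array(1, 100, 255, 256, 12345) as $id)
{
    echo $id, ' -> ', folder_by_hex_id($id), "\n";
}
// 1 -> 00/00
// 100 -> 00/00
// 255 -> 00/00
// 256 -> 00/01
// 12345 -> 00/30
```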
I will also build the "by date" function in the two ways I described above, and use a config value to let board administrators decide which folder scheme to use. If the config value is not present, the first folder scheme will be used by default.

We can ship phpBB with a set of subfolders for the first 10,000 files (or a similar number, like 00/00 to 00/30), so even on sites where the PHP mkdir() function does not work, we can still use the subfolders.
Board owners with a larger number of attachments should make sure they have a "full-function host".
The subfolder names will be stored in the database and can be shown in the ACP (although at present we don't even show the physical file name in the ACP) or in a database admin tool.
I will also prepare a script for converting from "no folder" to the schemes mentioned above.