[RFC] Post Revisions

Note: We are moving the topics of this forum and it will be deleted at some point

Publish your own request for comments/change or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.2/Rhea and 3.3.
Post Reply
User avatar
imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

[RFC] Post Revisions

Post by imkingdavid » Mon Apr 23, 2012 7:57 pm

Ticket

This RFC is for implementation of a post revisions system into phpBB 3.x. The feature was originally discussed in an RFC for 4.0 here, but there is no reason it has to wait for 4.0 for inclusion. So I decided I would choose it for my GSOC propsal, which can be viewed in full here. That details my thoughts on what needs to be done, what potential issues could arise, and how I plan to deal with them.

Here is some more stuff that was discussed (today) in IRC (for those with the actual log, I removed my questions that got answered <1 sec before I asked them :P):

Code: Select all

<naderman> why not open this up to other users too?
<naderman> I mean I don't think we should limit the functionality to moderators
<naderman> but allow users to pick "any user can edit my post"
<naderman> which then ends up being a sort of wiki post
<imkingdavid> interesting
<naderman> it's useful rather commonly for first posts in a topic
<bantu> yeah, you should probably take such usecases into account as well
<naderman> and if we add the revisions anyway
<naderman> we might as well use it for that too
<naderman> e.g. I'd do revisions for everything for moderation
<bantu> you can get most of it for free, which is great :-)
<naderman> and then add a permission to wikify posts voluntarily as a user
<naderman> functionality is pretty much the same
<naderman> only access is a bit different
<naderman> imkingdavid: what do you think?
<imkingdavid> alright, I'll consider how to implement that. sounds like a good idea
So the aim is to extend what I already proposed to allow users to make posts editable by other users, just like a wiki page (although there will probably be another permission to disallow some problem users from editing "wiki" posts).

EDIT: It was also pointed out to me that phpBB has its own diff engine that I can use instead of the one I mentioned in my GSOC proposal.

EDIT 2: A user on phpBB.com (drathbun) posted a comment on this feature in the GSOC accepted proposals discussion with a few suggestions as well.

Thoughts/Comments/Suggestions?
I do custom MODs. PM for a quote!
View My: MODs | Portfolio
Please do NOT contact for support via PM or email.
Remember, the enemy's gate is down.

User avatar
callumacrae
Infrastructure Team
Infrastructure Team
Posts: 1046
Joined: Tue Apr 27, 2010 9:37 am
Location: England
Contact:

Re: [RFC] Post Revisions

Post by callumacrae » Mon Apr 23, 2012 8:35 pm

+1!

There's a mod that does this that I seem to remember being fairly good, it might be worth looking at: http://www.phpbb.com/customise/db/mod/p ... revisions/
Made by developers, for developers!
My blog

User avatar
drathbun
Registered User
Posts: 72
Joined: Wed Feb 15, 2006 6:40 pm
Location: Texas
Contact:

Re: [RFC] Post Revisions

Post by drathbun » Mon Apr 23, 2012 9:41 pm

Hi, I'm repeating my comments from phpbb.com here.
Post Revisions/Edit Log by David King, United States
David will implement a post revision system, enabling moderators to view detailed differences between edits made to a post with the ability to revert. This feature will have an option to be disabled, and can be further configured via the ACP to enable revision pruning.
This is an excellent feature, and in fact one that I've written for *mumble-older-version-of-phpbb-mumble* and am about to go live with. I would suggest to the coder, however, that a separate table is not required. In my implementation I create additional rows in the posts_text table, and store a tag called post_version that differentiates the various versions. The current version is always 0, so most of the phpBB code that retrieves post information simply adds "and post_version = 0" to the SQL code. By including this in the index performance does not suffer. Every edit for a post becomes an insert, with pseudo-code as follows:

Code: Select all

Update posts_text set post_version = post_version + 1;
Insert new post_text values;
The moderator option to review post versions include HTML "diff" code borrowed from Wordpress which seems to work similar to the library linked in the proposal linked in the announcement topic. A moderator can restore an older version of a post edit, but so can the owner of the post. A moderator can also lock a post to prevent further editing.

It's an excellent feature / concept, and I'm glad to see it coming about. It can help board moderators combat spam (as mentioned in the proposal) but it can also help when board users post a question, get their answer, and then go back and edit their original post to "thanks" or something that completely destroys the flow of the conversation.

I can also envision it being very powerful when used for legal investigations. Since nothing is ever edited (or deleted) only stored as versions, we can recreate the entire process that it took to get to a particular post. By including a "soft delete" feature you have a fairly bullet-proof legal archive of post activity.
imkingdavid wrote:drathbun, thanks for the comments on the post revisions feature. I will consider what you said about using the existing posts table instead of adding a new revisions table, but there are a couple of issues that I can foresee with the fact that each revision would be then treated as a different post by the software.
In phpBB2 this was easier, because the post and post text were stored in different tables. But even in phpBB3 what you said is not true, at least the way I implemented it. Add one new field to the posts table called post_version. The entire rest of the phpBB software then only ever looks at post_version = 0, which is always the current (or most recently restored) version of the post.
For instance, let's say that a moderator locks editing to the current revision of the post, but the post is then switched to a different revision.
This should never happen if a post is locked. Once a post is locked, nothing is changed. You only need to check post_version = 0 for this. Again, this was easier to do for phpbb2 since the post and the post text were already in separate tables. Which I had forgotten about in phpbb3, to be honest. So the reality is that I am already using separate tables since that's the way phpbb2 works. ;) I should retract my suggestion, since the way you have thought it out for phpBB3 looks more like phpBB2, and matches what I did.
Anyway, to all, from now on please use this topic for discussion of the GSoC/PSoC program, and discuss the actual proposals and their implementations on the Area51 forum in the respective RFCs or in the Tracker in the respective tickets instead of here.
Done, and sorry about posting in the other topic.

For what it's worth, I use a different table to store the post edit reason. I have a table called post_notes that stores everything that ever happens about a post, from moderator actions to post edits (now, anyway, that feature is only something I finished earlier this year). So I don't need to store a post edit reason, or who edited, or when, as that's all stored in a separate table already.

Good luck with the project, it's a feature that I think will be well received. It's not something you need every day, but it's indispensable when you do need it.
Sometimes you're the windshield, sometimes you're the bug.

User avatar
imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Re: [RFC] Post Revisions

Post by imkingdavid » Mon Apr 23, 2012 9:52 pm

Thank you for the comments and for displaying interest in the feature.
drathbun wrote:Done, and sorry about posting in the other topic.
No problem, just wanted to make sure that discussion doesn't get too spread out because then it's hard to follow. :)

Anyway, I am sort of waiting until May 21 to really get started on this, since that is the date set by GSoC as the beginning date for students to start coding their proposals. However, I did do some very initial mostly schema-related stuff about 21 days ago. You can view changes as they occur here. Eventually I'll make a Pull Request for the feature and post it here (probably once I've started actually coding it) and anyone is welcome to comment on the code.
I do custom MODs. PM for a quote!
View My: MODs | Portfolio
Please do NOT contact for support via PM or email.
Remember, the enemy's gate is down.

User avatar
DavidIQ
Customisations Team Leader
Customisations Team Leader
Posts: 1731
Joined: Thu Mar 02, 2006 4:29 pm
Location: Earth
Contact:

Re: [RFC] Post Revisions

Post by DavidIQ » Tue Apr 24, 2012 3:10 am

Sounds like a good idea. And I think including drathbun's suggestions would greatly simplify things.

Also wanted to reply so we'd have three Davids in one topic.
:mrgreen:
Image

User avatar
drathbun
Registered User
Posts: 72
Joined: Wed Feb 15, 2006 6:40 pm
Location: Texas
Contact:

Re: [RFC] Post Revisions

Post by drathbun » Tue Apr 24, 2012 1:52 pm

DavidIQ wrote:Also wanted to reply so we'd have three Davids in one topic.
:mrgreen:
:lol:

One of the things that you're concerned about, and rightly so in my opinion, is the amount of space taken by post revisions. It would be interesting to (assuming you have access to it) count the percentage of posts at phpbb.com that have been edited, and how many times. I have found that the percent of edited posts is actually much smaller than expected. Also you would make sure (I am assuming again!) that the replicated post content is not included in the search engine... by having your post revisions in a different table that should be done automatically with the mysql search index (it can't index what it doesn't see in the original table). For phpBB2 since I stored my post revisions in the phpbb_posts_text table I had to make sure that anything with post_version > 0 did not appear in the search word tables. I don't remember if it was a standard function or one I wrote, but as posts are revised I call a function named something like "remove_search_words()" and the argument is the post_id. First I remove the words for the post altogether, and then once the post revision is stored I add the words back again. The brute force "drop all and rebuild" approach is less efficient from the database perspective, but far easier to code and debug than trying to run a diff on the posts and only remove the words that are dropped and adding words that are new. It also means I use the standard phpBB search functions rather than creating my own.

I'm rambling a bit here, but thought I would offer some information about how I implemented this for phpbb2 and see if you can find any nuggets or strategies that you feel would be appropriate for your version. Feel free to ignore everything if you like. :)

For the restore process, I didn't dig too much into permissions, but of course phpBB2 has a far less powerful permissions system than phpBB3. In my case, a moderator can pull up the post history and do one of 3 actions:

Restore Post
Lock Post
Restore + Lock Post

Restore post is an icon / link on each post in the history. Clicking that link on that specific revision will make it the current revision. More on that in a moment.

Lock Post updates the source post row (phpbb_posts in my case with phpbb2) and sets the post_locked field to 1. In this case, the original user can no longer edit the post. Moderators can still edit the post, as of course Administrators as well.

Restore + Lock Post simply does both of the above actions in one step. That way a moderator can restore an older version of a post and lock it all in one operation.

Once a post is locked, when a moderator views the post history the option changes from "Lock" to "Unlock" which resets the post_locked field back to 0 and editing is once again allowed by the user.

Users own the post, and as long as the post or topic is not locked then they can also use the post history feature. However a user can only use the Restore feature, they cannot lock their own post. If a user goes through multiple edits and decides they want to go back to an earlier version, they have the option to do that. Users can view the post history of other posts in order to see what edits were done, but cannot restore or lock a post version that does not belong to them.

In standard phpBB2 an edited post is tagged with ("Post edited X times") at the bottom of the post text, but only if the original user edited the post. If a moderator or admin edited the post, it does not get tagged. In my version, every edit triggers the post edit count, and every version of the post is captured, and the person doing the edit is also captured. Instead of the text, I have an icon that appears in the lower-right corner of the post that is visible only after a post has been edited at least once. Clicking that icon brings up the post history, and the options available at that point are determined by whether the forum is locked (which implies a post lock), whether the topic is locked (which also implies a post lock), or if the post itself is locked.

So what about the restore process? Here's how a couple of scenarios would play out.

1. User creates a post. Row is inserted, and post_version is zero (0).
2. User edits the post. The current row is updated with post_version set to 1 and the new text is inserted as post_version = 0. In this case the version could be thought of as "how many edits ago" and the current version of the post is always zero. This make the code very easy to deal with. Imagine if you went the other way, and the "current" version of the post was based on the post_edit_count. For some posts the "current" post text would be zero, for others it would be one, for still others two, and so on. By tracking numbers backwards, where current version is always zero, the code is far more consistent. I know that the current version is always zero, and I know that every post will have a current version. Some posts may have additional versions of the text, and the post version sorted in reverse order (4 - 3 - 2 - 1 - 0) will show the progression of the post edits. The highest number post_version is always the first version of the post, and in descending order the post_version shows the subsequent edits in order.
3. User edits a post again, the current rows (two now) in the database are again updated, so I have post version 2 and 1, and then the current edits are stored as post_version of 0 again.

Now suppose a moderator wants to go back and restore the post to the original version, which is now post_version of 2. The restore process does not simply copy the text back into the current post revision. Instead, the same process is used. There are now 3 rows (2 - 1 - 0) in the database, and the moderator wants to restore 2 to be current. I update the three existing rows so they're now 3 - 2 - 1 and the original text is re-inserted as row 0, or current. The notes would say "Moderator restored older version of the post" without referencing the post version by number, because of two reasons. First, the post version number could change if further edits are done, rendering any reference to the version incorrect. And second, since the edits can easily be seen on the post history screen, the version of the post that was restored should be easy to determine.

In phpBB2 the phpbb_posts_text table includes only the post_id, it does not include the topic_id or forum_id. So any post moderator actions such as moving a topic to another forum, or splitting a topic, these do not impact the post revisions because they're tied only to the post. With phpBB3 and your idea of using a separate table you should gain the same benefit. With phpBB2 the tables were already separate and fit quite well with my implementation.

There is an interesting wrinkle with the update to the post_version field. In order to be efficient, there is a unique index on the phpbb_posts_text table (again, this is phpbb2) on post_id. The post_id column is no longer unique since I'm storing multiple versions. So my new unique index is post_id + post_version. The + does not imply addition, it just means both columns are now included in the index. This works great until I try to update the post_version to store revisions. The first update sets 0 to 1 and includes a new zero row, and this is fine. But the second update failed. When the update ran, it tried to set 0 = 0 + 1 or 1, and there was already a row that existed with a version of 1. So in order to get the update to work, I include an ORDER BY clause on the update statement. This works fine in MySQL but I don't know how cross-database compatible it is.

Code: Select all

        if ( $mode == 'editpost' )
        {
                $sql =  "UPDATE " . POSTS_TEXT_TABLE . "
                        SET     post_version = post_version + 1
                        WHERE   post_id = $post_id
                        ORDER BY post_version DESC";
                if ( !$db->sql_query($sql) )
                {
                        message_die (GENERAL_ERROR, 'Error incrementing post version', '', __LINE__, __FILE__, $sql);
                }
        }
Without the ORDER BY clause the update failed on any post with revisions. It worked fine the first time because there are no conflicts with the key.

And the final note in my rambling. :) I took at look at the "diff" code you linked to. The code I am using is very simple, and was taken from the post diff highlighting option in Wordpress. It's a rather elegant bit of code that stores the post as an array of words and then uses array functions to determine the differences. However, a new line screws things up. Here's how the highlighting (for diffs) comes out when a newline is added at the end:

Image

When I played around with the code you linked, there were different modes. One was "word" mode and it worked the same way. But one was "character" mode and it ignored newlines and did not report them as differences. That is preferable, and I probably will try to see if I can make the same change to my code. If the only difference between posts is this:
First Version
First Version
New line of text
Then the word "Version" should not be highlighted as a change, and in my current version it is as shown on the image above.

Again, all of this rambling is just that, some rambling thoughts that you may or may not find useful. If you have any questions about what I did, I'm happy to tell you what I did and why if it helps you at all.

And I would be very interested in the statistic I mentioned above, if you have access to it, that being what percentage of posts at phpbb.com have been edited at least once.
Sometimes you're the windshield, sometimes you're the bug.

User avatar
MichaelC
Development Team
Development Team
Posts: 889
Joined: Thu Jan 28, 2010 6:29 pm

Re: [RFC] Post Revisions

Post by MichaelC » Tue Apr 24, 2012 9:22 pm

Best way I can see would be too use permissions for flexibility. Have u_view_revisions, u_view_own_revisions, u_restore_own_revisions and u_delete_own_revisions. For moderators have m_delete_revisions and m_restore_revisions (or include restoring revisions as part of the editing permission).

And then implement the soft/hard delete system into this so you can soft/hard delete posts and offer a moderator permission for that and have u_delete_own_revisions as soft delete. Hiding it from those without m_view_soft_delete_revisions.

All of these could be global or forum specific.

Quite a lot of permissions but this is something that views will vary from admin to admin, board to board.

What do you think david?Sorry, couldn't resist. :P
Formerly known as Unknown Bliss
psoTFX wrote: I went with Olympus because as I said to the teams ... "It's been one hell of a hill to climb"
No unsolicited PMs please except for quotes.

User avatar
imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Re: [RFC] Post Revisions

Post by imkingdavid » Tue Apr 24, 2012 11:58 pm

Unknown Bliss wrote:Best way I can see would be too use permissions for flexibility. Have u_view_revisions, u_view_own_revisions, u_restore_own_revisions and u_delete_own_revisions. For moderators have m_delete_revisions and m_restore_revisions (or include restoring revisions as part of the editing permission).

And then implement the soft/hard delete system into this so you can soft/hard delete posts and offer a moderator permission for that and have u_delete_own_revisions as soft delete. Hiding it from those without m_view_soft_delete_revisions.

All of these could be global or forum specific.

Quite a lot of permissions but this is something that views will vary from admin to admin, board to board.

What do you think david?Sorry, couldn't resist. :P
Fatal Error: Uncaught exception "TooManyPermissionsException" with message "That's way too many permissions!" in current_post on line 1.

But seriously, yeah that's way too much. :P
I do custom MODs. PM for a quote!
View My: MODs | Portfolio
Please do NOT contact for support via PM or email.
Remember, the enemy's gate is down.

User avatar
naderman
Product Manager
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC] Post Revisions

Post by naderman » Wed Apr 25, 2012 12:25 am

There is also something else to consider here. Maybe we do want to version other parts of the post. So we can keep track of which settings were changed when. Things like attachments will need to be taken into account as well.

User avatar
drathbun
Registered User
Posts: 72
Joined: Wed Feb 15, 2006 6:40 pm
Location: Texas
Contact:

Re: [RFC] Post Revisions

Post by drathbun » Wed Apr 25, 2012 12:59 pm

I thought about tracking attachments but decided against it. For one thing, I didn't want to have to analyze the actual attachment to try to show differences, which of course would not be required... you could just track the files themselves without worrying about analyzing differences. But then I got to thinking... how would attachment revisions count against a user total attachment quota?

And things like did the signature bit get checked, was bbcode checked, were smilies enabled, those don't materially affect the post content so I decided not to track those. You of course may decide differently.

Signature updates should probably be tracked, and I don't currently do that in my version.

On the permissions thing... I think the more options you can provide, the more flexible the feature will become, and the better it will be received.
Sometimes you're the windshield, sometimes you're the bug.

Post Reply