DavidIQ wrote:Also wanted to reply so we'd have three Davids in one topic.
One of the things that you're concerned about, and rightly so in my opinion, is the amount of space taken by post revisions. It would be interesting to (assuming you have access to it) count the percentage of posts at phpbb.com that have been edited, and how many times. I have found that the percent of edited posts is actually much smaller than expected. Also you would make sure (I am assuming again!) that the replicated post content is not included in the search engine... by having your post revisions in a different table that should be done automatically with the mysql search index (it can't index what it doesn't see in the original table). For phpBB2 since I stored my post revisions in the phpbb_posts_text table I had to make sure that anything with post_version > 0 did not appear in the search word tables. I don't remember if it was a standard function or one I wrote, but as posts are revised I call a function named something like "remove_search_words()" and the argument is the post_id. First I remove the words for the post altogether, and then once the post revision is stored I add the words back again. The brute force "drop all and rebuild" approach is less efficient from the database perspective, but far easier to code and debug than trying to run a diff on the posts and only remove the words that are dropped and adding words that are new. It also means I use the standard phpBB search functions rather than creating my own.
I'm rambling a bit here, but thought I would offer some information about how I implemented this for phpbb2 and see if you can find any nuggets or strategies that you feel would be appropriate for your version. Feel free to ignore everything if you like.
For the restore process, I didn't dig too much into permissions, but of course phpBB2 has a far less powerful permissions system than phpBB3. In my case, a moderator can pull up the post history and do one of 3 actions:
Restore + Lock Post
Restore post is an icon / link on each post in the history. Clicking that link on that specific revision will make it the current revision. More on that in a moment.
Lock Post updates the source post row (phpbb_posts in my case with phpbb2) and sets the post_locked field to 1. In this case, the original user can no longer edit the post. Moderators can still edit the post, as of course Administrators as well.
Restore + Lock Post simply does both of the above actions in one step. That way a moderator can restore an older version of a post and lock it all in one operation.
Once a post is locked, when a moderator views the post history the option changes from "Lock" to "Unlock" which resets the post_locked field back to 0 and editing is once again allowed by the user.
Users own the post, and as long as the post or topic is not locked then they can also use the post history feature. However a user can only use the Restore feature, they cannot lock their own post. If a user goes through multiple edits and decides they want to go back to an earlier version, they have the option to do that. Users can view the post history of other posts in order to see what edits were done, but cannot restore or lock a post version that does not belong to them.
In standard phpBB2 an edited post is tagged with ("Post edited X times") at the bottom of the post text, but only if the original user edited the post. If a moderator or admin edited the post, it does not get tagged. In my version, every edit triggers the post edit count, and every version of the post is captured, and the person doing the edit is also captured. Instead of the text, I have an icon that appears in the lower-right corner of the post that is visible only after a post has been edited at least once. Clicking that icon brings up the post history, and the options available at that point are determined by whether the forum is locked (which implies a post lock), whether the topic is locked (which also implies a post lock), or if the post itself is locked.
So what about the restore process? Here's how a couple of scenarios would play out.
1. User creates a post. Row is inserted, and post_version is zero (0).
2. User edits the post. The current row is updated with post_version set to 1 and the new text is inserted as post_version = 0. In this case the version could be thought of as "how many edits ago" and the current version of the post is always zero. This make the code very easy to deal with. Imagine if you went the other way, and the "current" version of the post was based on the post_edit_count. For some posts the "current" post text would be zero, for others it would be one, for still others two, and so on. By tracking numbers backwards, where current version is always zero, the code is far more consistent. I know that the current version is always zero, and I know that every post will have a current version. Some posts may have additional versions of the text, and the post version sorted in reverse order (4 - 3 - 2 - 1 - 0) will show the progression of the post edits. The highest number post_version is always the first version of the post, and in descending order the post_version shows the subsequent edits in order.
3. User edits a post again, the current rows (two now) in the database are again updated, so I have post version 2 and 1, and then the current edits are stored as post_version of 0 again.
Now suppose a moderator wants to go back and restore the post to the original version, which is now post_version of 2. The restore process does not
simply copy the text back into the current post revision. Instead, the same process is used. There are now 3 rows (2 - 1 - 0) in the database, and the moderator wants to restore 2 to be current. I update the three existing rows so they're now 3 - 2 - 1 and the original text is re-inserted as row 0, or current. The notes would say "Moderator restored older version of the post" without referencing the post version by number, because of two reasons. First, the post version number could change if further edits are done, rendering any reference to the version incorrect. And second, since the edits can easily be seen on the post history screen, the version of the post that was restored should be easy to determine.
In phpBB2 the phpbb_posts_text table includes only the post_id, it does not include the topic_id or forum_id. So any post moderator actions such as moving a topic to another forum, or splitting a topic, these do not impact the post revisions because they're tied only to the post. With phpBB3 and your idea of using a separate table you should gain the same benefit. With phpBB2 the tables were already separate and fit quite well with my implementation.
There is an interesting wrinkle with the update to the post_version field. In order to be efficient, there is a unique index on the phpbb_posts_text table (again, this is phpbb2) on post_id. The post_id column is no longer unique since I'm storing multiple versions. So my new unique index is post_id + post_version. The + does not imply addition, it just means both columns are now included in the index. This works great until I try to update the post_version to store revisions. The first update sets 0 to 1 and includes a new zero row, and this is fine. But the second update failed. When the update ran, it tried to set 0 = 0 + 1 or 1, and there was already a row that existed with a version of 1. So in order to get the update to work, I include an ORDER BY clause on the update statement. This works fine in MySQL but I don't know how cross-database compatible it is.
Code: Select all
if ( $mode == 'editpost' )
$sql = "UPDATE " . POSTS_TEXT_TABLE . "
SET post_version = post_version + 1
WHERE post_id = $post_id
ORDER BY post_version DESC";
if ( !$db->sql_query($sql) )
message_die (GENERAL_ERROR, 'Error incrementing post version', '', __LINE__, __FILE__, $sql);
Without the ORDER BY clause the update failed on any post with revisions. It worked fine the first time because there are no conflicts with the key.
And the final note in my rambling.
I took at look at the "diff" code you linked to. The code I am using is very simple, and was taken from the post diff highlighting option in Wordpress. It's a rather elegant bit of code that stores the post as an array of words and then uses array functions to determine the differences. However, a new line screws things up. Here's how the highlighting (for diffs) comes out when a newline is added at the end:
When I played around with the code you linked, there were different modes. One was "word" mode and it worked the same way. But one was "character" mode and it ignored newlines and did not report them as differences. That is preferable, and I probably will try to see if I can make the same change to my code. If the only difference between posts is this:
New line of text
Then the word "Version" should not be highlighted as a change, and in my current version it is as shown on the image above.
Again, all of this rambling is just that, some rambling thoughts that you may or may not find useful. If you have any questions about what I did, I'm happy to tell you what I did and why if it helps you at all.
And I would be very interested in the statistic I mentioned above, if you have access to it, that being what percentage of posts at phpbb.com have been edited at least once.