https://tracker.phpbb.com/browse/PHPBB3-13891 - https://github.com/phpbb/phpbb/pull/3662
With the s9e\TextFormatter merged, I've been thinking about the right way to reparse rich text for future versions of phpBB. This is a quick dump of my observations. No PR and I don't think I can take over the task for the moment.
First, reparsing should be done through a service. It should be possible to schedule it via cron but it should be implemented as a service.
Last I checked there were at least 9 different places where rich text is used: posts, private messages, user signatures, poll titles, poll options, forum descriptions, forum rules, group descriptions and admin's contact info. Column names vary, storage format varies. So instead of a service, it should be a set of services. Those services should be tagged to make it possible to trigger a reparsing of everything that's rich text. Using a tag makes it easier for extensions to have their content reparsed.
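As a rough illustration of the tagging idea, here is a minimal sketch in plain PHP. Everything in it is hypothetical: `demo_container`, `demo_reparser`, the `text_reparser` tag name and the service IDs are stand-ins, not phpBB's or Symfony's actual API. It only shows that a "reparse everything" routine needs to know the tag, not the list of content types.

```php
<?php
// Hypothetical sketch: none of these names come from phpBB itself.

// Stand-in for a DI container that supports tagged services.
final class demo_container
{
    /** @var array tag => [service id => service] */
    private $tagged = [];

    public function register(string $id, $service, array $tags = []): void
    {
        foreach ($tags as $tag) {
            $this->tagged[$tag][$id] = $service;
        }
    }

    public function find_tagged(string $tag): array
    {
        return $this->tagged[$tag] ?? [];
    }
}

// Minimal reparser: it only records that it was asked to run.
final class demo_reparser
{
    public $done = false;

    public function reparse_all(): void
    {
        $this->done = true;
    }
}

// Runs every service carrying the tag and returns the IDs it ran.
function demo_reparse_everything(demo_container $container): array
{
    $ran = [];
    foreach ($container->find_tagged('text_reparser') as $id => $reparser) {
        $reparser->reparse_all();
        $ran[] = $id;
    }
    return $ran;
}

$container = new demo_container();
$container->register('text_reparser.post_text', new demo_reparser(), ['text_reparser']);
$container->register('text_reparser.user_signature', new demo_reparser(), ['text_reparser']);
// An extension adds its own content type just by using the same tag.
$container->register('acme.blog.text_reparser', new demo_reparser(), ['text_reparser']);
// Untagged services are never touched.
$container->register('unrelated.service', new stdClass());

$ran = demo_reparse_everything($container);
```

The extension's service is picked up without any change to the code that triggers the reparsing, which is the point of using a tag.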
The Support Toolkit can reparse posts in 3.0, does it also work on 3.1?
There's a potentially important trap lying in wait. phpBB's message_parser uses the global $user to determine which features are enabled. Some extensions such as ABBC3 police BBCode usage according to the current user. The whole system assumes that the current user is the author of the post or text being processed. That means that if an admin triggers a reparsing, the text will be reparsed according to the admin's credentials and not the original author's.
I have written a working example of a reparser service in this feature branch: https://github.com/phpbb/phpbb/compare/ ... r?expand=1 (only the post_text service is included)
The unit of work is called a record, a set of values. Each service is responsible for retrieving and updating records in the database. For post texts, it's a set of columns from the same row. For text stored in the config table, it's a set of values stored in multiple rows. Forum descriptions and forum rules are two records stored in the same row. Pseudo-code for reparse_range():
    reparse_range($min_id, $max_id)
        get_records($min_id, $max_id)
        For each record $record
            reparse_record($record)
            If reparsed text differs: save_record($record)

Interface and class outline:

    phpbb\textreparser\reparser_interface
        public function reparse_range($min_id, $max_id)
        public function get_max_id()

    phpbb\textreparser\base implements reparser_interface
        public function reparse_range($min_id, $max_id)
        abstract public function get_max_id()
        abstract protected function get_records($min_id, $max_id)
        abstract protected function save_record(array $record)
        protected function reparse_record(array $record)

    phpbb\textreparser\posts extends base
        public function get_max_id()
        protected function get_records($min_id, $max_id)
        protected function save_record(array $record)
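The outline above could be fleshed out roughly as follows. This is a sketch under loud assumptions, not the code from the feature branch: the `demo_*` names are hypothetical, records are reduced to an `id` and a `text`, the "database" is an in-memory array, and the parser is replaced by a toy regexp that rewrites `[b]...[/b]`.

```php
<?php
// Sketch only: demo_* names and the toy "parser" are assumptions, not phpBB code.

interface demo_reparser_interface
{
    public function reparse_range($min_id, $max_id);
    public function get_max_id();
}

abstract class demo_base implements demo_reparser_interface
{
    public function reparse_range($min_id, $max_id)
    {
        foreach ($this->get_records($min_id, $max_id) as $record) {
            $reparsed = $this->reparse_record($record);
            // Only write back when the reparsed text actually differs.
            if ($reparsed['text'] !== $record['text']) {
                $this->save_record($reparsed);
            }
        }
    }

    abstract public function get_max_id();
    abstract protected function get_records($min_id, $max_id);
    abstract protected function save_record(array $record);

    // Toy stand-in for the real parser: rewrites [b]..[/b] into <B>..</B>.
    protected function reparse_record(array $record)
    {
        $record['text'] = preg_replace('#\[b\](.*?)\[/b\]#s', '<B>$1</B>', $record['text']);
        return $record;
    }
}

class demo_posts extends demo_base
{
    public $rows;       // id => text, stands in for the posts table
    public $saved = []; // IDs that were written back

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function get_max_id()
    {
        return $this->rows ? max(array_keys($this->rows)) : 0;
    }

    protected function get_records($min_id, $max_id)
    {
        $records = [];
        foreach ($this->rows as $id => $text) {
            if ($id >= $min_id && $id <= $max_id) {
                $records[] = ['id' => $id, 'text' => $text];
            }
        }
        return $records;
    }

    protected function save_record(array $record)
    {
        $this->rows[$record['id']] = $record['text'];
        $this->saved[] = $record['id'];
    }
}

$posts = new demo_posts([1 => 'plain', 2 => '[b]bold[/b]', 3 => '[b]out of range[/b]']);
$posts->reparse_range(1, 2);
```

Post 1 is left alone because its text does not change, post 2 is saved, and post 3 is never fetched because it is outside the range.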
Speaking of cron, here's how a global reparsing should work:
- Get all tagged services.
- Retrieve the highest ID for each type of content. Store in a config value, e.g.
    post_text: 123456
    user_sig: 4321
- Every execution should reparse a block of N records of one content type (e.g. post_text WHERE post_id BETWEEN 123406 AND 123456), starting from the highest ID and working backwards.
  - Working backwards means that content posted after the reparsing has started will not be reparsed needlessly.
  - It also means that the most recent content gets reparsed first. The most recent content is also the most likely to be read by users first.
- There could be one single cron job that reparses everything by changing the type of content being reparsed at each execution. That would guarantee that a global reparsing would not trigger a dozen cron jobs rewriting ten different tables at the same time with the database performance going down the drain.
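The scheduling described in this list could look something like the sketch below. It is only an illustration: `demo_run_once()`, the layout of the `$state` array and the callback signature are all hypothetical, and the block here covers exactly `$block_size` IDs. One call corresponds to one cron execution: it picks the next unfinished content type in rotation, reparses one block counting down from the stored ID, and returns the state to save back to the config.

```php
<?php
// Hypothetical sketch; the state layout and function name are assumptions.

/**
 * Runs one cron execution.
 *
 * @param array    $state      ['ids' => [type => highest ID left], 'last' => type or null]
 * @param callable $reparse    function ($type, $min_id, $max_id): void
 * @param int      $block_size Number of IDs covered per execution
 * @return array Updated state, to be stored back in the config
 */
function demo_run_once(array $state, callable $reparse, $block_size)
{
    $types = array_keys($state['ids']);
    $count = count($types);
    if ($count === 0) {
        return $state;
    }

    // Start right after the type handled by the previous execution.
    $start = array_search($state['last'], $types, true);
    $start = ($start === false) ? 0 : $start + 1;

    for ($i = 0; $i < $count; $i++) {
        $type = $types[($start + $i) % $count];
        $max_id = $state['ids'][$type];
        if ($max_id < 1) {
            continue; // this content type is finished
        }

        // Work backwards: the block ends at the highest ID still pending.
        $min_id = max(1, $max_id - $block_size + 1);
        $reparse($type, $min_id, $max_id);

        $state['ids'][$type] = $min_id - 1;
        $state['last'] = $type;
        break; // one block of one content type per execution
    }

    return $state;
}

// Usage: log the ranges instead of touching a database.
$log = [];
$reparse = function ($type, $min_id, $max_id) use (&$log) {
    $log[] = "$type $min_id-$max_id";
};

$state = ['ids' => ['post_text' => 120, 'user_sig' => 30], 'last' => null];
for ($run = 0; $run < 5; $run++) {
    $state = demo_run_once($state, $reparse, 50);
}
```

Because the starting IDs are captured once, content created after the reparsing starts has higher IDs and is never visited, and only one table is written to per execution.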