[RFC] Integrate s9e\TextFormatter

These requests for comments/change have lead to an implemented feature that has been successfully merged into the 3.2/Rhea branch. Everything listed in this forum will be available in phpBB 3.2.
Post Reply
User avatar
EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: [RFC] Integrate s9e\TextFormatter

Post by EXreaction »

In your PR I see you're using a check to see whether the message was originally parsed using our old bbcode engine or your new one.

We do want to completely remove the old BBCode parser after this would be merged, so rather than checking which it was originally parsed with, what will need to be done is a migration be created which reparses all posts with the new engine (should be done in steps as this may take a while, the Wiki on migrations should explain it or we can help in IRC), then all the code referencing the old engine be removed.


We don't currently have any way of parsing custom bbcodes differently for different styles. That would be a very nice feature to have, but the custom bbcode UI would need a bit of a redesign in order to support that. If you don't want to redesign the UI for that just yet, having support for it in the backend (which we may need to tweak later depending on the implementation for it) would be great for now. I can help explain how the styles system is setup if you're interested in setting that up from the start (just let me know!).

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by JoshyPHP »

EXreaction wrote:In your PR I see you're using a check to see whether the message was originally parsed using our old bbcode engine or your new one.
For reference, this is the check EXreaction is talking about. What it does is test whether a text starts with <pt or <rt. s9e\TextFormatter generates an XML representation of the parsed text, which always has a "pt" or "rt" root node. Since a literal < would be escaped by phpBB, a text that starts with <pt is guaranteed to be XML from s9e\TextFormatter. It's a cheap a reliable way to distinguish old text from new.
EXreaction wrote:We do want to completely remove the old BBCode parser after this would be merged, so rather than checking which it was originally parsed with, what will need to be done is a migration be created which reparses all posts with the new engine
Which makes me think, what future version of phpBB would you want to target for this? I have side-stepped the issue of migration because it might just be the most critical change, the one that carries the greatest potential for negative fallout, and I wanted to take the time to discuss and carefully plan it. That's why I'm currently keeping the old rendering engine around, so that I don't have to touch old posts for the time being.

Going forward, I don't really know what's the best strategy is. You could keep the old rendering around for a whole 3.x cycle before completely removing it, while slowly migrating content via a cron job or something similar. Or you can migrate the content upfront, but wouldn't that mean taking a board offline, potentially for an extended period of time as you mentionned? I don't know how phpBB migration works, not sure if the board remains usable while it's running. The posts table is the biggest of any forum, and a full migration means rewriting every row because as I said, s9e\TextFormatter produces XML so even an empty post would be rewritten as <pt></pt>.
EXreaction wrote:We don't currently have any way of parsing custom bbcodes differently for different styles.
I assume you meant rendering custom BBCodes differently. I'll take you up on your offer as soon as I start looking into styling. ;)

On a related subject, you need to decide what to do about the default BBCodes. Currently, their definitions reside in an XML file that were hastily thrown into includes/. Ultimately, they may be better stored in the database or hardcoded in a class. If they're stored in the database, it makes them directly editable by the end user. If they're in a class, they're potentially more customisable.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by JoshyPHP »

I'm currently working on posting limits. Here's a breakdown of how the legacy code behave when those limits are exceeded and the default behaviour of the new code.

max_*_font_size
  • old: cannot post + error displayed on submit, BBCode silently ignored on preview
  • new: cannot post + error displayed on submit, BBCode silently ignored on preview
max_*_img_height/max_*_img_width
  • old: cannot post + error displayed on submit, BBCode silently ignored on preview
  • new: cannot post + error displayed on submit, BBCode silently ignored on preview
max_*_smilies
  • old: cannot post + error displayed on submit, smilies silently disabled on preview
  • new: cannot post + error displayed on submit, extra smilies are silently ignored on preview
max_*_urls
  • old: cannot post + error displayed on submit, limit not enforced on preview
  • new: cannot post + error displayed on submit, extra URLs are silently ignored on preview
The old behaviour is inconsistent between submission (error displayed) and preview (no error displayed), or even amongst the different limits on preview which are sometimes applied, not applied, or applied only to the invalid parts. I'd like to normalize the behaviour, but that means modifying posting.php and I'm afraid of the potential for side-effects or unintended consequences.

Update: I've updated the list to reflect the latest code

User avatar
Pony99CA
Registered User
Posts: 986
Joined: Sun Feb 08, 2009 2:35 am
Location: Hollister, CA
Contact:

Re: [RFC] Integrate s9e\TextFormatter

Post by Pony99CA »

JoshyPHP wrote:I'm currently working on posting limits. Here's a breakdown of how the legacy code behave when those limits are exceeded and the default behaviour of the new code.

max_*_font_size
  • old: cannot post + error displayed on submit, BBCode silently ignored on preview
  • new: cannot post + error displayed on submit, BBCode silently ignored on preview
max_*_img_height/max_*_img_width
  • old: cannot post + error displayed on submit, BBCode silently ignored on preview
  • new: cannot post + error displayed on submit, BBCode silently ignored on preview
max_*_smilies
  • old: cannot post + error displayed on submit, smilies silently disabled on preview
  • new: cannot post + error displayed on submit, extra smilies are silently ignored on preview
max_*_urls
  • old: cannot post + error displayed on submit, limit not enforced on preview
  • new: cannot post + error displayed on submit, extra URLs are silently ignored on preview
The old behaviour is inconsistent between submission (error displayed) and preview (no error displayed), or even amongst the different limits on preview which are sometimes applied, not applied, or applied only to the invalid parts. I'd like to normalize the behaviour, but that means modifying posting.php and I'm afraid of the potential for side-effects or unintended consequences.

Update: I've updated the list to reflect the latest code
What does this have to do with the text formatter? Shouldn't that be a new topic?

Anyway, I think that lmits should be enforced on Preview (and saving Drafts). It's very annoying to not get an error during Preview and then getting one when you try to Submit. Preview should work the same way that Submit does except the post is displayed instead of actually submitted.

Steve
Silicon Valley Pocket PC (http://www.svpocketpc.com)
Creator of manage_bots and spoof_user (ask me)
Need hosting for a small forum with full cPanel & MySQL access? Contact me or PM me.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by JoshyPHP »

Pony99CA wrote:What does this have to do with the text formatter? Shouldn't that be a new topic?
When I originally posted that message, the new code would silently ignore invalid stuff and even if it did generate errors, there was no interface for the parse_message class to collect errors from the parser. I posted that list to keep track of that change in behaviour. Now the parser generates errors and the parse_message class collects them so now it does display the same error messages and I've updated the post accordingly.

In the new system, limits are enforced on preview but parse_message doesn't display the errors. I agree that it would be better if it did, and that it should be the subject of another RFC.

User avatar
EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: [RFC] Integrate s9e\TextFormatter

Post by EXreaction »

JoshyPHP wrote:
EXreaction wrote:We do want to completely remove the old BBCode parser after this would be merged, so rather than checking which it was originally parsed with, what will need to be done is a migration be created which reparses all posts with the new engine
Which makes me think, what future version of phpBB would you want to target for this? I have side-stepped the issue of migration because it might just be the most critical change, the one that carries the greatest potential for negative fallout, and I wanted to take the time to discuss and carefully plan it. That's why I'm currently keeping the old rendering engine around, so that I don't have to touch old posts for the time being.

Going forward, I don't really know what's the best strategy is. You could keep the old rendering around for a whole 3.x cycle before completely removing it, while slowly migrating content via a cron job or something similar. Or you can migrate the content upfront, but wouldn't that mean taking a board offline, potentially for an extended period of time as you mentionned? I don't know how phpBB migration works, not sure if the board remains usable while it's running. The posts table is the biggest of any forum, and a full migration means rewriting every row because as I said, s9e\TextFormatter produces XML so even an empty post would be rewritten as <pt></pt>.
We definitely don't want to keep the old rendering engine around for the whole of 3.x, if this would all be ready for 3.2, that's what it would go into (include the new engine and completely remove the old).

The migration would take quite a while and normally, yes, the board would go offline during the process to prevent any issues. We can find ways around that if we test it on a large board and it takes a very long time, but we'll look into that once we get to testing the conversion.
On a related subject, you need to decide what to do about the default BBCodes. Currently, their definitions reside in an XML file that were hastily thrown into includes/. Ultimately, they may be better stored in the database or hardcoded in a class. If they're stored in the database, it makes them directly editable by the end user. If they're in a class, they're potentially more customisable.
Default BBCodes will eventually go into the BBCode table. If you can do this from the start that's great, otherwise it may be good to have a separate PR for this which would be merged after the main PR and keep them hard-coded for now.

User avatar
brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by brunoais »

EXreaction wrote: We definitely don't want to keep the old rendering engine around for the whole of 3.x, if this would all be ready for 3.2, that's what it would go into (include the new engine and completely remove the old).
No need.
We just need to keep a single function that removes the uid's from the tags and then feed that to textFormatter while, at the same time, storing textformatter's middle state.
This will only slow down the first load of the page while all this is done for all posts.

Nothing else is required here.
Are you sure that's not a good option now that we have it?
That function is just about 1% (if as much) of the current BBCode parser. IIRC it's just a single preg_replace().
Are you still sure that's not a good way to go?
There can be boards that would require days processing just for this due to the massive amount of posts and whatnot.

I think it's worth it. We can drag that functionality for many versions and we won't force converting posts that no one will ever read but are still there just in case.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by JoshyPHP »

A bit of a daily update: today, I made polls work. This doesn't sound like much, but it took more time than I'm comfortable admitting. parse_message (the class in message_parser.php) would be a prime candidate to be refactored into something less complicated, especially once the old text formatting code is removed.
EXreaction wrote:Default BBCodes will eventually go into the BBCode table. If you can do this from the start that's great
It's all the same for me. Whether they're loaded from a table or an XML file, BBCode definitions go through the same code.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by JoshyPHP »

The next item on my TODO list is fetching BBCode templates from styles, which I'll start soon. Unless instructed otherwise, I'll reuse the same routines as bbcode::bbcode_cache_init() except with objects from the service container instead of globals.
EXreaction wrote:I can help explain how the styles system is setup if you're interested in setting that up from the start (just let me know!).
I think I understand enough of the basics to make it work:
  • Default templates are hardcoded in bbcode::bbcode_tpl()
  • Styles define their own BBCode templates in bbcode.html
  • The bitfield determines which hardcoded template gets overridden by the style
Is there anything else?

What I don't know is:
  • How can I get a list of all the installed styles?
  • How can I get notified when a style's bbcode.html template is modified? (to invalidate the cache)

User avatar
brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC] Integrate s9e\TextFormatter

Post by brunoais »

Note: This is mostly IMO, do not take it as a dev's opinion.
JoshyPHP wrote: I think I understand enough of the basics to make it work:
  • Default templates are hardcoded in bbcode::bbcode_tpl()
Correct
JoshyPHP wrote:
  • Styles define their own BBCode templates in bbcode.html
Correct
JoshyPHP wrote:
  • The bitfield determines which hardcoded template gets overridden by the style
Wrong.
The bitfield marks which BBCodes were used in that text out of all the predefined ones.
JoshyPHP wrote: What I don't know is:
  • How can I get notified when a style's bbcode.html template is modified? (to invalidate the cache)
(for the 1st I dunno)
Consider 2 modes.
In one mode, you don't cache, you always get the ones from the file/DB (when the ACP's option "Recompile stale style components" is on)
In the other mode, you cache and you complete disregard the file changes, you just don't care, even if it changes, you consider the cache value as correct (when the ACP's option "Recompile stale style components" is off).
Then, every time the "purge the cache" is executed, you delete your cache of those kinds of things.

Post Reply