[RFC|Accepted] Updated BBcode engine

Note: We are moving the topics of this forum and it will be deleted at some point

Publish your own request for comments/change or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.2/Rhea and 3.3.
Post Reply
TerraFrost
Former Team Member
Posts: 90
Joined: Wed Feb 09, 2005 12:21 am

[RFC|Accepted] Updated BBcode engine

Post by TerraFrost » Mon May 17, 2010 8:30 pm

The feature/ascraeus-experiment branch has, among other things, a rewritten BBcode parser that protects against structurally invalid BBcodes. The proposal here is to integrate that parser into the regular develop branch.

To do so, fairly extensive changes to the feature/ascraeus-experiment parser are required. The required changes (although they're not necessarily complete) can be found here:

http://github.com/terrafrost/phpbb3/tre ... es/bbcode/

The next step, of course, is to integrate those changes into phpBB (the feature/ascraeus-experiment branch doesn't do this). Some work to that end can be found here:

http://github.com/terrafrost/phpbb3/blo ... parser.php
http://github.com/terrafrost/phpbb3/blo ... wtopic.php

With these changes a post can be made, previewed and edited with the feature/ascraeus-experiment parser. Without further modification, stuff like signatures aren't going to work, but I figure other people can help out with the more mundane / tedious aspects of this :)

As for what I mean by "structurally invalid BBcodes"...

Code: Select all

[list][*][quote]
[*][quote]test[/quote][/quote][/list]
That doesn't render correctly on phpBB3, for instance. It's also fairly easy to define custom BBcodes that can render in structurally invalid ways.

Other issues are as follows:
  • BBcode bitfields are not yet supported.
  • BBcodes need to be cached (ala bbcode::bbcode_cache_init)
  • It might be neat to have a new column in posts_text - original_post_text. post_text would then be short for parsed_post_text. The motiviation behind this idea is that decompiling an already parsed message won't necessarily get you the original message and sometimes having the original message can be useful for debugging or what-not.
  • The ACP module for defining custom BBcodes would need to be rewritten. Should the attempt be made to port phpBB3-style BBcodes to this new BBcode format?
  • Maybe the need to enumerate valid and invalid children could be partially eliminated by defining some elements as block level and other elements as inline? ie. <div> is a block level element and can contain in-line elements like <span> but because <span> is in-line it can't contain <div>.
Ticket: http://tracker.phpbb.com/browse/PHPBB3-9795

User avatar
naderman
Product Manager
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC] Updated BBcode engine

Post by naderman » Sun May 23, 2010 3:09 pm

Hey Jim, can you please rename your branch to the proper feature/bbcode (or whatever you want to call the feature). Make sure there is a ticket for this on the tracker and then please use git rebase to edit your commit messages to conform with http://wiki.phpbb.com/display/DEV/Git. Thanks.

It seems you have not backported any of the tests. It's important that our new BBCode parser be well tested, so please add unit tests.

User avatar
naderman
Product Manager
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC] Updated BBcode engine

Post by naderman » Sun May 23, 2010 3:20 pm

Why do we even still need that file at all? Seems to me that if we do we haven't done our job properly with the formatted text class
TerraFrost wrote:
  • BBcode bitfields are not yet supported.
Why would they. We don't need them anymore, do we?
TerraFrost wrote:
  • BBcodes need to be cached (ala bbcode::bbcode_cache_init)
What do you mean? The intention was to have one column with the result of the first pass in the posts table, and one with the original text. I asked Marek to work on storing only the structural information with offsets referencing the text as a result of the first pass, rather than a copy of the entire content. He never got around to that. We can then simply run one update query setting the structure info column to empty, which will reparse all posts when they are viewed. Adding a feature missed a lot in 3.0, the ability to reparse posts.
TerraFrost wrote:
  • The ACP module for defining custom BBcodes would need to be rewritten. Should the attempt be made to port phpBB3-style BBcodes to this new BBcode format?
No idea if that is feasible.
TerraFrost wrote:
  • Maybe the need to enumerate valid and invalid children could be partially eliminated by defining some elements as block level and other elements as inline? ie. <div> is a block level element and can contain in-line elements like <span> but because <span> is in-line it can't contain <div>.
I'm in favour of anything that would simplify this, so that sounds good. Might just be something for the front end defining bbcodes though. Not sure we need to have this in the bbcode class itself. We could just resolve such groups of elements and cache the result which is then passed to the bbcode parser for setup.

TerraFrost
Former Team Member
Posts: 90
Joined: Wed Feb 09, 2005 12:21 am

Re: [RFC] Updated BBcode engine

Post by TerraFrost » Thu Jun 03, 2010 11:55 am

naderman wrote:Hey Jim, can you please rename your branch to the proper feature/bbcode (or whatever you want to call the feature). Make sure there is a ticket for this on the tracker and then please use git rebase to edit your commit messages to conform with http://wiki.phpbb.com/display/DEV/Git. Thanks.

It seems you have not backported any of the tests. It's important that our new BBCode parser be well tested, so please add unit tests.
Will try to do that stuff this weekend :)
Why do we even still need that file at all? Seems to me that if we do we haven't done our job properly with the formatted text class
Right now, I'm mostly focusing on how BBcode is rendered in posts. message_parser does stuff for attachments and polls that'll probably be easy enough to move out but that's not my current focus. I figure the file will likely go at that point.
BBcode bitfields are not yet supported.
Why would they. We don't need them anymore, do we?
They're used to, among other things, change the default BBcode behavior in custom templates (as I understand it). Like if a particular bit is set in the bitfield the HTML replacement from bbcode.html will be used instead. bbcode.html could be dropped but the idea of per-template BBcode HTML replacements is nice.
BBcodes need to be cached (ala bbcode::bbcode_cache_init)
What do you mean? The intention was to have one column with the result of the first pass in the posts table, and one with the original text. I asked Marek to work on storing only the structural information with offsets referencing the text as a result of the first pass, rather than a copy of the entire content. He never got around to that. We can then simply run one update query setting the structure info column to empty, which will reparse all posts when they are viewed. Adding a feature missed a lot in 3.0, the ability to reparse posts.
Storing the result of the first pass and the original text is something I'd like to do. I'm not sure how feasible it'd be to just store offsets referencing the text though. I mean, conceptually, it doesn't seem that hard, but it'd likely require a fairly significant rewrite of the base BBcode parsing engine.

User avatar
naderman
Product Manager
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC] Updated BBcode engine

Post by naderman » Thu Jun 03, 2010 12:01 pm

TerraFrost wrote:
naderman wrote:
TerraFrost wrote:BBcode bitfields are not yet supported.
Why would they. We don't need them anymore, do we?
They're used to, among other things, change the default BBcode behavior in custom templates (as I understand it). Like if a particular bit is set in the bitfield the HTML replacement from bbcode.html will be used instead. bbcode.html could be dropped but the idea of per-template BBcode HTML replacements is nice.
Well from the experience with Olympus I would say that this is something nearly no style author even understands, it is causing more trouble than it helps. I believe this is one of the things style authors most often ask for help about. I think we should just generally always use bbcode.html for all bbcodes and cache them efficiently so it's not slow.
TerraFrost wrote:
naderman wrote:
TerraFrost wrote:BBcodes need to be cached (ala bbcode::bbcode_cache_init)
What do you mean? The intention was to have one column with the result of the first pass in the posts table, and one with the original text. I asked Marek to work on storing only the structural information with offsets referencing the text as a result of the first pass, rather than a copy of the entire content. He never got around to that. We can then simply run one update query setting the structure info column to empty, which will reparse all posts when they are viewed. Adding a feature missed a lot in 3.0, the ability to reparse posts.
Storing the result of the first pass and the original text is something I'd like to do. I'm not sure how feasible it'd be to just store offsets referencing the text though. I mean, conceptually, it doesn't seem that hard, but it'd likely require a fairly significant rewrite of the base BBcode parsing engine.
Alright, I suppose this is an optmisation we can revisit later if we still have time.

User avatar
EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: [RFC] Updated BBcode engine

Post by EXreaction » Thu Jun 03, 2010 1:58 pm

Can't we move most of the bbcodes to custom bbcodes?

There could be an option on the bbcode table to use the output from a bbcode.html file so you can set each style independently, but it's probably rather moot as long as we use classes that allow the standard bbcodes to be modified from the CSS easily enough.

User avatar
naderman
Product Manager
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC] Updated BBcode engine

Post by naderman » Thu Jun 03, 2010 2:08 pm

I think all BBCodes should be "custom". And every BBCode should be overwritten by the bbcode.html provided by the style if that file contains a definition for the given bbcode. I don't think bitfields are the right way to implement this.

User avatar
imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Re: [RFC] Updated BBcode engine

Post by imkingdavid » Sat Jun 05, 2010 7:06 pm

naderman wrote:I think all BBCodes should be "custom". And every BBCode should be overwritten by the bbcode.html provided by the style if that file contains a definition for the given bbcode. I don't think bitfields are the right way to implement this.
+1 I agree with this. :)
I do custom MODs. PM for a quote!
View My: MODs | Portfolio
Please do NOT contact for support via PM or email.
Remember, the enemy's gate is down.

BondGamer
Registered User
Posts: 112
Joined: Mon Dec 15, 2003 8:20 pm
Contact:

Re: [RFC] Updated BBcode engine

Post by BondGamer » Wed Jun 16, 2010 5:02 am

naderman wrote:I think all BBCodes should be "custom".
I was wondering why the trouble would be gone to have a standard set of bbCode along with custom. It would make much more sense for everything to be in the same system.

Also it would be nice if you could arrange the order of display in the ACP. Have the top-bottom go left-right.

User avatar
nickvergessen
Former Team Member
Posts: 733
Joined: Sun Oct 07, 2007 11:54 am
Location: Stuttgart, Germany
Contact:

Re: [RFC] Updated BBcode engine

Post by nickvergessen » Thu Jun 17, 2010 12:34 pm

BondGamer wrote:
naderman wrote:I think all BBCodes should be "custom".
I was wondering why the trouble would be gone to have a standard set of bbCode along with custom. It would make much more sense for everything to be in the same system.
What naderman wanted to say is just that we add the "standard" to the bbcode table aswell, which is currently not the case.
Member of the Development-TeamNo Support via PM

Post Reply