[RFC|Accepted] Updated BBcode engine

Note: We are moving the topics of this forum, and it will be deleted at some point.

Publish your own requests for comments/changes or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.2/Rhea and 3.3.
brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

(This part is new, and it makes sense to point it out given the discussion happening here.)
JoshyPHP wrote: If this parser represents the BBCode markup as a tree, just don't allow [*] to be a child of [*]. Instead, move the node (with its descendants and all of its siblings) up, making them the next sibling(s) of its parent. [Edit: here's an illustration of the move when parsing [list][*]foo[*]bar[/list].]

[list]   ->   [list]
   |           /   \
  [*]         [*] [*]
 /   \         |   |
foo [*]       foo bar
     |
    bar

How about this input?

[list][*]foo[b][*]bar[/b][/list]

[list]   ->   [list]
   |           /    \
  [*]         [*]  [b]
 /   \       /  \  /  \
foo [b]   foo [b] [*] bar
     |
    [*]
     |
    bar
Someone has to die... Who dies: the [b] or the second [*]?
Last edited by EXreaction on Thu Dec 13, 2012 5:10 pm, edited 1 time in total.
Reason: Use code rather than inline [c] (bbcodes are parsed within [c])
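A minimal Python sketch of the rule quoted above, assuming a toy tree of (tag, children) nodes and only the [list], [*] and [b] tags. `Node`, `parse` and `show` are hypothetical names, not phpBB code. The check only looks at the immediate parent, so the first input splits cleanly into two sibling items, while in the second input the inner [*] still ends up under [b], which is exactly the case asked about above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy tokenizer: only [list], [*] and [b] (and their closers).
TOKEN = re.compile(r"\[(/?)(list|\*|b)\]")


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)  # Node objects or text strings


def parse(text: str) -> Node:
    root = Node("root")
    stack = [root]  # currently open nodes, innermost last
    pos = 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            stack[-1].children.append(text[pos:m.start()])
        pos = m.end()
        closing, tag = m.group(1) == "/", m.group(2)
        if closing:
            # Close everything up to and including the matching open tag;
            # a closer with no matching open tag is simply ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == tag:
                    del stack[i:]
                    break
            continue
        if tag == "*" and stack[-1].tag == "*":
            # The proposed rule: a [*] is never a child of a [*]. Closing the
            # open item first makes the new item its next sibling instead.
            stack.pop()
        node = Node(tag)
        stack[-1].children.append(node)
        stack.append(node)
    if pos < len(text):
        stack[-1].children.append(text[pos:])
    return root


def show(node) -> str:
    """Render a tree compactly, e.g. list(*(foo) *(bar))."""
    if isinstance(node, str):
        return node
    return node.tag + "(" + " ".join(show(c) for c in node.children) + ")"


for sample in ("[list][*]foo[*]bar[/list]", "[list][*]foo[b][*]bar[/b][/list]"):
    print(show(parse(sample)))
```

This prints root(list(*(foo) *(bar))) for the first input and root(list(*(foo b(*(bar))))) for the second: the parent-only check never fires once a [b] sits between the two items, so some further rule is needed to decide who "dies".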

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Post by Oleg »

For dealing with malformed markup, please investigate web browsers and HTML parsing libraries, of which there are many: for example Hpricot, Beautiful Soup, or libxml2's HTML parser. People have been pondering this issue since, I'm guessing, around 1995.

A reasonably quick test should be to load said markup via the parser and examine the resulting DOM tree.
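Python's standard library has no HTML5 tree builder (third-party libraries such as html5lib implement the HTML5 tree-construction rules), so this stdlib-only sketch stops one step earlier and just logs what `html.parser` reports for the <ul><li>foo<b><li>bar</b></ul> markup discussed in this thread. `EventLog` and `tokenize` are hypothetical helper names. The point it illustrates: the tokenizer passes the stray <li> inside <b> through unchanged and reports no </li> at all, so any repair happens in tree construction, which is the part worth examining.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record raw tokenizer events; this builds no tree and repairs nothing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def tokenize(markup: str) -> list:
    log = EventLog()
    log.feed(markup)
    log.close()
    return log.events


for event in tokenize("<ul><li>foo<b><li>bar</b></ul>"):
    print(event)
```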

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais »

Oleg wrote:A reasonably quick test should be to load said markup via the parser and examine the resulting DOM tree.
Trial and error?

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Post by JoshyPHP »

There are plenty of cases of bad user-submitted markup. Whenever reasonably possible and if the intent is obvious, I'll try to fix it. If the intent is not obvious to me, I'll defer to the HTML rules if they're simple enough to implement. Otherwise, I'd try to go with whatever makes sense and is simple/consistent. In the case of <ul><li>foo<b><li>bar</b></ul>, an HTML5 parser would interpret it as <ul><li>foo<b></b></li><li><b>bar</b></li></ul> but the rules behind that are complicated enough that I'm not even sure which ones apply in this specific case.

If you're not willing to go into that kind of complexity, then I'd go for the next best thing, which is to find a simple, consistent way to resolve the situation. In this example, [*] is used inside of [b]. Since it shouldn't be there, I'd treat it the same way I'd treat it outside of a list and ignore it (render it as plain text). The HTML would end up as: <ul><li>foo<b>[*]bar</b></li></ul>.
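A sketch of that simpler rule as a tiny renderer, assuming only [list], [*] and [b]: a [*] opens an item only when the innermost open tag is [list] (closing a previous item first), and anywhere else it is output as plain text. `render` and `TAG_HTML` are hypothetical names; text is HTML-escaped, but attributes, other tags and the rest of phpBB's rules are left out.

```python
import html
import re

TOKEN = re.compile(r"\[(/?)(list|\*|b)\]")
TAG_HTML = {"list": "ul", "*": "li", "b": "b"}  # BBCode name -> HTML element


def render(text: str) -> str:
    out = []
    stack = []  # open BBCode tag names, innermost last
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        pos = m.end()
        closing, tag = m.group(1) == "/", m.group(2)
        if tag == "*" and not closing:
            if stack and stack[-1] == "*":
                out.append("</li>")  # a new item closes the previous one
                stack.pop()
            if stack and stack[-1] == "list":
                out.append("<li>")
                stack.append("*")
            else:
                out.append(m.group(0))  # misplaced [*]: render as plain text
            continue
        if closing:
            if tag in stack:
                # Close every tag opened after the matching one, then the tag itself.
                while True:
                    name = stack.pop()
                    out.append(f"</{TAG_HTML[name]}>")
                    if name == tag:
                        break
            else:
                out.append(m.group(0))  # stray closer: plain text
            continue
        out.append(f"<{TAG_HTML[tag]}>")
        stack.append(tag)
    out.append(html.escape(text[pos:]))
    while stack:  # close anything left open at the end of the post
        out.append(f"</{TAG_HTML[stack.pop()]}>")
    return "".join(out)


print(render("[list][*]foo[b][*]bar[/b][/list]"))
```

The print produces <ul><li>foo<b>[*]bar</b></li></ul>, the HTML described above, while a well-formed [list][*]foo[*]bar[/list] still yields two <li> items. The appeal of this rule is that the decision only depends on the innermost open tag, so it stays predictable.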

callumacrae
Former Team Member
Posts: 1046
Joined: Tue Apr 27, 2010 9:37 am
Location: England

Post by callumacrae »

xHTML5 is a thing?

The output only needs to be valid HTML5.
Made by developers, for developers!
My blog

Pony99CA
Registered User
Posts: 986
Joined: Sun Feb 08, 2009 2:35 am
Location: Hollister, CA

Post by Pony99CA »

JoshyPHP wrote:There are plenty of cases of bad user-submitted markup. Whenever reasonably possible and if the intent is obvious, I'll try to fix it. If the intent is not obvious to me, I'll defer to the HTML rules if they're simple enough to implement. Otherwise, I'd try to go with whatever makes sense and is simple/consistent. In the case of <ul><li>foo<b><li>bar</b></ul>, an HTML5 parser would interpret it as <ul><li>foo<b></b></li><li><b>bar</b></li></ul> but the rules behind that are complicated enough that I'm not even sure which ones apply in this specific case.

If you're not willing to go into that kind of complexity, then I'd go for the next best thing, which is to find a simple, consistent way to resolve the situation. In this example, [*] is used inside of [b]. Since it shouldn't be there, I'd treat it the same way I'd treat it outside of a list and ignore it (render it as plain text). The HTML would end up as: <ul><li>foo<b>[*]bar</b></li></ul>.

Personally, I think that we should just throw an error like "BBCode markup invalid -- [whatever] tag not closed". This is similar to what we do for the IMG tag if the image size is too big. It would also help prevent garbled posts from people who butcher quotes or accidentally close [url] tags with [/u], or similar things.

Don't try to guess what the user wants; make the user do it right.
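A minimal sketch of that check, assuming a short hypothetical whitelist of tag names (a real board's BBCode list is configurable): it walks the tags with a stack and returns the first problem as a message in the style suggested above, or None when the markup balances. `check_bbcode` is a hypothetical name, not an existing phpBB function.

```python
import re
from typing import Optional

# Hypothetical whitelist; [*] is included but needs no closing tag.
TOKEN = re.compile(r"\[(/?)(b|i|u|quote|url|img|list|\*)(?:=[^\]]*)?\]")


def check_bbcode(text: str) -> Optional[str]:
    """Return an error message for the first problem found, or None if OK."""
    stack = []  # names of open tags, innermost last
    for m in TOKEN.finditer(text):
        closing, tag = m.group(1) == "/", m.group(2)
        if tag == "*":
            continue  # list items close implicitly
        if not closing:
            stack.append(tag)
        elif stack and stack[-1] == tag:
            stack.pop()
        elif tag in stack:
            # e.g. [b][i]...[/b]: the inner tag was left open
            return f"BBCode markup invalid -- [{stack[-1]}] tag not closed"
        else:
            # e.g. [url]...[/u]: a closer that matches nothing
            return f"BBCode markup invalid -- [/{tag}] has no opening tag"
    if stack:
        return f"BBCode markup invalid -- [{stack[-1]}] tag not closed"
    return None
```

For example, check_bbcode("[url]http://example.com[/u]") reports the stray [/u] rather than guessing that [/url] was meant. Reporting only the first problem keeps the message short; a real implementation might collect them all.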

Steve
Silicon Valley Pocket PC (http://www.svpocketpc.com)
Creator of manage_bots and spoof_user (ask me)
Need hosting for a small forum with full cPanel & MySQL access? Contact me or PM me.

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Post by EXreaction »

I do not think an error would be a bad option. SCEditor has been shown to nest BBCodes correctly when applying a BBCode over an area already formatted with two different styles (at least in the tests I've performed, with the basic set of BBCodes it has), so hopefully this issue will not occur for people using the editor. If it only occurs for people who enter the BBCodes manually, they should be expected to be able to fix malformed codes.

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Post by JoshyPHP »

I don't disagree with displaying an error. Giving the user feedback seems like a good idea indeed, but the parser still has to be able to make something out of a garbled post in order to handle automated processes, such as converting old posts from an older version or from other forum software. I've been scouring the web (OK, not really; I just have a Google alert) for examples of BBCodes for a little while now, and there's some heinous markup out there. :lol:

In an ideal world, it would go like this:
  • have an editor that never produces junk
  • if junk is found, alert the user
  • if there's no user, fasten your seat belt and prepare for a crash landing. IOW, transform the junk into something that won't break the page
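For the third bullet, a sketch of one possible "crash landing", assuming a stack-based tag walk over a hypothetical set of tags: stray closers are dropped, a closer that skips over still-open inner tags closes them and then reopens them, and tags left open are closed at the end. It also returns a list of what it changed, so a caller with a user present could show those problems instead of saving. `repair` is a hypothetical name, and the reopen rule is a simplification; HTML5's rules for misnested formatting elements are more involved.

```python
import re
from typing import List, Tuple

# Hypothetical set of tags that need a closing tag.
TOKEN = re.compile(r"\[(/?)(b|i|u|quote|url)(=[^\]]*)?\]")


def repair(text: str) -> Tuple[str, List[str]]:
    """Return (repaired BBCode, list of problems that were fixed)."""
    out, problems = [], []
    stack = []  # (tag name, original opening tag text), innermost last
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append((name, m.group(0)))
            out.append(m.group(0))
            continue
        # Find the innermost open tag with this name, if any.
        depth = next((i for i in range(len(stack) - 1, -1, -1) if stack[i][0] == name), None)
        if depth is None:
            problems.append(f"dropped stray [/{name}]")
            continue
        inner = stack[depth + 1:]  # tags opened after it and still open
        if inner:
            names = ", ".join(f"[{n}]" for n, _ in inner)
            problems.append(f"reopened {names} after [/{name}]")
        for n, _ in reversed(inner):
            out.append(f"[/{n}]")
        out.append(f"[/{name}]")
        del stack[depth:]
        for n, opener in inner:  # reopen the inner tags, keeping attributes
            out.append(opener)
            stack.append((n, opener))
    out.append(text[pos:])
    for n, _ in reversed(stack):  # close whatever is still open
        problems.append(f"closed unclosed [{n}]")
        out.append(f"[/{n}]")
    return "".join(out), problems
```

For example, repair("[b]foo[i]bar[/b]baz[/i]") returns "[b]foo[i]bar[/i][/b][i]baz[/i]" together with one reported problem, so a converter can keep the repaired text while an interactive form shows the problem to the user.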

callumacrae
Former Team Member
Posts: 1046
Joined: Tue Apr 27, 2010 9:37 am
Location: England

Post by callumacrae »

Re the error, what happens when a user edits a post made before 3.1? Will they have to fix the markup before they can submit the edited post?

Pony99CA
Registered User
Posts: 986
Joined: Sun Feb 08, 2009 2:35 am
Location: Hollister, CA

Post by Pony99CA »

JoshyPHP wrote: In an ideal world, it would go like this:
  • have an editor that never produces junk
  • if junk is found, alert the user
  • if there's no user, fasten your seat belt and prepare for a crash landing. IOW, transform the junk into something that won't break the page
I agree with that 100%. Maybe the BBCode parser could take an option that specifies whether a user entered the BBCode or not (for example, no user is involved when it's going through a converter). If a user entered it, flag the error; otherwise, try to emit something valid.
callumacrae wrote:Re the error, what happens when a user edits a post made before 3.1? Will they have to fix the markup before they can submit the edited post?
In my opinion, yes. It's no different than somebody creating a new post with invalid markup.
