(this part is new and makes sense to point out in the discussion that's happening here)
JoshyPHP wrote:
If this parser represents the BBCode markup as a tree, just don't allow [*] to be a child of [*]. Instead, move the node (with its descendants and all of its siblings) so that it becomes the next sibling(s) of its parent. [Edit: here's an illustration of the move, when parsing [list][*]foo[*]bar[/list]
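To make the "never let [*] be a child of [*]" rule concrete, here is a minimal sketch in Python. It is not how any existing phpBB parser works; the `Node`, `parse` and `dump` names are made up for illustration, and only [list], [/list] and [*] are recognised. Instead of moving the node after the fact, it closes the open item before opening the next one, which gives the same tree for this input.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini-grammar: only [list], [/list] and [*] are recognised.
TOKEN = re.compile(r"\[(list|/list|\*)\]")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # Node objects or plain strings


def parse(markup):
    root = Node("root")
    stack = [root]
    pos = 0
    for m in TOKEN.finditer(markup):
        text = markup[pos:m.start()]
        if text:
            stack[-1].children.append(text)
        pos = m.end()
        tag = m.group(1)
        if tag == "list":
            node = Node("list")
            stack[-1].children.append(node)
            stack.append(node)
        elif tag == "*":
            # An open item is closed first, so the new item becomes
            # its sibling rather than its child.
            if stack[-1].name == "*":
                stack.pop()
            if stack[-1].name == "list":
                node = Node("*")
                stack[-1].children.append(node)
                stack.append(node)
            else:
                stack[-1].children.append(m.group(0))  # not in a list: keep as text
        elif any(n.name == "list" for n in stack):
            # [/list]: pop everything up to and including the nearest list.
            while stack.pop().name != "list":
                pass
        else:
            stack[-1].children.append(m.group(0))  # stray [/list]: keep as text
    if markup[pos:]:
        stack[-1].children.append(markup[pos:])
    return root


def dump(node):
    if isinstance(node, str):
        return repr(node)
    return f"{node.name}({', '.join(dump(c) for c in node.children)})"


print(dump(parse("[list][*]foo[*]bar[/list]")))
# root(list(*('foo'), *('bar')))
```

Both items end up as siblings under the list, which is the shape the move described above is meant to produce.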
For dealing with malformed markup, please look at how web browsers and HTML parsing libraries handle it. There are many of them, for example Hpricot, Beautiful Soup, or the libxml2 HTML parser. People have been pondering this issue since, I'm guessing, around 1995.
A reasonably quick test would be to load the markup in question via such a parser and examine the resulting DOM tree.
There are plenty of cases of bad user-submitted markup. Whenever reasonably possible and if the intent is obvious, I'll try to fix it. If the intent is not obvious to me, I'll defer to the HTML rules if they're simple enough to implement. Otherwise, I'd try to go with whatever makes sense and is simple/consistent. In the case of <ul><li>foo<b><li>bar</b></ul>, an HTML5 parser would interpret it as <ul><li>foo<b></b></li><li><b>bar</b></li></ul> but the rules behind that are complicated enough that I'm not even sure which ones apply in this specific case.
If you're not willing to go into that kind of complexity, then I'd go for the next best thing, which is to find a simple, consistent way to resolve the situation. In this example, [*] is used inside of [b]; since it shouldn't be there, I'd treat it the same way I'd treat it outside of a list and ignore it (render it as plain text). The HTML would end up as: <ul><li>foo<b>[*]bar</b></li></ul>.
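A rough sketch of that "simple, consistent" rule in Python, under a few assumptions: only [list], [b] and [*] exist, a [*] counts as an item only when it sits directly inside a list (or directly follows another item), and anything else is left as literal text. The `render` function and the tag table are hypothetical, not taken from any real parser.

```python
import re
from html import escape

# Hypothetical mini-grammar: [list], [b], [*] and their closing forms.
TOKEN = re.compile(r"\[(/?)(list|b|\*)\]")
HTML = {"list": "ul", "b": "b", "*": "li"}


def render(markup):
    out, stack, pos = [], [], 0

    def open_tag(name):
        stack.append(name)
        out.append(f"<{HTML[name]}>")

    def close_top():
        out.append(f"</{HTML[stack.pop()]}>")

    for m in TOKEN.finditer(markup):
        out.append(escape(markup[pos:m.start()]))
        pos = m.end()
        closing, name = bool(m.group(1)), m.group(2)
        if name == "*":
            if not closing and stack and stack[-1] == "*":
                close_top()          # previous item ends, new sibling starts
                open_tag("*")
            elif not closing and stack and stack[-1] == "list":
                open_tag("*")
            else:
                # Outside a list, or nested in [b]: render as plain text.
                out.append(escape(m.group(0)))
        elif not closing:
            open_tag(name)
        elif name in stack:
            while stack[-1] != name:  # close anything left open inside
                close_top()
            close_top()
        else:
            out.append(escape(m.group(0)))  # stray closing tag: plain text
    out.append(escape(markup[pos:]))
    while stack:
        close_top()
    return "".join(out)


print(render("[list][*]foo[b][*]bar[/b][/list]"))
# <ul><li>foo<b>[*]bar</b></li></ul>
```

The second [*] is treated exactly like a [*] typed outside of any list, so there is only one rule to remember.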
Personally, I think that we should just throw an error like "BBCode markup invalid -- [whatever] tag not closed". This is similar to what we do for the IMG tag if the image size is too big. It would also help prevent garbled posts from people who butcher quotes, accidentally close [url] tags with [/u], or make similar mistakes.
Don't try to guess what the user wants; make the user do it right.
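For what it's worth, the strict approach is cheap to implement. Below is a minimal sketch in Python, assuming a small fixed set of known tags and a made-up `BBCodeError` exception; the error wording follows the message suggested above and nothing here reflects phpBB's actual code.

```python
import re


class BBCodeError(ValueError):
    """Raised when the markup is rejected (hypothetical exception type)."""


# Matches [tag], [tag=value] and [/tag].
TAG = re.compile(r"\[(/?)([a-z]+)(?:=[^\]]*)?\]", re.IGNORECASE)
KNOWN = {"b", "i", "u", "url", "quote", "img"}  # assumed tag set


def validate(markup):
    stack = []
    for m in TAG.finditer(markup):
        closing, name = bool(m.group(1)), m.group(2).lower()
        if name not in KNOWN:
            continue  # unknown tags are just text
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        elif stack:
            raise BBCodeError(
                f"BBCode markup invalid -- [{stack[-1]}] tag not closed")
        else:
            raise BBCodeError(
                f"BBCode markup invalid -- [/{name}] has no opening tag")
    if stack:
        raise BBCodeError(
            f"BBCode markup invalid -- [{stack[-1]}] tag not closed")


try:
    validate("[url=http://example.com]link[/u]")
except BBCodeError as e:
    print(e)  # BBCode markup invalid -- [url] tag not closed
```

The parser never guesses here: the first mismatch stops processing and the message names the tag the user has to fix.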
I do not think an error would be a bad option. SCEditor has been shown to nest BBCodes correctly when a BBCode is applied over an area already formatted with two different styles, at least in the tests I've performed and with the basic set of BBCodes it has. So hopefully this issue will not occur for people using the editor. If it only occurs for people who enter the BBCodes manually, then they should be expected to be able to fix the codes if they are malformed.
I don't disagree with displaying an error. Giving the user feedback seems like a good idea indeed, but the parser still has to be able to make something out of a garbled post in order to handle automated processes, such as converting old posts from an older version or from another forum software. I've been scouring the web (OK, not really, I just have a Google alert) for examples of BBCodes for a little while now, and there's some heinous markup out there.
In an ideal world, it would go like this:
1. have an editor that never produces junk
2. if junk is found, alert the user
3. if there's no user, fasten your seat belt and prepare for a crash landing. IOW, transform the junk into something that won't break the page
I agree with that 100%. Maybe the BBCode parser could take an option that specifies whether the BBCode was entered by a user or not (for example, when it comes from a converter). If a user entered it, flag the error; otherwise, try to emit something valid.
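Such an option could look roughly like the Python sketch below. The `user_submitted` flag, the `to_html` function and the three-tag grammar are all assumptions made for illustration. With the flag set, the first problem raises an error; without it, the same code path repairs what it can and escapes the rest.

```python
import re
from html import escape


class BBCodeError(ValueError):
    """Raised in strict mode only (hypothetical exception type)."""


TAG = re.compile(r"\[(/?)(b|i|u)\]")  # assumed mini-grammar


def to_html(markup, user_submitted):
    out, stack, pos = [], [], 0
    for m in TAG.finditer(markup):
        out.append(escape(markup[pos:m.start()]))
        pos = m.end()
        closing, name = bool(m.group(1)), m.group(2)
        if not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif name in stack and (stack[-1] == name or not user_submitted):
            # Lenient mode also closes any tags left open inside this one.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            out.append(f"</{stack.pop()}>")
        elif user_submitted:
            raise BBCodeError(
                f"BBCode markup invalid -- unexpected {m.group(0)}")
        else:
            out.append(escape(m.group(0)))  # stray closer: plain text
    out.append(escape(markup[pos:]))
    if stack and user_submitted:
        raise BBCodeError(
            f"BBCode markup invalid -- [{stack[-1]}] tag not closed")
    while stack:  # lenient mode: close whatever is still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)


# Converter path: no user to ask, so emit something that won't break the page.
print(to_html("[b]bold [i]both[/b] tail", user_submitted=False))
# <b>bold <i>both</i></b> tail
```

Calling the same function with `user_submitted=True` on that input raises `BBCodeError` instead, so the posting form can show the message and make the user fix the markup.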
callumacrae wrote: Re the error, what happens when a user edits a post made before 3.1? Will they have to fix the markup before they can submit the edited post?
In my opinion, yes. It's no different than somebody creating a new post with invalid markup.