[RFC|Accepted] Updated BBcode engine

callumacrae · Post by **callumacrae** » Wed Dec 12, 2012 6:31 pm

quote
this
message

Making this change would require... zero changes.

brunoais · Post by **brunoais** » Wed Dec 12, 2012 6:35 pm

callumacrae wrote:
quote

this

message
Making this change would require... zero changes.

The problem is that it accepts both:

[list][*]quote[/*]
[*]this[/*]
[*]message[/*][/list]

and

Code:

[list][*]quote
[*]this
[*]message[/list]

So...
If phpBB always used [*] for opening and [/*] for closing, everything would be fine. The problem is that I don't think users will write the closing tag that often. Or will they?

Well, one idea so far is to use a preparser that scans the string like this:

Code:

Check and look for "[/*]" in the string
If no "[/*]" are found
	Locate and replace "[*]" by "[/*][*]"
	Locate and replace "[/list]" by "[/*][/list]"

Unfortunately, something like this modifies the original message, which means the original text would need to be stored separately somehow.
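A minimal Python sketch of the preparser idea above, for illustration only. The function name `preparse_list_items` is hypothetical and not part of phpBB. Note that a literal reading of the two replacements also puts a `[/*]` directly after `[list]`, so a real implementation would have to drop or tolerate that stray closing tag.

```python
def preparse_list_items(text: str) -> str:
    """Naive preparser following the pseudocode above (hypothetical helper)."""
    if "[/*]" in text:
        # Explicit closing tags are already present: leave the message alone.
        return text
    # Close the previous item before each new item and before the list ends.
    text = text.replace("[*]", "[/*][*]")
    text = text.replace("[/list]", "[/*][/list]")
    return text


# The untouched input is kept in its own variable, since the rewritten
# string is no longer what the user typed.
original = "[list][*]quote\n[*]this\n[*]message[/list]"
normalized = preparse_list_items(original)
print(normalized)
```

The sketch leaves a message untouched as soon as it contains a single `[/*]`, so a post that mixes both styles would still reach the parser unbalanced.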

Post by **DavidIQ** » Wed Dec 12, 2012 8:04 pm

I don't understand what the problem is. This currently works just fine with either [*] or [*][/*]. What is wrong with how it works right now?

imkingdavid · Post by **imkingdavid** » Wed Dec 12, 2012 8:08 pm

David is right,

Code:

[list][*]test[/*][/list]

works just like

Code:

[list][*]test[/list]

brunoais · Post by **brunoais** » Wed Dec 12, 2012 8:22 pm

The point is that it works with both [*] and [*][/*], not just with [*][/*].

EXreaction · Post by **EXreaction** » Wed Dec 12, 2012 8:30 pm

DavidIQ wrote:I don't understand what the problem is. This currently works just fine with either [*] or [*][/*]. What is wrong with how it works right now?

The problem arises when designing a new parser that is more efficient and produces more valid output. If BBCode parsing is treated more like a tree or like valid XML, [*] will not work without hacks for this specific case, which is not particularly desirable.

JoshyPHP · Post by **JoshyPHP** » Wed Dec 12, 2012 8:55 pm

It doesn't have to be a hack at all. Optional end tags are part of HTML5 and they're not limited to <li>, so if you want to produce valid HTML you'd have to handle them appropriately. For instance, [p]a[p]b[/p]c[/p] might seem legal for a BBCode parser, but <p>a<p>b</p>c</p> is not valid HTML and it will be interpreted as <p>a</p><p>b</p>c<p></p>. I'm not saying that nesting paragraphs is particularly important, but at any rate optional end tags are not an edge case or a hack. (Void elements such as <hr> are comparable in the sense that they don't use end tags, and a few people are using custom [hr] BBCodes in their forums.)

If this parser represents the BBCode markup as a tree, just don't allow [*] to be a child of [*]. Instead, move the node (with its descendants and all of its following siblings) so that they become the next sibling(s) of its parent. [Edit: here's an illustration of the move, when parsing [list][*]foo[*]bar[/list]]

Code:

[list]   ->   [list]
   |           /   \
  [*]         [*] [*]
 /   \         |   |
foo [*]       foo bar
     |
    bar
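As a rough sketch of the move illustrated above, the following Python builds the nested tree by hand and then hoists the inner [*]. The `Node` class and the `hoist_nested_items` and `outline` helpers are hypothetical names made up for this example, not code from any existing parser.

```python
class Node:
    """Tiny tree node: a tag name ("list", "*") or "#text" with its text."""

    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []
        self.parent = None

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def hoist_nested_items(node):
    """Turn every [*] nested in a [*] into the next sibling of its parent."""
    i = 0
    while i < len(node.children):
        child = node.children[i]
        if node.name == "*" and child.name == "*" and node.parent is not None:
            # Move the nested item and everything after it up one level.
            moved = node.children[i:]
            del node.children[i:]
            siblings = node.parent.children
            at = siblings.index(node) + 1
            for m in moved:
                m.parent = node.parent
            siblings[at:at] = moved
            break  # the caller's loop visits the moved nodes next
        hoist_nested_items(child)
        i += 1


def outline(node):
    """Compact text rendering of the tree, for inspection only."""
    if node.name == "#text":
        return node.text
    return "[%s](%s)" % (node.name, " ".join(outline(c) for c in node.children))


# Tree a naive parser might build for [list][*]foo[*]bar[/list]
root = Node("list")
first = root.append(Node("*"))
first.append(Node("#text", "foo"))
second = first.append(Node("*"))
second.append(Node("#text", "bar"))

before = outline(root)
hoist_nested_items(root)
after = outline(root)
print(before, "->", after)
```

Because the hoisted nodes are inserted right after their former parent, the enclosing loop reaches them on its next iteration, so deeper chains such as [*]a[*]b[*]c are flattened one level at a time.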

brunoais · Post by **brunoais** » Wed Dec 12, 2012 9:43 pm

JoshyPHP wrote:It doesn't have to be a hack at all. Optional end tags are part of HTML 5 and they're not limited to <li> so if you want to produce valid HTML you'd have to handle them appropriately.

AFAIK, I need to produce valid XHTML5, mainly in case phpBB switches to XHTML.

JoshyPHP wrote:For instance, [p]a[p]b[/p]c[/p] might seem legal for a BBCode parser, but <p>a<p>b</p>c</p> is not valid HTML and it will be interpreted as <p>a</p><p>b</p>c<p></p>.

Yeah... closing a tag is optional in HTML, but not in XHTML.
<p> tags can only contain phrasing content, and <p> itself is not phrasing content. This means p tags cannot be nested.
The result of:
[p]a[p]b[/p]c[/p]
is:
a[p]b[/p]c
because p BBCode tags cannot be nested.

JoshyPHP wrote: I'm not saying that nesting paragraphs is particularly important, but at any rate optional end tags are not an edge case or a hack. (Void elements such as <hr> are similar, and a few people are using custom [hr] BBCodes in their forums)

Have you heard about... say eh... self-closing tags?

JoshyPHP wrote: If this parser represents the BBCode markup as a tree, just don't allow [*] to be a child of [*]. Instead, move the node (with its descendants and all of its siblings) at the next sibling of its parents.

If you parse all tags the same way and differentiate only in the details, you'll see that it's not as straightforward as it seems when you write it like that.

JoshyPHP · Post by **JoshyPHP** » Wed Dec 12, 2012 11:01 pm

Can't you render the parsed message as XHTML without parsing it as such? Frankly I'd rather not require the end users to balance their BBCode tags, especially when the current implementation doesn't. It seems like an unnecessary loss.

I don't understand what you meant in your last sentence. If you use a tree structure, I'd think that moving nodes around would be relatively straightforward but since I haven't really looked into your code I wouldn't know.

brunoais · Post by **brunoais** » Wed Dec 12, 2012 11:39 pm

JoshyPHP wrote:Can't you render the parsed message as XHTML without parsing it as such? Frankly I'd rather not require the end users to balance their BBCode tags, especially when the current implementation doesn't. It seems like an unnecessary loss.

Ofc I could try to handle it more loosely, but for that I need to know specifically how it should be handled.
Anyway, any change to the original string adds processing overhead to the parser.
Doing these kinds of fixes requires creating zero-length tags, which is not a problem per se. The problem is knowing where to place those zero-length tags (remember that my parser reads the input string only once, and that's an advantage!).
Also, for the main part of the system, there are only 2 "classes" of tags: self-closing and non-self-closing. They are parsed almost separately because the self-closing ones are easier to parse (no nesting issues and no malformed markup).
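To make the zero-length tag idea concrete, here is a minimal Python sketch of a tokenizer that reads the string once and emits an implicit close for [*] wherever the next [*] or [/list] ends an open item. It is not the parser discussed in this thread: the `tokenize` function, the token tuples, and the restriction to the [list] and [*] tags are all assumptions made for the example.

```python
import re

# Only the two tags discussed in this thread are recognised in this sketch.
TAG = re.compile(r"\[(/?)(list|\*)\]")


def tokenize(text):
    """Read the input once and return a flat list of (kind, value) tokens."""
    tokens = []
    item_open = []  # one flag per open [list]: is a [*] currently open in it?
    pos = 0
    for match in TAG.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        pos = match.end()
        closing = match.group(1) == "/"
        name = match.group(2)
        if not item_open and not (name == "list" and not closing):
            # Tag outside of any list: keep it as literal text.
            tokens.append(("text", match.group(0)))
        elif name == "list" and not closing:
            tokens.append(("open", "list"))
            item_open.append(False)
        elif name == "list":
            if item_open.pop():
                tokens.append(("close", "*"))  # zero-length close before [/list]
            tokens.append(("close", "list"))
        elif not closing:
            if item_open[-1]:
                tokens.append(("close", "*"))  # zero-length close before next [*]
            tokens.append(("open", "*"))
            item_open[-1] = True
        elif item_open[-1]:
            tokens.append(("close", "*"))  # explicit [/*]
            item_open[-1] = False
        else:
            tokens.append(("text", match.group(0)))  # stray [/*]
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


explicit = tokenize("[list][*]quote[/*][*]this[/*][*]message[/*][/list]")
implicit = tokenize("[list][*]quote[*]this[*]message[/list]")
print(explicit == implicit)  # both inputs yield the same token stream
```

The only extra state is one boolean per open list, so placing the zero-length closes does not require a second pass over the input. Lists left unclosed at the end of the message are not repaired in this sketch.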

JoshyPHP wrote: I don't understand what you meant in your last sentence. If you use a tree structure, I'd think that moving nodes around would be relatively straightforward but since I haven't really looked into your code I wouldn't know.

The concept of moving the nodes is straightforward from the algorithmic POV, but not that straightforward from the performance POV.
Anyway, as soon as I know exactly which rules it needs to handle, I'll see how I can do it. I'll always try to favor "XML"-conforming BBCode nesting as the fastest path to parse and treat everything else as "recovery from failure". Because I don't work on the string itself, it should stay fast anyway.
