[RFC|Accepted] Updated BBcode engine

callumacrae
Former Team Member
Posts: 1046
Joined: Tue Apr 27, 2010 9:37 am
Location: England

Re: [RFC|Accepted] Updated BBcode engine

Post by callumacrae »

  • quote
  • this
  • message
Making this change would require... zero changes.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais »

callumacrae wrote:
  • quote
  • this
  • message
Making this change would require... zero changes.
The problem is that it accepts both:

Code: Select all

[list][*]quote[/*]
[*]this[/*]
[*]message[/*][/list]
and

Code: Select all

[list][*]quote
[*]this
[*]message[/list]
So...
If phpBB uses [*] for opening and [/*] for closing, then all is OK. The problem is that I don't think that will happen very often, or will it?

Well, one idea so far is to use a preparser. It would scan the string like this:

Code: Select all

Look for "[/*]" in the string
If no "[/*]" is found
	Replace every "[*]" except the first one in each list by "[/*][*]"
	Replace "[/list]" by "[/*][/list]"
Unfortunately, something like this will break the original message. This means that the original message needs to be stored separately, somehow.
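
A minimal Python sketch of the preparser idea above, not code from phpBB or from the parser under discussion. The name `close_list_items` and the regex are assumptions made for illustration. It returns a new string, so the original message stays untouched and can be stored separately, and it avoids putting a `[/*]` in front of the first `[*]` of a list. Nested lists and `[list=...]` variants are deliberately ignored.

```python
import re

# Hypothetical pattern: matches one non-nested [list]...[/list] block.
LIST_BLOCK = re.compile(r"\[list\](.*?)\[/list\]", re.DOTALL)


def close_list_items(message):
    """Return a copy of `message` in which every [*] has a matching [/*]."""
    if "[/*]" in message:
        # Same shortcut as the pseudocode: assume the items are already closed.
        return message

    def fix(match):
        head, *items = match.group(1).split("[*]")
        if not items:
            return match.group(0)  # list without items: leave unchanged
        # Joining on "[/*][*]" closes each item right before the next one
        # starts; the trailing "[/*]" closes the last item before [/list].
        body = head + "[*]" + "[/*][*]".join(items) + "[/*]"
        return "[list]" + body + "[/list]"

    return LIST_BLOCK.sub(fix, message)


original = "[list][*]quote\n[*]this\n[*]message[/list]"
prepared = close_list_items(original)  # `original` is left as typed
```

Keeping `original` and `prepared` as two separate values is one way to read "stored separately": the first is what the user edits later, the second is what the strict parser sees.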

DavidIQ
Customisations Team Leader
Posts: 1904
Joined: Thu Mar 02, 2006 4:29 pm
Location: Earth

Post by DavidIQ »

I don't understand what the problem is. This currently works just fine with either [*] or [*][/*]. What is wrong with how it works right now?

imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Post by imkingdavid »

David is right,

Code: Select all

[list][*]test[/*][/list]
works just like

Code: Select all

[list][*]test[/list]

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais »

It works with either [*] or [*][/*], and not just with [*][/*].

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Post by EXreaction »

DavidIQ wrote:I don't understand what the problem is. This currently works just fine with either [*] or [*][/*]. What is wrong with how it works right now?
The problem arises when designing a new parser to be more efficient and more valid. If BBCode parsing is treated more like a tree or valid XML, a bare [*] will not work without hacks for this specific case, which is not particularly desirable.

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Post by JoshyPHP »

It doesn't have to be a hack at all. Optional end tags are part of HTML 5 and they're not limited to <li> so if you want to produce valid HTML you'd have to handle them appropriately. For instance, [p]a[p]b[/p]c[/p] might seem legal for a BBCode parser, but <p>a<p>b</p>c</p> is not valid HTML and it will be interpreted as <p>a</p><p>b</p>c<p></p>. I'm not saying that nesting paragraphs is particularly important, but at any rate optional end tags are not an edge case or a hack. (Void elements such as <hr> are comparable in the sense that they don't use end tags, and a few people are using custom [hr] BBCodes in their forums)

If this parser represents the BBCode markup as a tree, just don't allow [*] to be a child of [*]. Instead, move the node (with its descendants and all of its following siblings) so that it becomes the next sibling of its parent. [Edit: here's an illustration of the move, when parsing [list][*]foo[*]bar[/list]]

Code: Select all

[list]   ->   [list]
   |           /   \
  [*]         [*] [*]
 /   \         |   |
foo [*]       foo bar
     |
    bar
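
The move illustrated above can be sketched in Python as follows. This is only an illustration under assumptions: `Node` and `hoist_items` are hypothetical names, and the input is assumed to be a naively built tree in which an unclosed [*] has swallowed the following [*] as its child.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)


def hoist_items(parent):
    """Rewrite the tree in place so that no [*] is a direct child of a [*]."""
    i = 0
    while i < len(parent.children):
        child = parent.children[i]
        if isinstance(child, Node):
            if child.tag == "*":
                for j, grandchild in enumerate(child.children):
                    if isinstance(grandchild, Node) and grandchild.tag == "*":
                        # Move the nested [*] and everything after it so they
                        # become the next siblings of `child`.
                        moved = child.children[j:]
                        del child.children[j:]
                        parent.children[i + 1:i + 1] = moved
                        break
            # Handle deeper structures, e.g. a list nested inside an item.
            hoist_items(child)
        i += 1


# Naive tree for [list][*]foo[*]bar[/list], as on the left of the diagram.
tree = Node("list", [Node("*", ["foo", Node("*", ["bar"])])])
hoist_items(tree)
```

A moved item lands at position `i + 1`, so the loop visits it next and repeats the same step, which is how longer chains of unclosed items get flattened one level at a time.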
Last edited by JoshyPHP on Wed Dec 12, 2012 10:54 pm, edited 1 time in total.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais »

JoshyPHP wrote:It doesn't have to be a hack at all. Optional end tags are part of HTML 5 and they're not limited to <li> so if you want to produce valid HTML you'd have to handle them appropriately.
AFAIK, I need to produce valid XHTML5. The main reason is in case phpBB switches to XHTML.
JoshyPHP wrote:For instance, [p]a[p]b[/p]c[/p] might seem legal for a BBCode parser, but <p>a<p>b</p>c</p> is not valid HTML and it will be interpreted as <p>a</p><p>b</p>c<p></p>.
Yeah... closing a <p> tag is optional in HTML, but not in XHTML.
<p> elements can only contain phrasing content, which means p tags cannot be nested.
The result of:
[p]a[p]b[/p]c[/p]
is:
<p>a[p]b[/p]c</p>
because p BBCode tags cannot be nested :).
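
One way to get the result stated above, sketched in Python. This is an assumption about how such a rule could work, not the actual parser: `render_paragraphs` is a hypothetical name, only the [p] tag is handled, and HTML escaping of the text is left out for brevity.

```python
import re

P_TAG = re.compile(r"\[(/?)p\]")


def render_paragraphs(message):
    """Render [p]...[/p]; a [p] pair nested inside another stays literal."""
    out = []
    pos = 0
    inside = False      # True while an outer [p] is open
    literal_depth = 0   # nested [p] tags currently kept as plain text
    for m in P_TAG.finditer(message):
        out.append(message[pos:m.start()])
        pos = m.end()
        closing = m.group(1) == "/"
        if not closing and not inside:
            inside = True
            out.append("<p>")
        elif not closing:
            literal_depth += 1          # nested opener: keep it as text
            out.append(m.group(0))
        elif literal_depth > 0:
            literal_depth -= 1          # closer of a nested pair: text too
            out.append(m.group(0))
        elif inside:
            inside = False
            out.append("</p>")
        else:
            out.append(m.group(0))      # stray [/p] outside any paragraph
    out.append(message[pos:])
    if inside:
        out.append("</p>")              # keep the output well-formed
    return "".join(out)


html = render_paragraphs("[p]a[p]b[/p]c[/p]")
```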
JoshyPHP wrote: I'm not saying that nesting paragraphs is particularly important, but at any rate optional end tags are not an edge case or a hack. (Void elements such as <hr> are similar, and a few people are using custom [hr] BBCodes in their forums)
Have you heard about... say eh... self-closing tags? ;)
JoshyPHP wrote: If this parser represents the BBCode markup as a tree, just don't allow [*] to be a child of [*]. Instead, move the node (with its descendants and all of its siblings) at the next sibling of its parents.
If you parse all tags the same way and differentiate only in the details, you'll see that it's not as straightforward as it seems when you write it like that.

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Post by JoshyPHP »

Can't you render the parsed message as XHTML without parsing it as such? Frankly I'd rather not require the end users to balance their BBCode tags, especially when the current implementation doesn't. It seems like an unnecessary loss.

I don't understand what you meant in your last sentence. If you use a tree structure, I'd think that moving nodes around would be relatively straightforward but since I haven't really looked into your code I wouldn't know.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais »

JoshyPHP wrote:Can't you render the parsed message as XHTML without parsing it as such? Frankly I'd rather not require the end users to balance their BBCode tags, especially when the current implementation doesn't. It seems like an unnecessary loss.
Of course I could try to handle it more loosely, but for that I need to know specifically how to handle it.
Anyway, any change to the original string adds processing overhead to the parser.
Doing these kinds of fixes requires creating zero-length tags, which is not a problem per se. The problem is knowing where to place those zero-length tags (remember that my parser reads the input string only once, and that's an advantage!).
Also, for the main part of the system, there are only two "classes" of tags: self-closing and non-self-closing. They are parsed almost separately because the self-closing ones are easier to parse (no nesting issues, and they cannot be malformed).
JoshyPHP wrote: I don't understand what you meant in your last sentence. If you use a tree structure, I'd think that moving nodes around would be relatively straightforward but since I haven't really looked into your code I wouldn't know.
The concept of moving the nodes is straightforward from the algorithmic POV, but not that straightforward from the performance POV.
Anyway, as soon as I know exactly which rules it needs to handle, I'll see how I can do it. I'll always try to favor "XML"-conforming BBCode nesting as the fastest path to parse and treat the rest as "recovery from failure". Because I don't work on the string itself, it should stay fast anyway.
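
To make the zero-length tags discussed above concrete, here is a hedged Python sketch of a single left-to-right scan that records implied [/*] tags as offsets, without changing the input string. It is not the parser from this RFC; `Tag`, `scan` and the regex are hypothetical, and only [list] and [*] are handled.

```python
import re
from typing import List, NamedTuple

TAG = re.compile(r"\[(/?)(list|\*)\]")


class Tag(NamedTuple):
    name: str      # "list" or "*"
    closing: bool
    start: int     # offset into the untouched input string
    length: int    # 0 marks an implied tag that is absent from the input


def scan(message) -> List[Tag]:
    """Read `message` once and return explicit plus implied tags in order."""
    tags = []
    open_item = []  # one flag per open [list]: is a [*] currently open?
    for m in TAG.finditer(message):
        closing, name = m.group(1) == "/", m.group(2)
        explicit = Tag(name, closing, m.start(), m.end() - m.start())
        if name == "list" and not closing:
            open_item.append(False)
        elif not open_item:
            continue                      # tag outside any list: plain text
        elif name == "*" and not closing:
            if open_item[-1]:             # previous item was never closed
                tags.append(Tag("*", True, m.start(), 0))
            open_item[-1] = True
        elif name == "*":                 # explicit [/*]
            if not open_item[-1]:
                continue                  # stray closer: plain text
            open_item[-1] = False
        else:                             # [/list]
            if open_item[-1]:             # last item was never closed
                tags.append(Tag("*", True, m.start(), 0))
            open_item.pop()
        tags.append(explicit)
    return tags


found = scan("[list][*]foo[*]bar[/list]")
implied = [t for t in found if t.length == 0]
```

In this sketch the well-formed input never enters the two "implied" branches, so the balanced case stays the cheap path and the extra work is limited to recovery, in the spirit of the post above.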
