[RFC|Accepted] Updated BBcode engine

JoshyPHP
Registered User
Posts: 360
Joined: Fri Jul 08, 2011 9:43 pm

Post by JoshyPHP » Mon Oct 29, 2012 7:52 pm

imkingdavid wrote:IMO we should store a blacklist of tags that cannot be a given tag's child.
IMO that's the way to go for a robust implementation. Ideally it would be best to not only control what tags are allowed as children of a given tag, but also which tags are allowed as descendants. For instance, take [b] and [url]. In isolation, the combinations [url][b] and [b][url] are valid but strung together, [url][b][url] would produce invalid HTML. It won't break the page though, and off the top of my head I can't think of a combination that would.

imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Post by imkingdavid » Mon Oct 29, 2012 7:57 pm

JoshyPHP wrote:
imkingdavid wrote:IMO we should store a blacklist of tags that cannot be a given tag's child.
IMO that's the way to go for a robust implementation. Ideally it would be best to not only control what tags are allowed as children of a given tag, but also which tags are allowed as descendants. For instance, take [b] and [url]. In isolation, the combinations [url][b] and [b][url] are valid but strung together, [url][b][url] would produce invalid HTML. It won't break the page though, and off the top of my head I can't think of a combination that would.
And so to fix the [url][b][url] issue, we would simply blacklist [url] from being a descendant of itself. In fact, no tag needs to be a descendant of itself, unless I'm forgetting a plausible scenario in which it is needed.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais » Mon Oct 29, 2012 8:04 pm

@imkingdavid
I think you got the wrong idea. I was asking about the system I'm developing at the moment. I'm talking about which rules it must follow.
I know how the current system works. I spent the equivalent of half a day studying how it does everything.
Personally, I don't like the [tag:uid]content[/tag:uid] format. I'm developing in a way that means we may not need any of that.
I'm trying to make a real tree out of the BBCodes, just like the browser makes a tree out of the HTML.
So far so good; I think I have a good solution for the 1st pass.
It gives me just about everything I need to have the complete thing formatted as a tree.
Then, once I have it all in a tree-like structure, I just need to parse from the inside out.
That's the step where I think it should stop.
Personally, I'd store a serialization of the complete post BBCode tree in the DB. What do you all think?
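
A minimal sketch of the idea, in Python rather than the PHP of the actual branch: a post is held as a tree, serialized for storage, read back, and rendered from the inside out (children first, then the enclosing tag). The node shape, the HTML_NAMES mapping and the use of JSON as the serialization format are all assumptions made for illustration; none of them come from the branch being discussed.

```python
import html
import json

# Hypothetical node shape: plain strings are text, dicts are tags with children.
tree = {"tag": "b", "children": ["bold ", {"tag": "i", "children": ["both"]}, " bold"]}

# Assumed tag-to-HTML mapping, for illustration only.
HTML_NAMES = {"b": "strong", "i": "em"}


def render(node):
    """Render the children first, then wrap them: inside-out (post-order)."""
    if isinstance(node, str):
        return html.escape(node)
    inner = "".join(render(child) for child in node["children"])
    name = HTML_NAMES[node["tag"]]
    return f"<{name}>{inner}</{name}>"


serialized = json.dumps(tree)       # what could be stored in the DB
restored = json.loads(serialized)   # what would be read back at display time
print(render(restored))             # <strong>bold <em>both</em> bold</strong>
```

Storing the tree rather than the final HTML means the rendering step runs at display time, which is what would let the output differ per viewer or per style.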
@EXreaction
BBCode reading permissions.
An extension may want to implement the idea that some users are able to view certain BBCodes while other users are not. Think of forums that only allow members to view posted URLs.
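
As a rough illustration of what such a reading permission could look like (a Python sketch; MEMBERS_ONLY_TAGS, render_url and the placeholder text are hypothetical names, not part of phpBB or of any extension):

```python
import html

# Hypothetical per-tag viewing rule: tags listed here are rendered only for members.
MEMBERS_ONLY_TAGS = {"url"}


def render_url(target, viewer_is_member):
    """Render a [url] tag for one viewer, or a placeholder if they may not see it."""
    if "url" in MEMBERS_ONLY_TAGS and not viewer_is_member:
        return "[link hidden]"
    safe = html.escape(target, quote=True)
    return f'<a href="{safe}">{safe}</a>'
```

The point of the sketch is only that the decision has to be made per viewer at display time, so the stored form of the post cannot be final HTML.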
JoshyPHP wrote:[url][b][url] would produce invalid HTML. It won't break the page though, and off the top of my head I can't think of a combination that would.
In that specific code, in my implementation, that would produce the output:

Code: Select all

[url][b][url]
If you made a mistake in your post and meant:

Code: Select all

[url][b][/url]
Then, since the [b] tag is never closed, the output would be:

Code: Select all

<a href="[b]">[b]</a>
imkingdavid wrote: And so to fix the [url][b][url] issue, we would simply blacklist [url] from being a descendant of itself. In fact, no tag needs to be a descendant of itself, unless I'm forgetting a plausible scenario in which it is needed.
Each tag will have a blacklist for its direct descendants (children) and a blacklist for all of its descendants. Alternatively, it may have a whitelist. (A blacklist and a whitelist are mutually exclusive; you have to choose one of the two for each tag.)
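
A small Python sketch of how such rules might be checked while tags are being opened. The RULES table, the "deny"/"allow" markers and the function names are hypothetical; the [url] and [list] entries are only examples of the two kinds of rule.

```python
# Hypothetical rule table. Each tag may carry one rule for its direct children
# and one for all of its descendants; each rule is either a blacklist ("deny")
# or a whitelist ("allow"), never both.
RULES = {
    "url": {"descendants": ("deny", {"url", "video"})},
    "list": {"children": ("allow", {"li"})},
}


def permitted(rule, tag):
    """Apply a single blacklist or whitelist rule to one tag name."""
    if rule is None:
        return True
    mode, tags = rule
    return tag in tags if mode == "allow" else tag not in tags


def may_open(tag, open_tags):
    """open_tags holds the names of the currently open tags, outermost first."""
    if open_tags:
        parent_rules = RULES.get(open_tags[-1], {})
        if not permitted(parent_rules.get("children"), tag):
            return False
    return all(
        permitted(RULES.get(ancestor, {}).get("descendants"), tag)
        for ancestor in open_tags
    )
```

With these rules, may_open("url", ["url", "b"]) is False because [url] forbids itself at any depth, while may_open("quote", ["quote"]) is True because no rule restricts [quote].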
Last edited by brunoais on Wed Oct 31, 2012 10:03 pm, edited 1 time in total.

JoshyPHP
Registered User
Posts: 360
Joined: Fri Jul 08, 2011 9:43 pm

Post by JoshyPHP » Mon Oct 29, 2012 9:07 pm

imkingdavid wrote:And so to fix the [url][b][url] issue, we would simply blacklist [url] from being a descendant of itself.
Yeah, [url] would need to disallow itself and some other tags (anything interactive, such as videos) as descendants.
imkingdavid wrote:In fact, no tag needs to be a descendant of itself, unless I'm forgetting a plausible scenario in which it is needed.
Off the top of my head, [quote] and [list] are nestable. Some custom BBCodes as well, such as [spoiler].
brunoais wrote:If you made a mistake in your post and meant: [...]
I meant a [url] tag with a [b] child with a [url] child of its own.

Pony99CA
Registered User
Posts: 986
Joined: Sun Feb 08, 2009 2:35 am
Location: Hollister, CA

Post by Pony99CA » Tue Oct 30, 2012 12:35 am

JoshyPHP wrote:
imkingdavid wrote:In fact, no tag needs to be a descendant of itself, unless I'm forgetting a plausible scenario in which it is needed.
Off the top of my head, [quote] and [list] are nestable. Some custom BBCodes as well, such as [spoiler].
Table tags should also be nestable. And, while some tags don't strictly need to be nestable, it's nice to allow them. For example:

Code: Select all

[color=red]Red followed by [color=blue]Blue[/color] ending with red[/color]
is easier to write than

Code: Select all

[color=red]Red followed by [/color][color=blue]Blue[/color][color=red] ending with red[/color]
But the former yields junk:

Red followed by Blue ending with red

while the latter works as intended:

Red followed by Blue ending with red

In HTML, however, there's no problem nesting SPAN tags to change colors.
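
To illustrate why nesting is unproblematic on the HTML side, here is a minimal Python sketch that maps each [color] tag to a SPAN one-for-one. It assumes the tags are properly nested and balanced, matches only simple colour names, and does no escaping of the surrounding text; colors_to_spans is a hypothetical helper, not phpBB code.

```python
import re


def colors_to_spans(text):
    """Map [color=...] tags to nested SPANs, assuming balanced, well-nested input."""
    text = re.sub(r"\[color=([a-zA-Z]+)\]", r'<span style="color: \1">', text)
    return text.replace("[/color]", "</span>")


nested = "[color=red]Red followed by [color=blue]Blue[/color] ending with red[/color]"
print(colors_to_spans(nested))
```

Because every opening and closing tag maps to exactly one SPAN tag, the nesting of the BBCode carries over unchanged and the inner span simply overrides the colour of the outer one.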

Steve

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais » Tue Oct 30, 2012 8:38 am

Pony99CA wrote:

Code: Select all

[color=red]Red followed by [color=blue]Blue[/color] ending with red[/color]
is easier to write than

Code: Select all

[color=red]Red followed by [/color][color=blue]Blue[/color][color=red] ending with red[/color]
But the former yields junk:

Red followed by Blue ending with red
In what I'm currently working on, this code:

Code: Select all

[color=red]Red followed by [/color][color=blue]Blue[/color][color=red] ending with red[/color]
is valid and properly parsed.
My idea is to treat BBCode the same way the browser treats HTML, so everything becomes a tree.

BTW, an update. For now, the algorithm is predicted to be O(2n) and o(n) in processing time and O(2n+?(*1)) in memory.
I'll see if I can keep it like that.

I think I'll try to optimize the processing on the assumption that all BBCode is properly nested, and then fall back to some sort of recovery mode when the input is not directly translatable into a tree and there's a need to decide which nodes are ignored and which nodes are parsed.
If I keep the idea that the only thing it guarantees for improperly nested BBCode is that the output is properly nested HTML, I think I can make a significantly faster system than the current one. I don't know about the memory used yet, though. In the end, I may end up using more memory.

(*1) The "?" is the memory used by the arrays; I don't know how cheap they are.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais » Wed Oct 31, 2012 8:18 pm

Made a new commit.
It now matches the opening tags with the closing tags. If there are too many opening or closing tags, the extra ones are treated as text.
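
One way such matching could work, sketched in Python with a stack. This is an assumption about the general technique, not a description of the commit: the policy for crossed tags (demoting unclosed inner tags to text), the lowercase-only token pattern and the name pair_tags are all illustrative choices.

```python
import re

# Simplified token pattern: lowercase tag names only, no parameters.
TOKEN = re.compile(r"\[(/?)([a-z]+)\]")


def pair_tags(text):
    """Return (pairs, strays): matched (open_pos, close_pos) and tags left as text."""
    stack = []   # (name, position) of opening tags still waiting for a close
    pairs = []
    strays = []
    for match in TOKEN.finditer(text):
        closing, name, pos = match.group(1), match.group(2), match.start()
        if not closing:
            stack.append((name, pos))
            continue
        # Look for the nearest opening tag with the same name.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i][0] == name:
                # Unclosed tags opened inside this pair become text.
                strays.extend(p for _, p in stack[i + 1:])
                pairs.append((stack[i][1], pos))
                del stack[i:]
                break
        else:
            strays.append(pos)   # closing tag with nothing to close
    strays.extend(p for _, p in stack)   # opening tags that were never closed
    return pairs, sorted(strays)
```

For "[url][b][/url]" this pairs the two [url] tags and leaves [b] as text, which is consistent with the output shown earlier in the thread for that input.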

BTW, can BBCode tags be case-sensitive? It would help a lot in parsing this.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais » Sun Nov 04, 2012 11:09 am

A new, very important commit was just made.
With this commit, I now consider the string parsing stage complete (but not bug free).
Please download this standalone file onto your server:
https://github.com/brunoais/phpbb3/blob ... ngTest.php

Then, try altering the "$string" variable and the "$BBCode_tags" variable (it's right after the $string variable; you may need to scroll down a page or two).

Notes:
The $BBCode_tags variable is an array of strings with the BBCodes you want to be found. Check the code as it is now for an example.
This part of the conversion is still in the alpha stage. I've only run some preliminary tests; if you can come up with something interestingly complex, it would be useful.
The only output it gives, for now, is a var_dump() of the tree it created. For now, the original string is only read once and then left alone.

How to read the output of the var_dump() (keys vs values):

Code: Select all

["start_tag"]       What appears here refers to the opening tag.
["end_tag"]         What appears here refers to the closing tag.
["start_position"]  The character number where the tag starts (either the opening or the closing tag).
["end_position"]    The character number where the tag ends (either the opening or the closing tag).

["children"]        An array of the tags that are children of this tag. Note that the ["start_tag"]["start_position"] of every child is greater than the ["start_tag"]["start_position"] of its parent. Likewise, the ["end_tag"]["end_position"] of every child is smaller than that of its parent, since a child closes before its parent does. If this does not hold for any of the tags, then it's a bug.
What do you think of this approach to solving the BBCode parsing problem? It seems a better approach than calling preg_match_all() at least once for each match you want to try to get. If the input tags follow the same rules as they currently do, preg_match_all() is called only once.
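
A Python sketch of the single-scan idea, using one re.finditer() pass where the PHP code would use one preg_match_all() call. It assumes properly nested, balanced input and parameterless tags; build_tree and check_positions are hypothetical names, the "name" key is added for readability, and treating end_position as the index of the last character of the tag is an assumption about the dump format.

```python
import re


def build_tree(text, tag_names):
    """Build a tag tree in one scan, assuming balanced, properly nested tags."""
    # One alternation over all tag names, so the text is scanned a single time.
    names = "|".join(re.escape(name) for name in tag_names)
    token = re.compile(rf"\[(/?)({names})\]")
    root = {"children": []}
    stack = [root]
    for match in token.finditer(text):
        span = {"start_position": match.start(), "end_position": match.end() - 1}
        if not match.group(1):
            node = {"name": match.group(2), "start_tag": span, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            stack.pop()["end_tag"] = span
    return root["children"]


def check_positions(node):
    """Every child must lie strictly inside its parent; anything else is a bug."""
    for child in node["children"]:
        assert child["start_tag"]["start_position"] > node["start_tag"]["start_position"]
        assert child["end_tag"]["end_position"] < node["end_tag"]["end_position"]
        check_positions(child)


tree = build_tree("[b]a[i]b[/i][/b]", ["b", "i"])
check_positions(tree[0])
```

The position check mirrors the containment rule that the dump is expected to satisfy: a child opens after its parent opens and closes before its parent closes.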

I'm not quite sure I was clear enough; just let me know if I wasn't.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Post by brunoais » Thu Nov 22, 2012 5:19 pm

Update:

The core of what constitutes a BBCode parser is now complete (but not properly tested). Check it here:
https://github.com/brunoais/phpbb3/blob ... ngTest.php

Now, for the news:
The system can now do the following:
  • Find tags in the document
  • Associate a starting tag to an end tag
  • Build a tree out of the tags in the text
  • Send the text and the parameters for parsing by its corresponding parser.
Try altering the code in the BBCode setup part, as well as the string, and see what happens.
(The variables are $string and $BBCode_tags.)
Just use what it has now as an example.
Also check out the ExampleParser class for an explanation, and the BoldParser, ItalicParser and UnderlineParser classes for examples of very simple parsing.
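
For readers who have not opened the file, the general shape of "one small parser class per tag" can be sketched as follows. This is a Python analogue written for illustration, not a transcription of the PHP classes named above: the parse() signature, the PARSERS registry, the SizeParser rule and the percent-based font size are all assumptions.

```python
class BoldParser:
    def parse(self, inner, param=None):
        return f"<strong>{inner}</strong>"


class ItalicParser:
    def parse(self, inner, param=None):
        return f"<em>{inner}</em>"


class SizeParser:
    def parse(self, inner, param=None):
        # Only accept a plain number; otherwise drop the tag and keep the content.
        if param is None or not param.isdigit():
            return inner
        return f'<span style="font-size: {param}%">{inner}</span>'


# Hypothetical registry mapping a tag name to the object that renders it.
PARSERS = {"b": BoldParser(), "i": ItalicParser(), "size": SizeParser()}


def dispatch(tag, inner, param=None):
    """Hand already-rendered inner content to the parser registered for the tag."""
    parser = PARSERS.get(tag)
    if parser is None:
        return f"[{tag}]{inner}[/{tag}]"   # unknown tag: leave it as text
    return parser.parse(inner, param)
```

Keeping each tag's logic in its own class means a new BBCode only needs a new entry in the registry, and a parser that validates its parameter can refuse bad input without affecting the others.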

With what it does now, you can already implement many known BBCode tags, including:

Code: Select all

[quote], [url], [b], [img], [color], [spoiler], [size]
But it still does not correctly support tags like:

Code: Select all

[list], [code], [li]/[*] 

Try it and tell me what you think.

The output is as follows:
1st: the original (input) string.
2nd: the parsed (output) string.

3rd-6th: the time when it arrived at the parsing stage.
8th-10th: how long it took in each section of the BBCode parsing.

11th: the total time it took to finish parsing the string.

System's Bottleneck
Let me remind you that the only known bottleneck (in time) of this algorithm occurs when tags with the same name are nested inside each other. For example:

Code: Select all

[b][b][b][b]......[/b][/b][/b][/b]
Is slower than:

Code: Select all

[b][i][u][c]......[/c][/u][/i][/b]

But the difference is not noticeable (on a millisecond scale) until you reach roughly the 7th to 10th level of a tag nested inside itself. If you want more details about this, just poke me.

leschek
Registered User
Posts: 163
Joined: Tue Aug 28, 2012 1:30 pm

Post by leschek » Thu Nov 29, 2012 9:06 am

I'm not sure if I'm in the right topic, but I would like to ask a few questions about BBCodes.
1. Would it be possible to use the Post ID (something like here) when creating a custom BBCode?
2. Will we (users) be able to use template conditions (IFs) when creating a custom BBCode?
3. Would it be possible to add an option to the Quote and Code BBCodes to expand and collapse the box?
