[RFC|Accepted] Updated BBcode engine

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

Seems like the topic OP (TerraFrost) has abandoned the project without warning.
I'll pick this one up, then.
I've already checked the code, and the idea behind it is not bad at all. Still, I believe that tree-like parsing, the way (X)HTML is parsed, would be the most logical way to make it more flexible. Any help is welcome: just post in this topic, or send me a PM if it's more personal.
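
As a rough illustration of the tree-like parsing idea, here is a minimal Python sketch. It is not phpBB code and not the parser discussed in this topic: the tag set KNOWN_TAGS, the Node class and parse_tree are hypothetical names invented for the example, and attributes such as [url=...] are left out on purpose.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag set, for illustration only; not phpBB's real configuration.
KNOWN_TAGS = {"b", "i", "quote"}

# Matches simple tags such as [b] or [/b]; attributes are not handled here.
TOKEN_RE = re.compile(r"\[(/?)([a-z]+)\]")


@dataclass
class Node:
    tag: str                                      # "" marks the root node
    children: list = field(default_factory=list)  # items are str or Node


def parse_tree(text: str) -> Node:
    """Build a tree from BBCode text, much like an HTML parser builds a DOM."""
    root = Node("")
    stack = [root]   # path from the root to the node currently open
    pos = 0          # start of the text not yet attached to the tree
    for m in TOKEN_RE.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if name not in KNOWN_TAGS:
            continue  # unknown tag: it stays part of the surrounding text
        if closing and stack[-1].tag != name:
            continue  # close that does not match the open node: kept as text
        if m.start() > pos:
            stack[-1].children.append(text[pos:m.start()])
        pos = m.end()
        if closing:
            stack.pop()
        else:
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
    if pos < len(text):
        stack[-1].children.append(text[pos:])
    return root
```

Calling parse_tree("a[b]x[i]y[/i][/b]z") gives a root whose children are the text "a", a "b" node (holding "x" and an "i" node) and the text "z". Rendering then becomes a walk over the tree, so the output is nested by construction.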

I'll make these changes taking into account the RFCs on BBCode permissions and on moving to all custom BBCodes, especially the "moving to all custom" part.

This will take quite a while, so I seriously doubt it will go into 3.1, but with the new, faster release cycle it should be added fairly quickly.
If someone has any ideas about the interface for the custom BBCodes, please do help me: use Photoshop, Paint, or similar to draw your idea of a nice interface or, even better(!), make the HTML + CSS for the ACP module.

Note: Due to personal life issues, I won't have any time at all to dedicate to this for the next two weeks (15 days). I'll start working on this after that whenever I have the time.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

I have bad news for everyone.
My real life at uni is taking over all my time, really.
Currently, I have very little time to myself, and I'm using it to visit these forums and keep up with the programming news, which leaves nothing to tackle this problem.
Anyone is free to pick up this problem and solve it. If no one picks it up, I'll pick it up myself and solve it during my holidays, when I have plenty of time.
If someone picks it up, please say so in this topic so that we all know.

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP »

Could a developer chime in on the desirability of changing the BBCode engine please? In other words, how badly is it needed, if at all? I see that this RFC's ticket is blocking a few issues on JIRA, but nothing big.

I think it would be super awesome if somebody could summarize this thread into a new RFC that defines specific goals and benchmarks the engine would have to meet (in addition to producing structurally valid code and being tested). There are a few ideas and recommendations scattered in this thread; it would be more practical to gather them into a new RFC or a new post.
TerraFrost wrote:Should the attempt be made to port phpBB3-style BBcodes to this new BBcode format?
With custom BBCodes being promoted to their own section in the customisation database (which btw is a pretty nice move), there's never been a greater incentive to keep this feature/style.
Last edited by JoshyPHP on Fri Oct 26, 2012 9:51 am, edited 1 time in total.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg »

From first post:
The feature/ascraeus-experiment branch has, among other things, a rewritten BBcode parser that protects against structurally invalid BBcodes.

It's also fairly easy to define custom BBcodes that can render in structurally invalid ways.
This RFC is not blocking 3.1.

If you are looking for something to do please consider instead finishing one of many currently outstanding PRs that require fixing and have not been touched in months.

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP »

Thanks for the speedy reply. I should mention that I did not land in this topic because I was cruising forums for something to do. I have a Google alert on "bbcode" (mainly because I'm keeping tabs on what users do or want to do with them) which regularly brings me back here, and that's how I ended up following this discussion.

So to expand on your post, the only goal for that engine would be to protect against structurally invalid BBcodes, which is considered a low priority.

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

JoshyPHP wrote:So to expand on your post, the only goal for that engine would be to protect against structurally invalid BBcodes, which is considered a low priority.
Not only that, but also to fix the huge bugs in the current BBCode system. For example, when two custom BBCodes are nested, in most cases (interestingly, not always) it parses the outer BBCode but not the inner one.


BTW, for me or for whoever picks this up: are there any rules we should follow for invalid BBCode constructions (other than that they must not ruin the final page layout)?

brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

Hi guys! It's me... again!

Thanks to some unexpected free time, I'm starting to run some basic checks and tests on how to parse BBCode into HTML.

I'll try to sum up what's done and what I have in mind in the simplest way possible, without omitting what's important.
I'll add footnotes on some assumptions I made that may be misinterpreted or forgotten, so that everything is more complete.

I have tried several approaches and dropped many of them for being too complex in both code and processing required.
With the current approach, the BBCode parser will follow these rules:

The good stuff it guarantees:
  1. Only registered BBCode tags are processed. Everything else is ignored and treated as text.
  2. If a closing tag appears before its corresponding opening tag (for that BBCode), that closing tag is treated as text.
  3. The output is always correctly nested HTML (*1).
The questionable stuff it guarantees:
  1. If the BBCode it receives as input is not properly nested, it's nearly impossible to predict which BBCode tags will be parsed and which ones will be ignored. Still, I can give this guarantee:
    There's a near 100% chance that, if a preview of input "y" gives the output "x", the submitted post will be "x" every time the input is "y" (*2).
  2. There's no intermediate state of conversion like the current system has. There's the parsed version and the unparsed version. Adding an intermediate state will require extra processing. If wanted, that can be arranged. (update) It now has one.
  3. The parsed and the unparsed post will need to co-exist in the database to keep things fast and efficient. Not needed anymore.
Simple legend of both groups:
"The good stuff it guarantees" are the rules I think are truly positive for our goal with a BBCode parser.
"The questionable stuff it guarantees" are the rules I think may produce results unwanted by phpBB itself.

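The three "good" guarantees can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the parser from the linked test file: the REGISTERED table and the render function are hypothetical, tags carry no attributes, and a closing tag counts as matching only when its opening tag is the most recently opened one.

```python
import re
from html import escape

# Hypothetical registry: tag name -> (opening HTML, closing HTML).
REGISTERED = {"b": ("<strong>", "</strong>"), "i": ("<em>", "</em>")}

# Matches simple tags such as [b] or [/b]; attributes are not handled here.
TOKEN_RE = re.compile(r"\[(/?)([a-z]+)\]")


def render(text: str) -> str:
    tokens = list(TOKEN_RE.finditer(text))
    matched = {}  # token index -> HTML that replaces that token
    stack = []    # indices of opening tags still waiting for their close
    for i, m in enumerate(tokens):
        closing, name = m.group(1) == "/", m.group(2)
        if name not in REGISTERED:
            continue  # guarantee 1: unregistered tags are plain text
        if not closing:
            stack.append(i)
        elif stack and tokens[stack[-1]].group(2) == name:
            opener = stack.pop()
            matched[opener] = REGISTERED[name][0]
            matched[i] = REGISTERED[name][1]
        # otherwise (guarantee 2): a close without a matching open stays text

    # Only matched pairs become HTML, and pairs are matched through a stack,
    # so the emitted HTML is always correctly nested (guarantee 3).
    out, pos = [], 0
    for i, m in enumerate(tokens):
        if i in matched:
            out.append(escape(text[pos:m.start()]))
            out.append(matched[i])
            pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

With this sketch, render("[/b]x[b]y[/b]") returns "[/b]x<strong>y</strong>", and the badly nested input "[b][i]x[/b][/i]" returns "[b]<em>x[/b]</em>": the output is valid, but which tags survive depends on the matching rule, which is the unpredictability described in the first "questionable" point.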

I also have questions about what it needs to guarantee.
  1. If a BBCode is deleted in the ACP, does the parsed version of that BBCode need to disappear from all posts?
  2. If a BBCode is edited, does the change only need to apply to edited posts, or does it also need to apply to all posts in the forum?
  3. Do we need to have a parsing checkpoint of some sort? I mean... Do we need to have some string that looks like the current system does before storing in the database or... Can I just create code that stores an index about where each BBCode tag is?
  4. I suppose there will be requirements about which tags can exist inside other tags. Example: In the current system, the [* ] tag can only exist inside the [list ] tag.
    How should it be handled? Ignore the outer tag, ignore the inner tag, or ignore the outer tag including everything inside it (these three are quite easy to code and for PHP to process), or should I remove the inner tag instead?
  5. If the input is improperly nested BBCode, no effort will be made to nest the BBCode tags properly.
I have already thought of solutions for all of them (except the 5th one); they just require some work and a little more storage space, so I need to know whether I have to guarantee them.
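
For question 3, one possible shape of "an index about where each BBCode tag is" is sketched below in Python. It is an assumption about what such an index could look like, not the format used by phpBB or by the tests linked in this post; build_index and apply_index are hypothetical names, and HTML escaping of the post text is left out to keep the example short.

```python
import re

# Matches simple tags such as [b] or [/b]; attributes are not handled here.
TOKEN_RE = re.compile(r"\[(/?)([a-z]+)\]")


def build_index(text, registered):
    """Record (start, end, tag name, is_closing) for every registered tag."""
    return [
        (m.start(), m.end(), m.group(2), m.group(1) == "/")
        for m in TOKEN_RE.finditer(text)
        if m.group(2) in registered
    ]


def apply_index(text, index, html):
    """Render the raw text using a stored index instead of re-scanning it."""
    out, pos = [], 0
    for start, end, name, closing in index:
        out.append(text[pos:start])
        out.append(html[name][1 if closing else 0])
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

For the text "a [b]c[/b]" and the registered set {"b"}, build_index returns [(2, 5, "b", False), (6, 10, "b", True)]. The raw post stays untouched in storage, and the index only needs rebuilding when the post text changes.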

You can keep checking the updated versions of my tests here:
https://github.com/brunoais/phpbb3/blob ... ngTest.php

Please, no pull requests, yet.

(*1) As you may guess, it's not the BBCode parser's job to check whether the replacement HTML given is properly nested. I cannot guarantee proper HTML output if the HTML of the replacement is not properly nested.
(*2) Provided nothing is done that would obviously change this, for example, editing the BBCodes in the ACP.
Last edited by brunoais on Thu Dec 13, 2012 1:23 pm, edited 2 times in total.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg »

Any sort of a bbcode engine rewrite will be required to have a comprehensive test suite.

imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by imkingdavid »

brunoais wrote:If a BBCode is deleted in the ACP, does the parsed version of that BBCode need to disappear from all posts?
BBCode is stored like: [tag:uid]content[/tag:uid]. So you don't need to worry about whether or not to parse it: once the BBCode is deleted, the engine won't know how to parse it.
brunoais wrote:If a BBCode is edited, does the change only need to apply to edited posts, or does it also need to apply to all posts in the forum?
As before, the BBCode is not stored as parsed HTML, but rather as the BBCode itself, so it will automatically be updated.
brunoais wrote:Do we need to have a parsing checkpoint of some sort? I mean... Do we need to have some string that looks like the current system does before storing in the database or... Can I just create code that stores an index about where each BBCode tag is?
The current [tag:uid] format should be fine, imo.
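
A minimal Python sketch of why the [tag:uid] storage format answers the first two questions: the post is rendered at display time from whatever definitions exist at that moment. The function render_stored and the definitions table are hypothetical, and showing a deleted BBCode as a plain [tag] is an assumption made for this example, not documented phpBB behaviour.

```python
import re


def render_stored(stored, uid, definitions):
    """Render text stored as [tag:uid]...[/tag:uid] with current definitions.

    definitions maps tag name -> (opening HTML, closing HTML).
    """
    pattern = re.compile(r"\[(/?)([a-z]+):" + re.escape(uid) + r"\]")

    def replace(m):
        slash, name = m.group(1), m.group(2)
        if name not in definitions:
            # Assumption: a deleted BBCode falls back to its plain [tag] form.
            return "[" + slash + name + "]"
        return definitions[name][1 if slash else 0]

    return pattern.sub(replace, stored)
```

Rendering "[b:abc123]hi[/b:abc123]" with b defined as ("<strong>", "</strong>") gives "<strong>hi</strong>". Changing the definition to ("<b>", "</b>") changes the output of every stored post without touching the database, and removing b from the table leaves "[b]hi[/b]".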
brunoais wrote:I suppose there will be requirements about which tags can exist inside other tags. Example: In the current system, the [* ] tag can only exist inside the [list ] tag.
How should it be handled? Ignore the outer tag, ignore the inner tag, or ignore the outer tag including everything inside it (these three are quite easy to code and for PHP to process), or should I remove the inner tag instead?
IMO we should store a blacklist of tags that cannot be a given tag's child. Then, when parsing a tag, check each child tag against each parent to make sure it can exist inside of it.
Of course, the code tag should not parse any tags, so it might be good to use a whitelist in some cases and a blacklist in others?
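
The blacklist/whitelist idea could look roughly like this in Python. The two rule tables and the may_nest function are hypothetical, and the example rules (nothing inside [code], no link or quote inside [url]) are invented for illustration rather than taken from phpBB.

```python
# Hypothetical rule tables, invented for this example.
# Whitelist: a parent listed here accepts ONLY the named tags inside it.
WHITELIST = {"code": set()}            # [code] accepts no tags at all
# Blacklist: a parent listed here accepts everything EXCEPT the named tags.
BLACKLIST = {"url": {"url", "quote"}}  # e.g. no link or quote inside a link


def may_nest(child, ancestors):
    """Check a tag against every tag currently open around it."""
    for parent in ancestors:
        if parent in WHITELIST:
            if child not in WHITELIST[parent]:
                return False
        elif child in BLACKLIST.get(parent, set()):
            return False
    return True
```

While parsing, the list of currently open tags is passed as ancestors: may_nest("b", ["code"]) is False, may_nest("url", ["quote", "url"]) is False, and may_nest("b", ["quote"]) is True. What to do with a rejected tag (treat it as text, or drop it) is the open question from the quoted post and is not decided by this sketch.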

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: [RFC|Accepted] Updated BBcode engine

Post by EXreaction »

With an update, I believe it would be preferred to store both the parsed and unparsed message for performance reasons, would it not?
