[RFC|Accepted] Updated BBcode engine

Post by brunoais » Sat Dec 15, 2012 10:47 am

JoshyPHP wrote:In an ideal world, it would go like this:
  • have an editor that never produces junk
  • if junk is found, alert the user
  • if there's no user, fasten your seat belt and prepare for a crash landing. IOW, transform the junk into something that won't break the page
What you just wrote reminds me of CAP. (It's a theorem about distributed systems: you cannot have "perfect" availability, consistency and partition tolerance all at the same time.)
With BBCode parsers you cannot have:
  • A parser that fixes all user mistakes silently and never produces junk text.
  • A parser that takes BBCode from any source bulletin board, each with its own BBCode rules, and makes it work exactly the same way everywhere.


On another subject: I ran some tests using zero-length tags*1, and they seem to be the answer to many of the problems the parser is required to solve but does not solve yet.
From what I understood, the current parser is required to fix some mis-nested tags (silently or not*2). Example:

Code: Select all

[i]ab[b]cd[/i]ef[/b]
becomes:
 ab cd ef 
OR
 ab cd ef 
but

Code: Select all

[quote][url]abc[/quote][/url]
becomes (simplified):
 [url]abc [/url]
I haven't worked out the full details yet, but I should get somewhere. At the very least, the tests should give me some usable input on that.

*1 It does make sense for my parser. If you want some clarification, I can try to explain why it accepts such a thing.
*2 I still need to understand whether errors should be reported to the user or not. I tested multiple bulletin boards and only one of them ever reports mistakes back to the user. All the others (that use BBCode) either try to fix malformed BBCode somehow or just ignore it.
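One possible way to repair the mis-nesting example above is to close the inner tags early and reopen them after the outer tag has closed. The sketch below is only an illustration of that idea, not the phpBB parser: `fix_nesting` and `TAG_RE` are made-up names, and it only understands attribute-less lowercase tags.

```python
import re

# Hypothetical, simplified tag syntax: [name] and [/name], no attributes.
TAG_RE = re.compile(r"\[(/?)([a-z]+)\]")


def fix_nesting(text):
    """Close and reopen tags so that every tag pair is properly nested."""
    out = []
    stack = []  # names of the currently open tags, outermost first
    pos = 0
    for match in TAG_RE.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            stack.append(name)
            out.append(match.group(0))
        elif name in stack:
            # Close everything opened after `name`, innermost first...
            reopened = []
            while stack[-1] != name:
                inner = stack.pop()
                out.append("[/%s]" % inner)
                reopened.append(inner)
            stack.pop()
            out.append("[/%s]" % name)
            # ...then reopen those tags in their original order.
            for inner in reversed(reopened):
                stack.append(inner)
                out.append("[%s]" % inner)
        # A closing tag with no matching opener is dropped.
    out.append(text[pos:])
    # Close whatever is still open at the end of the text.
    while stack:
        out.append("[/%s]" % stack.pop())
    return "".join(out)


print(fix_nesting("[i]ab[b]cd[/i]ef[/b]"))
print(fix_nesting("[quote][url]abc[/quote][/url]"))
```

Note that with this strategy the second input ends with an empty `[url][/url]` pair, so a parser that repairs nesting this way has to either tolerate or strip zero-length tags.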
JoshyPHP wrote: In an ideal world, it would go like this:
(...)
  • if there's no user, fasten your seat belt and prepare for a crash landing. IOW, transform the junk into something that won't break the page
It is actually possible to make a parser that will not allow a post to break the page.
What's impossible is to make a parser that guarantees that the page will stay intact and that fixes any mistakes made by the user.
Pony99CA wrote: I agree with that 100%. Maybe the BBCode parser will take an option that specifies if a user entered the BBCode or not (for example, it's going through a converter). If a user entered it, flag the error; otherwise, try to emit something valid.
Flagging errors in any markup is not a trivial task, especially because after the first broken construct, any further errors the parser finds are not reliable.
Pony99CA wrote:
callumacrae wrote:Re the error, what happens when a user edits a post made before 3.1? Will they have to fix the markup before they can submit the edited post?
In my opinion, yes. It's no different than somebody creating a new post with invalid markup.
I was going for that one, but it seems like the "big guys" don't want it... I don't blame 'em; it somewhat makes sense. It's called backwards compatibility.

Post by Pony99CA » Tue Dec 18, 2012 2:55 am

brunoais wrote:
Pony99CA wrote: I agree with that 100%. Maybe the BBCode parser will take an option that specifies if a user entered the BBCode or not (for example, it's going through a converter). If a user entered it, flag the error; otherwise, try to emit something valid.
Flagging errors in any markup is not a trivial task, especially because after the first broken construct, any further errors the parser finds are not reliable.
Yes, that's certainly true. As I used to work on a compiler, I'm quite familiar with the problem. However, it's better to try than to give up, but maybe you put a threshold on errors -- if there are more than, say, 10, put out an error saying "Too many BBCode formatting errors found; fix the above and Preview the post again."
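The threshold idea can be sketched roughly as follows. This is a toy checker under stated assumptions, not an existing phpBB function: `check_bbcode`, `TAG_RE` and the message wording other than the "Too many..." text are hypothetical, and only attribute-less lowercase tags are recognized.

```python
import re

# Hypothetical, simplified tag syntax: [name] and [/name], no attributes.
TAG_RE = re.compile(r"\[(/?)([a-z]+)\]")
MAX_ERRORS = 10  # the threshold suggested in the post
TOO_MANY = ("Too many BBCode formatting errors found; "
            "fix the above and Preview the post again.")


def check_bbcode(text, max_errors=MAX_ERRORS):
    """Return a list of error messages, giving up after max_errors."""
    errors = []
    stack = []  # (tag name, offset) of the tags that are still open

    def report(message):
        """Record one error; return False once the threshold is exceeded."""
        if len(errors) >= max_errors:
            errors.append(TOO_MANY)
            return False
        errors.append(message)
        return True

    for match in TAG_RE.finditer(text):
        closing, name = match.group(1) == "/", match.group(2)
        if not closing:
            stack.append((name, match.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        elif not report("Unexpected [/%s] at offset %d" % (name, match.start())):
            return errors
    for name, offset in stack:
        if not report("Unclosed [%s] opened at offset %d" % (name, offset)):
            return errors
    return errors


for line in check_bbcode("[i]ab[b]cd[/i]ef[/b]"):
    print(line)
```

Running it on a mis-nested input shows the reliability problem too: one real mistake produces several follow-on messages, which is why capping the list is useful.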
brunoais wrote:
Pony99CA wrote:
callumacrae wrote:Re the error, what happens when a user edits a post made before 3.1? Will they have to fix the markup before they can submit the edited post?
In my opinion, yes. It's no different than somebody creating a new post with invalid markup.
I was going for that one, but it seems like the "big guys" don't want it... I don't blame 'em; it somewhat makes sense. It's called backwards compatibility.
As long as the converter piece generates reasonable code, this situation shouldn't really come up, right? The converter would store "fixed" BBCode markup, not the original.

However, if the converter didn't store fixed markup, if somebody is editing a post, that's new content, so there's no need for backward compatibility. Give the errors and let them decide if they want to continue with the editing or not.

I presume that the new BBCode generator won't generate completely backward compatible code anyway, right? For example, in the case where the COLOR and B tags were nested improperly, you'll either try to do the "right" thing or give an error, but neither of those would produce the same output that we currently get. Thus, true backward compatibility is already broken.

Steve

Post by brunoais » Tue Dec 18, 2012 9:29 am

Pony99CA wrote:
brunoais wrote:
Pony99CA wrote: I agree with that 100%. Maybe the BBCode parser will take an option that specifies if a user entered the BBCode or not (for example, it's going through a converter). If a user entered it, flag the error; otherwise, try to emit something valid.
Flagging errors in any markup is not a trivial task, especially because after the first broken construct, any further errors the parser finds are not reliable.
Yes, that's certainly true. As I used to work on a compiler, I'm quite familiar with the problem. However, it's better to try than to give up, but maybe you put a threshold on errors -- if there are more than, say, 10, put out an error saying "Too many BBCode formatting errors found; fix the above and Preview the post again."
Oh well... The way this is built, that will require a step in the parser made specifically to check for errors...
I'm now working on markup fixing. It's one of the requirements I was given. I truly believe I'll have to make some changes I didn't want to make to get that one working.
I'll see... For now, I'm trying not to make those changes, but the odds are not on my side...
Pony99CA wrote:
brunoais wrote:
Pony99CA wrote:
callumacrae wrote:Re the error, what happens when a user edits a post made before 3.1? Will they have to fix the markup before they can submit the edited post?
In my opinion, yes. It's no different than somebody creating a new post with invalid markup.
I was going for that one, but it seems like the "big guys" don't want it... I don't blame 'em; it somewhat makes sense. It's called backwards compatibility.
As long as the converter piece generates reasonable code, this situation shouldn't really come up, right? The converter would store "fixed" BBCode markup, not the original.
What? The database updater code? They don't want it to recalculate the posts' data. It does make sense, though. There are phpBB forums out there with more than 1000000 posts. Updating all of them would take a lot of time.
Assume that each post takes 0.01 seconds to reparse (don't forget the DB traffic): 1000000 * 0.01 s = 10000 s = 166.667 minutes = 2h46m40s.
I'm assuming that about 80% of the posts have BBCode and have BBCode turned on.
area51 has low traffic and has 146306 posts.
Our neighbor, the phpbb.com forum, has 3805370 posts.
That means it would easily take 5h or more to recreate the posts there. That does not sound right to me, especially considering that our neighbor has posts that have not been viewed for years, but they are there. Does it make sense to keep the server waiting because of that? Ofc not.
So I'm using an alternative. When a post is requested for viewing, I revert it to plain BBCode form (the same form the current system produces for editing) and feed it to my parser. Then I store the intermediate result in the DB and display the post to the user. I do this for each post that is about to be displayed, so it happens the first time each post is viewed after the update. This way a page may be slow the first time it is shown, but after that it will be faster than it was before (hopefully).
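The reparse-on-first-view approach can be illustrated with a small sketch. Everything here is an assumption made for the example: the post store is a plain dict instead of the real database, and `legacy_to_bbcode`, `new_parse` and `load_for_display` are toy stand-ins for the real revert, parse and display steps.

```python
import re


def legacy_to_bbcode(stored):
    """Toy stand-in for reverting a stored post to plain BBCode.

    Assumes a made-up legacy form where tags carry a ':uid' suffix,
    e.g. [b:x1] -> [b].
    """
    return re.sub(r"\[(/?[a-z]+):[0-9a-z]+\]", r"[\1]", stored)


def new_parse(bbcode):
    """Toy stand-in for the new parser: split the text into tag/text tokens."""
    return [tok for tok in re.split(r"(\[/?[a-z]+\])", bbcode) if tok]


def load_for_display(posts, post_id, stats):
    """Return the parsed post, converting it on first view only."""
    post = posts[post_id]
    if post["format"] == "legacy":
        bbcode = legacy_to_bbcode(post["text"])
        post["text"] = new_parse(bbcode)  # store the intermediate form
        post["format"] = "new"
        stats["reparsed"] += 1
    return post["text"]


posts = {
    1: {"format": "legacy", "text": "[b:x1]hi[/b:x1]"},
    2: {"format": "legacy", "text": "never viewed"},
}
stats = {"reparsed": 0}
load_for_display(posts, 1, stats)  # slow path: converts and stores
load_for_display(posts, 1, stats)  # fast path: already converted
print(stats["reparsed"], posts[2]["format"])
```

The cost of the conversion is paid per viewed post instead of up front, and posts that nobody opens are never converted at all.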
Pony99CA wrote: However, if the converter didn't store fixed markup, if somebody is editing a post, that's new content, so there's no need for backward compatibility. Give the errors and let them decide if they want to continue with the editing or not.
That's perfectly ok for guys like us, who are used to dealing with compilers. But how about the "average" phpBB user?
They are people who, when there's an error, ask how to hide it, not how to fix it.
In p@p, people ask why a piece of code does not work and then show the error. For example, for PHP, they show something like "syntax error, unexpected '$var' in [someplace] on line 15" and then say they cannot find the mistake on line 15! This happens with exactly the same users, year after year after year! I think more than 80% of people (maybe even 90%) cannot fix their code based on the errors a compiler gives them. It's the unfortunate truth.
Pony99CA wrote: I presume that the new BBCode generator won't generate completely backward compatible code anyway, right? For example, in the case where the COLOR and B tags were nested improperly, you'll either try to do the "right" thing or give an error, but neither of those would produce the same output that we currently get. Thus, true backward compatibility is already broken.
I know... But they want the parser to be able to fix small simple markup errors... Also, I'm trying to deal with the "[*]" situation in the most efficient way...

Post by yops » Thu Dec 20, 2012 11:08 am

I have been working on a table BBCode recently and I ran into a problem: line breaks.
If there is a line break between the tags, phpBB will generate a <br /> for each one.

So my suggestion is that you add a condition to ignore line breaks.

Post by nickvergessen » Thu Dec 20, 2012 11:40 am

brunoais and I had a quick talk today, collecting all the stuff that should be done:
  1. we want to add conditions to bbcodes, so we can have different parsings in different cases (like nesting-depth (quoting 3 depth), permissions (allow flash), config settings (flash enabled), invalid-parents (b inside of flash, allow only some bbcodes inside of the quote-username), etc)
  2. we want to support valid utf8, [ and ] in urls/emails and new tlds (also for the magic url thing, without bbcode)
  3. we want to support nesting in bbcodes, like [c=red]red[c=green]green[/c]red again[/c]
  4. remove any difference between custom and basic bbcodes, all should be handled and set up the same way (currently some are in files, others in the database)
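As a rough illustration of points 1 and 4, conditions could live in one rule table that is consulted the same way for every BBCode, basic or custom. The names below (`RULES`, `tag_allowed`, the `allow_flash` config key and the rule keys) are hypothetical and are not taken from phpBB.

```python
# Hypothetical rule table: every BBCode is described the same way.
RULES = {
    "quote": {"max_depth": 3},                   # nesting-depth condition
    "flash": {"requires_config": "allow_flash"}, # config-setting condition
    "b": {"forbidden_parents": {"flash"}},       # invalid-parent condition
}


def tag_allowed(name, open_tags, config):
    """Decide whether [name] may be opened inside the currently open tags."""
    rule = RULES.get(name, {})

    max_depth = rule.get("max_depth")
    if max_depth is not None and open_tags.count(name) >= max_depth:
        return False

    flag = rule.get("requires_config")
    if flag is not None and not config.get(flag, False):
        return False

    if rule.get("forbidden_parents", set()) & set(open_tags):
        return False

    return True


print(tag_allowed("quote", ["quote", "quote", "quote"], {}))
print(tag_allowed("flash", [], {"allow_flash": True}))
print(tag_allowed("b", ["flash"], {"allow_flash": True}))
```

A tag that fails a condition could then be rendered as plain text or handled by a different template, which is the "different parsings in different cases" part.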

Post by JoshyPHP » Thu Dec 20, 2012 12:33 pm

nickvergessen wrote:allow only some bbcodes inside of the quote-username
About BBCodes inside the [quote] author, in my experience it's a feature that almost nobody uses. I don't have access to a large corpus, but in my own data I've only found 48 occurrences of those in ~255K posts (or ~0.02%), and all of them were a single [url] tag that encompassed the whole value. In other words, it would have been completely unused if the [quote] tag had an optional URL attribute.

In my opinion, it's a feature that's not worth the extra code complexity of setting up a recursive parser and its use cases should be largely covered by optional attributes or custom BBCodes.

Post by brunoais » Thu Dec 20, 2012 12:36 pm

JoshyPHP wrote:
nickvergessen wrote:allow only some bbcodes inside of the quote-username
(...)
In my opinion, it's a feature that's not worth the extra code complexity of setting up a recursive parser and its use cases should be largely covered by optional attributes or custom BBCodes.
That's pretty much what I think also...
yops wrote: I have been working on a table BBCode recently and I ran into a problem: line breaks.
If there is a line break between the tags, phpBB will generate a <br /> for each one.

So my suggestion is that you add a condition to ignore line breaks.
Please elaborate: give more details and at least one example.

Post by yops » Thu Dec 20, 2012 5:30 pm

Sure.
Let's say that we have the following generated BBCode:

Code: Select all

[table][tr][td]text1[/td]
[td]test2[/td]
[/tr]
[tr][td]text3[/td]
[td]test4[/td]
[/tr]
[/table]
phpBB will generate a <br /> for each line break, so the generated table will contain many line breaks.
But if the generated BBCode is like this:

Code: Select all

[table][tr][td]text1[/td][td]test2[/td][/tr][tr][td]text3[/td][td]test4[/td][/tr][/table]
the table will show up just fine.

Post by brunoais » Thu Dec 20, 2012 6:47 pm

Just tell the parser that the [table] tag cannot have text as its child, same for the [tr] tag.
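A minimal sketch of what such a rule could do, assuming a toy tokenizer rather than the real parser API: whitespace-only text that sits directly inside a tag listed in `NO_TEXT_CHILDREN` is dropped, and line breaks everywhere else still become `<br />`. The names `convert_linebreaks`, `TOKEN_RE` and `NO_TEXT_CHILDREN` are made up for this example.

```python
import re

# Hypothetical, simplified tag syntax: [name] and [/name], no attributes.
TOKEN_RE = re.compile(r"(\[/?[a-z]+\])")
NO_TEXT_CHILDREN = {"table", "tr"}  # tags that may not contain text directly


def convert_linebreaks(bbcode):
    """Turn newlines into <br />, except directly inside [table] and [tr]."""
    out = []
    stack = []  # names of the currently open tags
    for token in TOKEN_RE.split(bbcode):
        if not token:
            continue
        if TOKEN_RE.fullmatch(token):
            name = token.strip("[]/")
            if token.startswith("[/"):
                if stack and stack[-1] == name:
                    stack.pop()
            else:
                stack.append(name)
            out.append(token)
        elif stack and stack[-1] in NO_TEXT_CHILDREN:
            # Whitespace-only text here is dropped; other text is kept
            # unchanged so a real parser could still flag it as an error.
            if token.strip():
                out.append(token)
        else:
            out.append(token.replace("\n", "<br />\n"))
    return "".join(out)


multi_line = (
    "[table][tr][td]text1[/td]\n[td]test2[/td]\n[/tr]\n"
    "[tr][td]text3[/td]\n[td]test4[/td]\n[/tr]\n[/table]"
)
print(convert_linebreaks(multi_line))
```

With this rule the multi-line table markup comes out the same as the single-line version, while line breaks inside a [td] cell are still converted.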

Post by yops » Thu Dec 20, 2012 7:06 pm

I tried to do that but it didn't work at all. Maybe I did it wrong...
Can you show me how?
