[RFC|Accepted] Updated BBcode engine

brunoais
Registered User
Posts: 958
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais » Thu Dec 20, 2012 7:13 pm

I'm telling you: you can do that with the parser I'm developing, the one that is being updated.
phpBB's current parser does not allow it.

yops
Registered User
Posts: 9
Joined: Sat Jul 21, 2012 10:52 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by yops » Thu Dec 20, 2012 7:44 pm

My mistake, sorry.
Thanks for the info :)

JoshyPHP
Registered User
Posts: 348
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP » Thu Dec 20, 2012 10:51 pm

I see that this topic has been relatively active lately thanks to brunoais's efforts, so it would feel disingenuous if I waited any longer to say that I'm currently writing a text formatting library, which includes a BBCode plugin. And since my long-term goal is to see it included in forum software (with phpBB at the top of my list), it is possible that my efforts would render his obsolete. I haven't mentioned it before because I don't want to hype anything before it is functional and tested, but it would feel weird if a few months from now I dropped 10K LOC saying "oh and btw it solves every problem ever mentioned in that thread but I didn't want to spoil the surprise."

Now I'm not sure how to proceed. I'm a bit torn between two options. One is publishing the code, despite it being incomplete and having no end-user documentation, to let people judge its potential (I'm not sure the idea even makes sense, but it does seem to produce results). The other is focusing my energy on finishing the damn thing and seeing where I can go from there. Maybe I'll find a good compromise and publish stuff when it reaches alpha state.

Anyway, I wish you the best, brunoais. Thanks for prodding devs to make them post more about their expectations. :) I'll keep monitoring this topic.

EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: [RFC|Accepted] Updated BBcode engine

Post by EXreaction » Wed Dec 26, 2012 4:43 pm

nickvergessen wrote: brunoais and I had a quick talk today, collecting all the stuff that should be done:
  1. we want to add conditions to bbcodes, so we can have different parsings in different cases (like nesting-depth (quoting 3 depth), permissions (allow flash), config settings (flash enabled), invalid-parents (b inside of flash, allow only some bbcodes inside of the quote-username), etc)
  2. we want to support valid utf8, [ and ] in urls/emails and new tlds (also for the magic url thing, without bbcode)
  3. we want to support nesting in bbcodes, like [c=red]red[c=green]green[/c]red again[/c]
  4. remove any difference between custom and basic bbcodes, all should be handled and set up the same way (currently some are in files, others in the database)
This sounds good to me.

Meis2M
Registered User
Posts: 411
Joined: Fri Apr 23, 2010 10:18 am

Re: [RFC|Accepted] Updated BBcode engine

Post by Meis2M » Wed Dec 26, 2012 5:57 pm

Who is working on the updated BBcode engine?

imkingdavid
Registered User
Posts: 1050
Joined: Thu Jul 30, 2009 12:06 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by imkingdavid » Wed Dec 26, 2012 5:58 pm

Meis2M wrote: Who is working on the updated BBcode engine?
brunoais

JoshyPHP
Registered User
Posts: 348
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP » Tue Jan 01, 2013 10:48 pm

I'm following up on my previous post. As I said, I've been working on a text formatting library, s9e\TextFormatter, for a little while now. Today, even though I still consider it some way from being in a releasable state, I have registered the project on Travis and Packagist, so that people can have a stab at it.

This isn't specifically a "BBCode engine"; it is a "text formatter" with a BBCode plugin. It is designed to handle other aspects of formatting too, and has plugins for magic links, emoticons and word censoring. The process is organized in 3 steps: configuration, parsing and rendering. First, plugins are loaded and configured (that's the bulk of the API, as well as the bulk of the code), then the configurator generates serializable instances of Parser and Renderer, which are used for the other two steps.

The BBCode plugin is able to parse most of the BBCode syntaxes that are used in forum software and other CMSes. It handles multiple attributes, optional attributes, default attributes, composite attributes (e.g. [flash=<width>,<height>]), polymorphism (e.g. how [url] works), default values and everything in between. The BBCodes configurator can understand the same format that's used in phpBB's admin panel, and extends it to accommodate the extra features.
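
To make "composite attribute" concrete, here is a minimal Python sketch of splitting the value of [flash=<width>,<height>] into two named attributes. It is only an illustration of the idea: the pattern and the function name are made up for this example and are not s9e\TextFormatter's code or API.

```python
import re

# Hypothetical pattern for the composite value of [flash=<width>,<height>].
# Illustration only; the library's own implementation is not shown here.
COMPOSITE_FLASH = re.compile(r"(?P<width>\d+),(?P<height>\d+)")


def split_composite(value):
    """Split a composite value such as "425,350" into named attributes.

    Returns a dict mapping attribute names to strings, or None when the
    value does not have the expected <width>,<height> shape.
    """
    # fullmatch() rejects partial matches and trailing garbage.
    match = COMPOSITE_FLASH.fullmatch(value)
    if match is None:
        return None
    return match.groupdict()
```

A value such as "425,350" yields separate width and height attributes, which can then be validated and filtered individually like any other attribute.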

In addition to those BBCode-specific features, the main parser handles filtering attributes, applying rules (most of those can be automatically generated based on the HTML specs) and watching over limits, which can be imposed on the number of regexp matches a plugin processes or the number of times a given tag can be used or nested. In addition to accepting custom plugins, it also has hooks that let you register custom parsers, callbacks that filter tags, and callbacks that filter/sanitize attributes. In order to accommodate posting options (e.g. "no emoticons in this message"), plugins and tags can be selectively disabled even after configuration.

I think that integrating s9e\TextFormatter into phpBB would have a better return on time invested than developing a new custom engine from scratch. It would require an overhaul/cleanup of phpBB's parsing/rendering functions though, which I can't offer. message_parser would lose a lot of code (basically replaced by $parser->parse($text)) and it would require a consolidation of the various functions currently required to display a message, which basically amounts to replacing all the calls to censor_text(), bbcode_second_pass(), bbcode_nl2br() and smiley_text() with $renderer->render($parsed_text). I will happily offer support for my side of the code, though.

In the meantime, if you want to play with it, you can clone the project from GitHub and take a look at docs/example.php. You can use addCustom() to create any BBCodes and see how they interact. There are also a couple of documents in which I keep notes, which may be of interest: description.md, benefits.md, ParserDifferences.md. There's also the testdox output of the 1665 tests of the test suite, but it will probably look like complete gibberish. :P

I'm subscribed to this topic so if you have a question, you can post it here, thanks.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg » Tue Jan 01, 2013 11:06 pm

nickvergessen wrote: brunoais and I had a quick talk today, collecting all the stuff that should be done:
  1. we want to add conditions to bbcodes, so we can have different parsings in different cases (like nesting-depth (quoting 3 depth), permissions (allow flash), config settings (flash enabled), invalid-parents (b inside of flash, allow only some bbcodes inside of the quote-username), etc)
  2. we want to support valid utf8, [ and ] in urls/emails and new tlds (also for the magic url thing, without bbcode)
  3. we want to support nesting in bbcodes, like [c=red]red[c=green]green[/c]red again[/c]
  4. remove any difference between custom and basic bbcodes, all should be handled and set up the same way (currently some are in files, others in the database)
This is all nice but what about test coverage?

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg » Tue Jan 01, 2013 11:11 pm

JoshyPHP wrote: There's also the testdox output of the 1665 tests of the test suite, but it will probably look like complete gibberish. :P
Now that is impressive.

Have you checked performance of your implementation on 1) big posts and 2) transforming large numbers of ordinary posts? How about unusual things like nesting 100 levels deep?

Also does it address everything on nickvergessen's list that I just quoted above?

JoshyPHP
Registered User
Posts: 348
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP » Wed Jan 02, 2013 12:31 am

Thanks. I'm aiming at 100% coverage, minus a few codepaths that are unreachable under normal conditions.

I haven't really checked the performance of any specific situation, even though I have a good idea of the parameters in play. The BBCode parser uses one big preg_match_all() to find where the BBCodes are, then it iterates over the BBCodes that have attributes (the attribute parsing isn't done with PCRE), then the main parser iterates over tags to process them. That means that the time is pretty much constant for a given text, regardless of the number of BBCodes defined or the size of the text, but it does increase with the number of matches, which is why there is a configurable limit (set to 1000 by default) on the number of matches. The nesting level doesn't affect the parsing time. The parser isn't recursive: it sorts tags and processes them in document order while keeping a count of how many of each tag are open, so it doesn't matter whether the tags are nested or not. I'm not aware of any pathological case.
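
The approach described above can be sketched roughly as follows. This is a simplified Python illustration, not the library's code: the regex, the function name scan_tags and the nesting_limit default are assumptions made for the example. Only the default of 1000 matches comes from the description. A single regex scan already yields tags in document order, so the sorting step (needed when several plugins contribute tags) is omitted.

```python
import re
from collections import Counter

# One pattern finds every [tag], [tag=value] and [/tag] in a single pass,
# playing the role that preg_match_all() plays in the description above.
TAG_RE = re.compile(
    r"\[(?P<close>/)?(?P<name>[a-z]+)(?:=[^\]]*)?\]", re.IGNORECASE
)


def scan_tags(text, max_matches=1000, nesting_limit=10):
    """Walk tags in document order with per-tag counters, without recursion.

    Returns (accepted, max_depth). accepted is a list of
    (offset, name, kind) tuples where kind is "open" or "close";
    max_depth maps each tag name to the deepest nesting accepted.
    nesting_limit is a hypothetical default chosen for this sketch.
    """
    open_count = Counter()  # how many of each tag are currently open
    max_depth = Counter()
    accepted = []
    for seen, match in enumerate(TAG_RE.finditer(text)):
        if seen >= max_matches:
            break  # hard cap on the work done for one text
        name = match.group("name").lower()
        if match.group("close"):
            if open_count[name] == 0:
                continue  # stray end tag: leave it as plain text
            open_count[name] -= 1
            accepted.append((match.start(), name, "close"))
        else:
            if open_count[name] >= nesting_limit:
                continue  # nested too deep: ignore this start tag
            open_count[name] += 1
            max_depth[name] = max(max_depth[name], open_count[name])
            accepted.append((match.start(), name, "open"))
    return accepted, max_depth
```

Because the loop visits each match once and only touches counters, a text with tags nested 100 levels deep costs the same as one with 100 sibling tags. The work grows with the number of matches, which the cap bounds.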

The parser produces an intermediate representation of the text, as an XML string, which is what you'd store in your db. The rendering is done by PHP's XSL extension. The transformation has a pretty big (relative to the rest) startup cost, but its performance degrades very slowly with the number of tags. To amortize the startup cost, the renderer has a second method, renderMulti(), which renders multiple texts at a time (takes an array, returns an array). In my informal tests, rendering 10 posts at a time took about twice as long as rendering one single post. With that said, it's comparable in terms of performance (time spent per page) to phpBB's current renderer.
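
As a back-of-the-envelope reading of those informal numbers, the Python sketch below assumes a simple linear cost model: a fixed startup cost plus a constant cost per post. The model and the function names are assumptions for illustration, not measurements.

```python
from fractions import Fraction


def startup_share(batch_size, slowdown):
    """Share of a single-post render that is startup cost.

    Assumes T(n) = startup + n * per_post, and the observation
    T(batch_size) == slowdown * T(1).
    """
    # startup + batch_size * p == slowdown * (startup + p)
    # => startup * (slowdown - 1) == p * (batch_size - slowdown)
    startup_over_per_post = Fraction(batch_size - slowdown, slowdown - 1)
    return startup_over_per_post / (startup_over_per_post + 1)


def per_post_cost_ratio(batch_size, slowdown):
    """Per-post cost inside a batch, relative to rendering one post alone."""
    return Fraction(slowdown, batch_size)
```

With 10 posts taking about twice as long as one, startup_share(10, 2) gives 8/9 and per_post_cost_ratio(10, 2) gives 1/5. Under this model nearly all of a single render is startup, and batching cuts the per-post cost to roughly a fifth.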

If you post a couple of examples of messages to be parsed (pm or trashbin, whichever) I'll write a script that measures the time it takes to parse and render them.
nickvergessen wrote:
  1. we want to add conditions to bbcodes, so we can have different parsings in different cases (like nesting-depth (quoting 3 depth), permissions (allow flash), config settings (flash enabled), invalid-parents (b inside of flash, allow only some bbcodes inside of the quote-username), etc)
  2. we want to support valid utf8, [ and ] in urls/emails and new tlds (also for the magic url thing, without bbcode)
  3. we want to support nesting in bbcodes, like [c=red]red[c=green]green[/c]red again[/c]
  4. remove any difference between custom and basic bbcodes, all should be handled and set up the same way (currently some are in files, others in the database)
  1. Nesting level is covered. Individual tags can be disabled, e.g. $parser->disableTag('flash'). There's a comprehensive set of rules managing what's allowed where, most of which can even be generated automatically (search for TemplateForensics in testdox), and there are no BBCodes inside attributes.
  2. The parser expects the input to be valid UTF-8. URLs and emails are validated with ext/filter, so it accepts whatever ext/filter allows. The magic URL thing is handled by the Autolink plugin. I didn't remember whether it liked brackets, so I wrote a test for it, and it turns out it handles them splendidly. It also balances parentheses.
  3. It does.
  4. There's no basic BBCode; everything can be considered custom. I intend to ship the library with a repository of commonly used BBCodes. A project like phpBB can bundle its own repository.
