[RFC|Accepted] Updated BBcode engine

Note: We are moving the topics of this forum and it will be deleted at some point

Publish your own request for comments/change or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.2/Rhea and 3.3.
Post Reply
Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am
Contact:

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg »

That sounds very convincing.

If you post a couple of examples of messages to be parsed (pm or trashbin, whichever) I'll write a script that measures the time it takes to parse and render them.
I don't have anything handy, just doing some sanity checks.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am
Contact:

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg »

(19:57:12) Noxwizard: That TextFormatter library needs PHP's XSL library, which is not enabled by default.

What is the contingency for xsl being missing?

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP »

Multiple rendering backends. This is a subject I have barely scratched because I haven't seen a host that doesn't support ext/xsl and I was under the impression it was enabled by default, as opposed to being included by default.

One possible alternative to using ext/xsl is to compile the template into PHP code, similar in concept to what most templating engines do (including phpBB's if I'm not wrong.) XSL is an XML language, so parsing a stylesheet is easy. Templates used for markup purposes are fairly simple too--it's mainly HTML plus the occasional "if". The most complex part is having an XPath engine but luckily ext/dom offers one, and ext/dom is a sine qua none of the library. I have written a prototype that compiles a stylesheet into a PHP class (which must be eval()'ed or dumped to the disk and included) which uses echo/htmlspecialchars() for everything HTML and supports xsl:apply-templates, xsl:if, xsl:value-of, xsl:copy-of, xsl:attribute and xsl:choose but I haven't gotten around to testing it, so it only works "in theory."

[Edit: committed to a branch with a couple of tests]
[Edit2: merged into master]
Last edited by JoshyPHP on Thu May 16, 2013 11:27 pm, edited 1 time in total.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am
Contact:

Re: [RFC|Accepted] Updated BBcode engine

Post by Oleg »

http://de2.php.net/manual/en/xsl.installation.php

PHP 5 includes the XSL extension by default and can be enabled by adding the argument --with-xsl[=DIR] to your configure line.
libxml is a virtual requirement for php 5 if not an absolute requirement. libxslt is a small jump from libxml and probably is/can be made available on all sane systems. But obviously not all hosts are sane.

I'm curious if rendering via php dump/eval would actually work faster than xsl transforms.

User avatar
brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

Oleg wrote:
nickvergessen wrote:brunoais and I had a quick talk today, collecting all the stuff that should be done:
  1. we want to add conditions to bbcodes, so we can have different parsings in different cases (like nesting-depth (quoting 3 depth), permissions (allow flash), config settings (flash enabled), invalid-parents (b inside of flash, allow only some bbcodes inside of the quote-username), etc)
  2. we want to support valid utf8, [ and ] in urls/emails and new tlds (also for the magic url thing, without bbcode)
  3. we want to support nesting in bbcodes, like red[c=green]greenred again[/c]
  4. remove any difference between custom and basic bbcodes, all should be handled and set up the same way (currently some are in files, others in the database)
This is all nice but what about test coverage?
If I'm making tests, I need to know what to expect, specially what is considered correct and what is considered incorrect.
For now, I don't have those well defined so it's hard, for me, to make such tests.
Actually, It's revealing harder to find a good spec for the BBCode parser than what I thought. This was caused by all that bending that was made after I made my version of the spec. I was sure of what I needed to do and what it had to guarantee. Now I'm not that sure, I need some guidance about what the dev team expects the parser to do.
Oleg wrote: Have you checked performance of your implementation on 1) big posts and 2) transforming large numbers of ordinary posts? How about unusual things like nesting 100 levels deep?
I can give an update on that for mine's.
1) Super fast. Way, way faster than phpBB's current one.
2) ??? (I don't get what that is)
3) If you nest a tag with the same name more than 10 times, it starts to feel the "pressure". Nesting a tag 100 times has an average of 0.07 seconds, average, against no nesting, just the tag.
Anyway, that heavy reaction of slow parse is only felt while processing the 1st pass. The 2nd pass is processed in times in the order of 10^(-6)s for my current test code.

Code: Select all

[list]
	[*]elem1
	[*]
	[list]
		[*][b]elem2[/b]
		[*]
		[code] [list]
			[*]yeah
			[*][*]yep
[/list]
[/list]
[*]elem3
ending[/code]

(These numbers I wrote are results after having the code compiled in memory).

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP »

It's been a month since my last post here so I thought I'd post an update of my progress on the text formatting library I'm working on, s9e\TextFormatter. Most of the changes are not directly related to phpBB, but I always keep phpBB in mind when designing new parts or refactoring old parts of the library.

In my last post, I mentionned rendering backends. In the meantime, I have finished writing a generator that compiles an XSL stylesheet into PHP. It takes a string (the XSL source of a stylesheet) and returns the source of the renderer class, which can be dumped into a file. I haven't decided on an API for switching rendering backends yet. Its test suite checks and compares its output against the output of the XSLT renderer. They produce identical outputs for all [the bundled BBCodes](https://github.com/s9e/TextFormatter/bl ... sitory.xml) as well as the stock plugins (Emoticons, etc...) The only difference is that the PHP renderer correctly treats <embed> as a void element (as per HTML5) whereas the XSLT renderer treats it as a normal element, as per HTML4. They should produce the same DOM either way.

I've just started working on parsing and rendering posts from a phpBB database. This is still in its infancy (e.g. it only renders as the default Prosilver + custom BBCodes, and only does BBCodes + smilies) but it helped me outline how to handle some of phpBB's specificities. More about that in a second. For the time being, it lives as a lone script in an experimental branch of s9e\TextFormatter. Moving forward, the goal is to parse/render the posts stored in a database and compare DOM trees that each renderer: phpBB's, and s9e\TextFormatter's XSLT renderer and PHP renderer. By the way, I'm open to suggestions for the storage of the output. My plan for the time being will be to store the output of all the renderers in table in the DB, only if their outputs differ. When I'll be done with this script, I should have all the code required to create a parser and renderer usable in phpBB itself, so that side of the integration will be done.

Since last month, I have added support for stylesheet parameters and the BBCodes plugin has been updated to handle parameters transparently. I will spare you the details of parameters in XSL and I'll say that stylesheet parameters are variables that are accessible anywhere in the stylesheet (in the templates) and that their value can be set before rendering. In phpBB's case, they can be used for two things: translation and template logic. For example, part of the [QUOTE] BBCode renders as

Code: Select all

<cite>{TEXT1} {L_WROTE}{L_COLON}</cite>
You'll be able to change the value of L_WROTE with:

Code: Select all

$renderer->setParameter('L_WROTE', $user->lang['L_WROTE']); 
Note that there is a method to retrieve the names of all the parameters used in a stylesheet during configuration. Using it, you know what parameters can be set before rendering. Here's what my current script looks like

Code: Select all

$sql = 'SELECT bbcode_match, bbcode_tpl FROM ' . BBCODES_TABLE;
$result = $db->sql_query($sql);
while ($row = $db->sql_fetchrow($result))
{
    $configurator->BBCodes->addCustom($row['bbcode_match'], $row['bbcode_tpl']);
}
$db->sql_freeresult($result);

$renderer = $configurator->getRenderer();

// NOTE: under normal circumstances, you'd have to save the list of parameters with the renderer,
//       because you wouldn't have access to the configurator at rendering time
foreach ($configurator->stylesheet->getUsedParameters() as $paramName => $expr)
{
    if (preg_match('#^L_(\\w+)#', $paramName, $m) && isset($user->lang[$m[1]]))
    {
        $renderer->setParameter($paramName, $user->lang[$m[1]]);
    }
} 
Parameters can also be used for template logic. For instance, you could have a S_USER_LOGGED_IN parameter that mirrors phpBB's template var. The exact implementation is left open. s9e\TextFormatter supports template predicates (multiple templates can be set for the same BBCode, you could have a template with predicate $S_USER_LOGGED_IN = 1) and the XSL elements <xsl:if> and <xsl:choose>. For phpBB, you could support a subset of phpBB's template conditionals by transparently replacing <!-- IF S_USER_LOGGED_IN --> with <xsl:if test="$S_USER_LOGGED_IN">.

Ok, this took way longer to write as I expected. If anyone has any questions or ideas, please feel free.

TL;DR: phew, a month already! Added PHP renderer with plenty of tests. Added stylesheet parameters so maintain compatibility with phpBB's custom BBCodes. Working on rendering phpBB posts with s9e\TextFormatter.

User avatar
EXreaction
Registered User
Posts: 1555
Joined: Sat Sep 10, 2005 2:15 am

Re: [RFC|Accepted] Updated BBcode engine

Post by EXreaction »

I don't have the time now to look at your code, but what about supporting multiple optional tags? For example:

Code: Select all

[bbcode="test"]some text[/bbcode]

Code: Select all

[bbcode="test" foo="bar" foo2="bar2"]some text[/bbcode]
Sorry if you mentioned this somewhere, I didn't see anything about it specifically.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP »

Yes, multiple attributes are supported, as well as default attributes as per your first example. Attribute values can be in single quotes, double quotes, or no quotes. The backslash serves as the escape character inside of quotes. Unquoted values extend either to the end of the tag, as in [QUOTE=John Doe] or to the next attribute, as in [FLASH width=10 height=20]. That's to accomodate the various rules I've encountered in other BBCode engines.

Speaking of attributes, I don't remember if I mentionned it earlier or if it got lost in a sea of technical details but there's a bunch of built-in filters (when you use {URL} it uses the #url filter, and so on) for them, in addition to custom filters that you can create or PHP callbacks that can be used to transform their value. Most of phpBB's tokens are available, except for {INTTEXT} and {LOCAL_URL}, which you can implement as custom filters. The configurator prevents improperly-filtered attributes to be used where a URL, some JavaScript or CSS is expected. For instance, all of those should throw an UnsafeTemplateException

Code: Select all

$configurator->BBCodes->addCustom('[BAD={TEXT1}]{TEXT2}[/BAD]', '<a href="{TEXT1}">{TEXT2}</a>');
$configurator->BBCodes->addCustom('[BAD={TEXT1}]{TEXT2}[/BAD]', '<b onblur="{TEXT1}">{TEXT2}</b>');
$configurator->BBCodes->addCustom('[BAD={TEXT1}]{TEXT2}[/BAD]', '<b style="{TEXT1}">{TEXT2}</b>');
$configurator->BBCodes->addCustom('[BAD]{TEXT}[/BAD]',          '<script>{TEXT}"</script>');
$configurator->BBCodes->addCustom('[BAD]{TEXT}[/BAD]',          '<style>{TEXT}"</script>');
...whereas this one is fine:

Code: Select all

$configurator->BBCodes->addCustom('[GOOD={TEXT1}]{TEXT2}[/GOOD]', '<b title="{TEXT1}">{TEXT2}</b>');
It also prevents the XSLT function document() to be used. If you want to create a template that the configurator identifies as unsafe, it's possible using the verbose API and the UnsafeTemplate object.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by JoshyPHP »

Bumping this topic to see if there's some new activity that I missed. Is there something I can do on my side to help you move forward?

Also, since the discussion has moved away from its original RFC, wouldn't it make sense to create a new RFC/topic? Perhaps even more than one RFC, so that you can break down this fairly big change into smaller tasks?

User avatar
brunoais
Registered User
Posts: 964
Joined: Fri Dec 18, 2009 3:55 pm

Re: [RFC|Accepted] Updated BBcode engine

Post by brunoais »

Naw, no need. I just need some more time in my life. And that's just gonna happen during the holidays.
I'll have time during holidays to do this (I have almost all the plan of action ready) and then move on to the WYSIWYG editor.

I just need you to be patient due to the volume of work I got that was unexpected compared to what I was told previously that the volume of work a masters student like me has.I'll have this done by the end of my summer hollidays.
I'll have the WYSIWYG editor probably (high chance) ready by the end of my summer hollidays.
After that, it's just some bug fixing and other stuff that I can do in my time breaks.

Post Reply