BBCode Support

ckwalsh
Registered User
Posts: 54
Joined: Tue Apr 18, 2006 2:25 am

BBCode Support

Post by ckwalsh » Sat Dec 26, 2009 8:48 am

There were some discussions about expanding the BBCode system in phpBB 3.1+ or 4.x, and I was hoping some of these things could be worked out. I have not followed 3.1 development closely, and the bbcode parser may be based on that work, so I apologize if some of this has already been worked out.

Types of BBCodes
From simple observation, BBCodes seem to fall into at least three classes based on the complexity of their operations: simple find and replace, complex find and replace (regex/tokens), and programmatic, using PHP-based logic. The first would mostly be used for enabling HTML tags such as <b>, <i>, and <u>; the second for the quote BBCode and similar; and the third for the code BBCode, which supports syntax highlighting.

With this separation, bbcodes could easily be split into different classes, with the first two types reusing shared classes with different parameters, and the third using a unique class for each bbcode.

Simple find and replace
These would be the easiest to define and the most commonly used bbcodes. They could be processed by a base class that takes a list of bbcode tags (b, i, u), their starting text (<b>, <i>, <u>), and their ending text (</b>, </i>, </u>), and then processes them all together. This should take one pass through preg_match_all and would be a simplified version of the phpBB 3.0 processor.

Another option with this type of bbcode would be a singular bbcode, such as [br] that does not require a matching pair.
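
As a rough illustration of the single-pass idea, here is a minimal sketch in Python (not phpBB code). The tag tables and the function name render_simple are made up for the example, and it deliberately does not check that tags are paired or escape HTML in the post text.

```python
import re

# Hypothetical tag table: BBCode name -> (opening HTML, closing HTML).
SIMPLE_TAGS = {
    "b": ("<b>", "</b>"),
    "i": ("<i>", "</i>"),
    "u": ("<u>", "</u>"),
}
# Hypothetical singular tags that have no closing partner.
SINGLE_TAGS = {"br": "<br>"}

# One alternation covering every known tag name, so a single pass suffices.
_names = "|".join(re.escape(n) for n in list(SIMPLE_TAGS) + list(SINGLE_TAGS))
TAG_RE = re.compile(r"\[(/?)(%s)\]" % _names, re.IGNORECASE)


def render_simple(text):
    """Replace every known tag in one regex pass over the text."""
    def swap(match):
        closing, name = match.group(1), match.group(2).lower()
        if name in SINGLE_TAGS:
            # A closing form such as [/br] means nothing; leave it as typed.
            return match.group(0) if closing else SINGLE_TAGS[name]
        start, end = SIMPLE_TAGS[name]
        return end if closing else start
    return TAG_RE.sub(swap, text)
```

Unknown tags such as [x] are left as typed, since they never match the alternation.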

Complex find and replace (regex/tokens)
This class of bbcodes should allow tokens for validating data and re-displaying it to the user, such as for the quote bbcode, while allowing users to do their own validation using regular expressions. Each token should be replaced by a regular expression to validate its data, then made available for use to the administrator as in phpBB 3.0. These tokens should be customisable, and admins should be able to add additional tokens for patterns they may often use.

Since curly brackets are used already in regex, I propose the POSIX character class bracket syntax be used as an alternative. It is already used in some flavors of regex, and allows for simple extensions such as [:email:], [:phone:], or [:color:].
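
To make the token idea concrete, here is a small Python sketch (an illustration only, not phpBB code). The TOKENS table, its patterns, and the name expand_tokens are assumptions for the example; the admin-written pattern is treated as an ordinary regular expression in which each [:name:] marker is swapped for a capturing group before compilation.

```python
import re

# Hypothetical token table; a real one would be editable by admins.
TOKENS = {
    "email": r"[^\s@\[\]]+@[^\s@\[\]]+\.[A-Za-z]{2,}",
    "color": r"#[0-9A-Fa-f]{6}|[A-Za-z]+",
}

TOKEN_RE = re.compile(r"\[:([a-z]+):\]")


def expand_tokens(pattern):
    """Replace each [:name:] marker with a capturing group for that token."""
    def repl(match):
        name = match.group(1)
        if name not in TOKENS:
            raise ValueError("unknown token: " + name)
        return "(" + TOKENS[name] + ")"
    return TOKEN_RE.sub(repl, pattern)


# Example definition: the first group is the validated colour, the second the body.
color_re = re.compile(
    expand_tokens(r"\[color=[:color:]\](.*?)\[/color\]"), re.DOTALL
)


def render_color(text):
    return color_re.sub(r'<span style="color: \1">\2</span>', text)
```

Input whose argument fails the token's pattern simply does not match and is left as typed.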

Because of the complexities of this syntax, which actually does not need to resemble bbcode syntax, each bbcode would need a separate pass in the bbcode processor.

Programmatic
These bbcodes would be implemented completely by PHP classes, and would need to be uploaded by the user. Each would implement a specific interface, providing a regular expression for the starting and ending tags and a method to do text operations. This would allow the bbcode parser to validate start/end tag pairing, and only pass the text of matching pairs to the manipulator method. This type of bbcode might be used for the code tag, or for embedding a wiki markup engine.
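
A sketch of what such an interface could look like, written in Python rather than PHP purely for illustration. The names ProgrammaticBBCode, CodeBBCode, transform, and apply_bbcode are all hypothetical, and the non-greedy pairing used here does not handle nested tags of the same kind.

```python
import html
import re
from abc import ABC, abstractmethod


class ProgrammaticBBCode(ABC):
    """Hypothetical interface: two tag regexes plus a text manipulator."""
    start_pattern = ""
    end_pattern = ""

    @abstractmethod
    def transform(self, inner):
        """Receive the text between a matched pair and return its output."""


class CodeBBCode(ProgrammaticBBCode):
    start_pattern = r"\[code\]"
    end_pattern = r"\[/code\]"

    def transform(self, inner):
        # A real handler might add syntax highlighting; this one only escapes.
        return "<pre><code>" + html.escape(inner) + "</code></pre>"


def apply_bbcode(handler, text):
    """Pass only the text of matched start/end pairs to the handler."""
    pair = re.compile(
        "(?:%s)(?P<body>.*?)(?:%s)" % (handler.start_pattern, handler.end_pattern),
        re.DOTALL,
    )
    return pair.sub(lambda m: handler.transform(m.group("body")), text)
```

An unpaired [code] never reaches transform, which is the point of letting the parser own the pairing check.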

Smilies?
Smilies are very similar to bbcodes, except they do not obey the standard bbcode syntax and are strictly find and replace, as a single element bbcode [br] might be. It would make sense to process them in the same way as bbcodes. However, they are separate enough that I didn't include them in my 3 types of bbcodes.

Performance
Performance is a huge consideration with bbcodes. At the moment, phpBB uses a bbcode_uid string to identify bbcodes with matching pairs, so that only those are replaced rather than every instance. This uid is added to all bbcode tags at post time and is stored embedded in the post text in the database. One problem with this is that the uid is embedded inside the brackets of the bbcode tag itself. With the bbcodes above, there is no guarantee that bbcodes actually use standard bbcode syntax. Thus, it may make more sense to concatenate the uid to the bbcode start and end tags, where it can also be appended to their matching regex. While this doesn't produce as pretty a database markup, it should produce consistent results.
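
A minimal Python sketch of the concatenation idea, under the assumption that each bbcode supplies its own start and end regexes. The function names mark_pairs and render_marked and the sample uid are invented for the example; this is not how phpBB 3.0 stores its uid.

```python
import re


def mark_pairs(text, start_re, end_re, uid):
    """At post time: append the uid after both tags of every matched pair."""
    pair = re.compile(
        "(?P<s>%s)(?P<body>.*?)(?P<e>%s)" % (start_re, end_re), re.DOTALL
    )
    return pair.sub(
        lambda m: m.group("s") + uid + m.group("body") + m.group("e") + uid,
        text,
    )


def render_marked(text, start_re, end_re, uid, open_html, close_html):
    """At display time: only tags followed by the uid are replaced."""
    u = re.escape(uid)
    pair = re.compile(
        "(?:%s)%s(?P<body>.*?)(?:%s)%s" % (start_re, u, end_re, u), re.DOTALL
    )
    return pair.sub(lambda m: open_html + m.group("body") + close_html, text)


stored = mark_pairs("[b]hi[/b] and [b]open", r"\[b\]", r"\[/b\]", "1a2b")
shown = render_marked(stored, r"\[b\]", r"\[/b\]", "1a2b", "<b>", "</b>")
```

Because the uid sits outside the tag text, the same two functions work for tags that do not use square-bracket syntax at all.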

Eelke
Registered User
Posts: 606
Joined: Thu Dec 20, 2001 8:00 am
Location: Bussum, NL

Re: BBCode Support

Post by Eelke » Sat Dec 26, 2009 10:58 am

WRT storing codes in the database, I would like to draw your attention to a different approach altogether.

Some would consider it bad practice to touch user input at all. Store it in the database as is. That way, you will never have to do any "reverse-parsing" to allow editing the input. Of course, that would mean you'd have to render the output "every time" it needs to be displayed. Some kind of caching mechanism could be devised to avoid actually having to render the output every time a post needs to be displayed.
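
One possible shape for such a cache, as a Python sketch. The class name RenderCache, the in-memory dict, and the ruleset_version string are assumptions for illustration; a real board would more likely use its existing cache backend.

```python
import hashlib


class RenderCache:
    """Cache rendered output keyed by the raw text and a ruleset version."""

    def __init__(self, renderer, ruleset_version):
        self._renderer = renderer        # any callable: raw text -> output
        self._version = ruleset_version  # changes when bbcode definitions change
        self._store = {}
        self.misses = 0

    def get(self, raw_text):
        key = hashlib.sha256(
            (self._version + "\0" + raw_text).encode("utf-8")
        ).hexdigest()
        if key not in self._store:
            # Render only on a miss; the stored post text is never modified.
            self.misses += 1
            self._store[key] = self._renderer(raw_text)
        return self._store[key]


cache = RenderCache(str.upper, "v1")  # str.upper stands in for a real renderer
first = cache.get("hello")
second = cache.get("hello")
```

Including the version in the key means old entries simply stop being hit after the definitions change, so no explicit invalidation pass is needed.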

Also, I think an important consideration is, where would this kind of logic go? Elsewhere, it was discussed to allow for "pluggable" input formats. Would bbcodes simply be "yet another" pluggable input format? And/or would programmatic bbcodes be implemented as plugins to the input system? And/or would these principles be embedded into the base system?

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany

Re: BBCode Support

Post by naderman » Sat Dec 26, 2009 11:07 am

I would very much prefer a BBCode parser like the one in 3.1. It's stack based, and it stores the bbcode stack of a post in a separate column, leaving the original content untouched. Reparsing all posts is as simple as emptying that column. It will automatically be refilled on viewing a post. The stack-based approach with a standard syntax for bbcode tags and arguments is also very extensible. It even allows manual configuration of new bbcodes through the ACP with sufficient options (rather than the few in 3.0). It allows for processing of text only, e.g. for smileys or word censors. There is no need for the bbcode uid because the entire structure of the post can be easily loaded rather than having to be reparsed. The stack-based approach also doesn't have all the parsing issues that stem from the use of regular expressions in 3.0.
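
For readers who have not seen the idea, here is a simplified Python sketch of a stack-based parse that records only the structure (tag name and offsets) and leaves the text untouched. It is an illustration, not the 3.1 parser; the function names, the tuple layout, and the tiny tag set are assumptions, and HTML escaping is omitted.

```python
import re

TAG_RE = re.compile(r"\[(/?)([a-z]+)\]")


def parse_structure(text, known=("b", "i", "u")):
    """Return matched pairs as (name, open_start, open_end, close_start, close_end)."""
    stack, pairs = [], []
    for m in TAG_RE.finditer(text):
        closing, name = m.group(1), m.group(2)
        if name not in known:
            continue
        if not closing:
            stack.append((name, m.start(), m.end()))
        elif stack and stack[-1][0] == name:
            _, start, end = stack.pop()
            pairs.append((name, start, end, m.start(), m.end()))
        # Any other closing tag is stray and stays literal text.
    return pairs  # tags still on the stack were never closed; they stay literal


def render(text, pairs):
    """Build output from the untouched text plus the stored structure."""
    edits = []
    for name, o_start, o_end, c_start, c_end in pairs:
        edits.append((o_start, o_end, "<%s>" % name))
        edits.append((c_start, c_end, "</%s>" % name))
    out, pos = [], 0
    for start, end, replacement in sorted(edits):
        out.append(text[pos:start])
        out.append(replacement)
        pos = end
    out.append(text[pos:])
    return "".join(out)


post = "[b]bold [i]both[/i][/b] [u]open"
structure = parse_structure(post)  # this list is what a separate column would hold
```

Emptying the stored structure and calling parse_structure again on the next view is the "reparse by clearing the column" behaviour described above.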

bobtheman
Registered User
Posts: 63
Joined: Sat Dec 19, 2009 4:00 pm

Re: BBCode Support

Post by bobtheman » Sat Dec 26, 2009 2:52 pm

I ask that you look into this thread, viewtopic.php?f=75&t=32079 .... the need for bbcode will be little

ckwalsh
Registered User
Posts: 54
Joined: Tue Apr 18, 2006 2:25 am

Re: BBCode Support

Post by ckwalsh » Sat Dec 26, 2009 5:26 pm

Overall it sounds excellent, I'm just concerned about this:
naderman wrote:It stores the bbcode stack of a post in a separate column, leaving the original content untouched. Reparsing all posts is as simple as emptying that column. It will automatically be refilled on viewing a post.
Potentially, someone could double the size of the database by abusing this functionality, viewing every single post in the forum. There are some hosts that limit database size, so could this be a problem?

Also, this sounds like something memcached is good at: keeping a chunk of data available with an expiration time and no need to go to the database. Could such stack caches be stored in the caching backend rather than the posts table?

EDIT: Am I missing it, or is the 3.1 parser not yet available? I am only seeing implementations in the repo that depend on a uid.

ToonArmy
Registered User
Posts: 335
Joined: Fri Mar 26, 2004 7:31 pm
Location: Bristol, UK

Re: BBCode Support

Post by ToonArmy » Sat Dec 26, 2009 8:12 pm

Brainy wrote:Overall it sounds excellent, I'm just concerned about this:
naderman wrote:It stores the bbcode stack of a post in a separate column, leaving the original content untouched. Reparsing all posts is as simple as emptying that column. It will automatically be refilled on viewing a post.
Potentially, someone could double the size of the database by abusing this functionality, viewing every single post in the forum. There are some hosts that limit database size, so could this be a problem?
It won't double the size of the database. I'm not familiar with what is stored, but it could double the size of a row in the table.
Brainy wrote:Also, this sounds like something memcached is good at: keeping a chunk of data available with an expiration time and no need to go to the database. Could such stack caches be stored in the caching backend rather than the posts table?
A provider that limits the size of the database to something small enough that this will be an issue isn't going to provide memcache. ;)
Brainy wrote:EDIT: Am I missing it, or is the 3.1 parser not yet available? I am only seeing implementations in the repo that depend on a uid.
http://code.phpbb.com/repositories/brow ... des/bbcode

ToonArmy
Registered User
Posts: 335
Joined: Fri Mar 26, 2004 7:31 pm
Location: Bristol, UK

Re: BBCode Support

Post by ToonArmy » Sat Dec 26, 2009 8:14 pm

Eelke wrote:Also, I think an important consideration is, where would this kind of logic go? Elsewhere, it was discussed to allow for "pluggable" input formats. Would bbcodes simply be "yet another" pluggable input format? And/or would programmatic bbcodes be implemented as plugins to the input system? And/or would these principles be embedded into the base system?
BBCode will be one of the text format types and additional bbcodes will be dropped in to extend it as needed.

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany

Re: BBCode Support

Post by naderman » Sun Dec 27, 2009 3:17 am

Brainy wrote:
naderman wrote:It stores the bbcode stack of a post in a separate column, leaving the original content untouched. Reparsing all posts is as simple as emptying that column. It will automatically be refilled on viewing a post.
Potentially, someone could double the size of the database by abusing this functionality, viewing every single post in the forum. There are some hosts that limit database size, so could this be a problem?
Since usually all posts have been viewed at least once, they will have the cache column filled. And viewing every single post in a forum after clearing the cache is a performance issue unrelated to bbcodes ;-) Ideally the cache column really only stores the structure. The contents are taken from the untouched text column.
Brainy wrote:Also, this sounds like something memcached is good at: keeping a chunk of data available with an expiration time and no need to go to the database. Could such stack caches be stored in the caching backend rather than the posts table?
I doubt you would want to store all those posts in RAM. You need to retrieve the post itself from the database anyway. There's not much of a downside to having a column for the structure in there as well. Also, this way you don't have to reparse a post unless you make changes to the bbcode definition and clear the cache/structure column.

ckwalsh
Registered User
Posts: 54
Joined: Tue Apr 18, 2006 2:25 am

Re: BBCode Support

Post by ckwalsh » Sun Dec 27, 2009 6:13 am

Ah, I misunderstood. I thought you meant having both the parsed copy and original post in the database. Since the parsed copy will doubtless be bigger, it would be a space issue.
