Types of BBCodes
From simple observation, BBCodes seem to fall into at least three classes based on the complexity of their operations: simple find and replace, complex find and replace (regex/tokens), and programmatic, using PHP-based logic. The first would mostly be used to enable HTML tags such as <b>, <i>, and <u>; the second for the quote BBCode and similar; and the third for BBCodes such as code, which supports syntax highlighting.
With this separation, BBCodes could easily be implemented as different classes, with the first two types reusing shared classes with different parameters, and the third using a unique class for each BBCode.
Simple find and replace
These would be the easiest to define and the most commonly used BBCodes. They could be processed by a base class that takes a list of BBCode tags (b, i, u), their starting text (<b>, <i>, <u>), and their ending text (</b>, </i>, </u>), and then processes them all together. This should take one pass through preg_match_all and would be a simplified version of the phpBB 3.0 processor.
This type could also cover a single-element BBCode, such as [br], that does not require a matching closing tag.
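A minimal sketch of what such a base class might look like. The class and method names are hypothetical and not part of phpBB. It also swaps preg_match_all for a single preg_replace_callback pattern with a backreference, recursing into the matched text so that different tags can nest. It assumes the post text has already been HTML-escaped.

```php
<?php
// Hypothetical sketch: SimpleBBCodeProcessor is an illustrative name, not phpBB API.
class SimpleBBCodeProcessor
{
    /** @var array tag name (lowercase) => [start text, end text] */
    private $pairs;
    /** @var array literal tag such as "[br]" => replacement text */
    private $singles;

    public function __construct(array $pairs, array $singles = [])
    {
        $this->pairs = $pairs;
        $this->singles = $singles;
    }

    public function process(string $text): string
    {
        if ($this->pairs) {
            // Build one pattern covering every paired tag: \[(b|i|u)\](.*?)\[/\1\]
            $names = implode('|', array_map(function ($tag) {
                return preg_quote($tag, '#');
            }, array_keys($this->pairs)));
            $text = $this->replacePairs('#\[(' . $names . ')\](.*?)\[/\1\]#is', $text);
        }
        // Single-element tags are plain (case-sensitive) string replacement.
        return $this->singles ? strtr($text, $this->singles) : $text;
    }

    private function replacePairs(string $pattern, string $text): string
    {
        return preg_replace_callback($pattern, function (array $m) use ($pattern) {
            list($start, $end) = $this->pairs[strtolower($m[1])];
            // Recurse so that tags nested inside this pair are replaced too.
            return $start . $this->replacePairs($pattern, $m[2]) . $end;
        }, $text);
    }
}

$processor = new SimpleBBCodeProcessor(
    ['b' => ['<b>', '</b>'], 'i' => ['<i>', '</i>'], 'u' => ['<u>', '</u>']],
    ['[br]' => '<br />']
);
echo $processor->process('[b]bold [i]both[/i][/b][br]');
```

Tags without a matching end tag are left untouched. A tag nested inside itself (for example [b] inside [b]) is not handled by this sketch and would need a real tokenizer.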
Complex find and replace (regex/tokens)
This class of BBCodes should allow tokens for validating data and re-displaying it to the user, such as for the quote BBCode, while still allowing users to do their own validation with regular expressions. Each token should be replaced by a regular expression that validates its data, and then be made available to the administrator as in phpBB 3.0. These tokens should be customisable, and admins should be able to add tokens for patterns they use often.
Since curly brackets are already used in regex, I propose using the POSIX character class bracket syntax as an alternative. It is already used in some flavors of regex, and allows for simple extensions such as [
Because of the complexities of this syntax, which does not actually need to resemble BBCode syntax, each BBCode would need a separate pass in the BBCode processor.
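As a rough illustration of the token idea, the sketch below compiles a match template into a regular expression and reuses the captured values in the replacement. Everything in it is an assumption made for the example: the [:name:] spelling of tokens, the token names url, text and num, their regexes, and the function names.

```php
<?php
// Hypothetical token table: names and regexes are assumptions, and admins
// could extend it. Token regexes must not contain the '#' delimiter.
$tokens = [
    'url'  => 'https?://[^\s\]"<>]+',
    'text' => '.*?',
    'num'  => '\d+',
];

// Turn a template such as "[url=[:url:]][:text:][/url]" into a regex.
// Each token may appear only once per template in this sketch.
function compile_bbcode(string $match, array $tokens): string
{
    // Split on [:name:] markers; captured token names land at odd indexes.
    $parts = preg_split('/\[:([a-z]+):\]/', $match, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $regex .= preg_quote($part, '#');          // literal text
        } elseif (isset($tokens[$part])) {
            $regex .= '(?P<' . $part . '>' . $tokens[$part] . ')';
        } else {
            throw new InvalidArgumentException("Unknown token: $part");
        }
    }
    return '#' . $regex . '#is';
}

// One pass over the text for one BBCode definition.
function apply_bbcode(string $text, string $match, string $replace, array $tokens): string
{
    $pattern = compile_bbcode($match, $tokens);
    return preg_replace_callback($pattern, function (array $m) use ($replace) {
        // Substitute each token in the replacement with its validated value.
        return preg_replace_callback('/\[:([a-z]+):\]/', function (array $t) use ($m) {
            return $m[$t[1]] ?? $t[0];
        }, $replace);
    }, $text);
}

echo apply_bbcode(
    'See [url=http://example.com/a]this page[/url].',
    '[url=[:url:]][:text:][/url]',
    '<a href="[:url:]">[:text:]</a>',
    $tokens
);
```

Input that fails a token's regex, such as a javascript: URL here, simply does not match and is left as plain text.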
Programmatic
These BBCodes would be implemented entirely as PHP classes, and would need to be uploaded by the user. Each would implement a specific interface, providing a regular expression for the starting and ending tags and a method that performs the text operations. This would allow the BBCode parser to validate start/end tag pairing, and pass only the text of matching pairs to the manipulator method. This type of BBCode might be used for the code tag, or for embedding a wiki markup engine.
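One possible shape for such an interface is sketched below. The interface, class, and function names are hypothetical, and the code example only escapes its content where a real implementation would add syntax highlighting.

```php
<?php
// Hypothetical interface: names are assumptions, not an existing phpBB API.
interface ProgrammaticBBCode
{
    /** Regex fragment (no delimiters) matching the start tag. */
    public function startPattern(): string;

    /** Regex fragment (no delimiters) matching the end tag. */
    public function endPattern(): string;

    /** Receives only the text found between a matched start/end pair. */
    public function transform(string $inner): string;
}

class CodeBBCode implements ProgrammaticBBCode
{
    public function startPattern(): string
    {
        return '\[code\]';
    }

    public function endPattern(): string
    {
        return '\[/code\]';
    }

    public function transform(string $inner): string
    {
        // Placeholder for syntax highlighting: escape and wrap the text.
        return '<pre><code>' . htmlspecialchars($inner, ENT_QUOTES, 'UTF-8') . '</code></pre>';
    }
}

// The parser owns pair matching; the BBCode class never sees unmatched tags.
function run_bbcode(ProgrammaticBBCode $code, string $text): string
{
    $pattern = '#' . $code->startPattern() . '(?P<inner>.*?)' . $code->endPattern() . '#is';
    return preg_replace_callback($pattern, function (array $m) use ($code) {
        return $code->transform($m['inner']);
    }, $text);
}

echo run_bbcode(new CodeBBCode(), 'x [code]<b>[/code] y [code]open');
```

The named group keeps the inner text addressable even if a start pattern contains capture groups of its own.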
Smilies?
Smilies are very similar to BBCodes, except that they do not obey the standard BBCode syntax and are strictly find and replace, as a single-element BBCode like [br] might be. It would make sense to process them in the same way as BBCodes. However, they are distinct enough that I didn't include them in my three types of BBCodes.
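To show how close this is to the single-element case, here is a small sketch that treats smilies as a plain replacement table. The function name, the smiley codes, and the image file names are made up for the example.

```php
<?php
// Hypothetical helper: maps smiley codes to <img> tags in one strtr() call.
function replace_smilies(string $text, array $smilies): string
{
    $map = [];
    foreach ($smilies as $code => $file) {
        $map[$code] = '<img src="' . htmlspecialchars($file, ENT_QUOTES, 'UTF-8')
            . '" alt="' . htmlspecialchars($code, ENT_QUOTES, 'UTF-8') . '" />';
    }
    // strtr() tries the longest key first and never rescans its own output.
    return $map ? strtr($text, $map) : $text;
}

echo replace_smilies('Hi :) and :D', [':)' => 'smile.gif', ':D' => 'grin.gif']);
```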
Performance
Performance is a huge consideration with BBCodes. At the moment, phpBB uses a bbcode_uid string to identify BBCodes with matching pairs, so that only those are replaced rather than every instance. This uid is added to all BBCode tags at post time, and is stored embedded in the post text in the database. One problem with this is that the uid is embedded inside the brackets of the BBCode tag itself. With the BBCodes above, there is no guarantee that a BBCode actually uses standard BBCode syntax. Thus, it may make more sense to concatenate the uid to the BBCode start and end tags, and to append it to their matching regexes as well. While this doesn't produce as pretty a database markup, it should produce consistent results.
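A rough sketch of the concatenation idea follows. The function names and the ":uid" suffix format are assumptions chosen for the example, not how phpBB stores posts today. Pairing is checked once at post time, so display time only needs literal string replacement.

```php
<?php
// Post time: append the uid to both tags of every matched pair.
// Hypothetical helper; the ":" separator is an arbitrary choice.
function mark_pairs(string $text, string $tag, string $uid): string
{
    $t = preg_quote($tag, '#');
    $pattern = '#(\[' . $t . '\])(.*?)(\[/' . $t . '\])#s';
    return preg_replace_callback($pattern, function (array $m) use ($uid) {
        return $m[1] . ':' . $uid . $m[2] . $m[3] . ':' . $uid;
    }, $text);
}

// Display time: only tags carrying the uid are replaced, with no regex
// and no pairing check needed.
function render_marked(string $stored, string $tag, string $uid, string $start, string $end): string
{
    return strtr($stored, [
        '[' . $tag . ']:' . $uid  => $start,
        '[/' . $tag . ']:' . $uid => $end,
    ]);
}

$uid = 'a1b2c3';
$stored = mark_pairs('[b]bold[/b] and [b]stray', 'b', $uid);
echo $stored, "\n";
echo render_marked($stored, 'b', $uid, '<b>', '</b>'), "\n";
```

Because the uid sits outside the tag, the same marking step works for tags that do not use square-bracket syntax at all.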