Recursive regular expression use - an easier way?

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
coppro
Registered User
Posts: 2
Joined: Thu May 17, 2007 1:43 am


Post by coppro »

Hi.

I'm new here. I just downloaded phpBB3 Beta 5, and I was looking through bbcode.php and message_parser.php because I'm the kind of guy who wants to see how it all works.

I got an idea while toying around with the custom BBCodes, in an attempt that ended with me editing the database directly to get a tag working. Eventually I figured out that there was no way to do what I wanted without modifying the code, and I'd need to modify it again if I wanted it to work on platforms that don't implement the CSS counter-increment property properly. The idea was a BBCode that gives you an inline element, similar to a quote, with a label such as "Spoiler" that you can click on to reveal the text. The JavaScript works by toggling the element's CSS display property. Unfortunately, this means that each one has to have its own ID. By far the best way to generate those IDs is in PHP, although in theory it could be done with a combination of JavaScript and CSS. That's the one modification that could be worked around.

The other modification is about recursion. Since I don't want this BBCode to be hard-coded (or else the system becomes useless for more advanced codes), I tried to do it by modifying the regex in the database. Eventually, I found a regex that works: "!\[spoiler\](((?>[^\[]*)|(?R)|\[)*)\[/spoiler\]!is". This regex flawlessly (and relatively quickly) matches the closing tag that belongs to the opening one, and it can even deal with unbalanced tags. I won't get into the details, but it works by calling itself recursively through (?R) (see the PCRE syntax documentation). The only downside is that preg_replace does not operate recursively and cannot be made to do so (a shame, but that's not important). This means that while the outermost tag will be parsed correctly, each nested level must be parsed with another call to preg_replace. Since that's the best I can do with a regex, I have no choice but to alter the code to do this.
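A minimal sketch of what the recursive pattern accomplishes: pairing each [spoiler] with the [/spoiler] that actually closes it. Python's standard re module has no (?R) recursion, so this swaps in a plain depth counter instead of a recursive regex. The function name find_matching_close is hypothetical, and it assumes the start index points at an opening tag.

```python
import re
from typing import Optional

# Matches either [spoiler] or [/spoiler]; group 1 is "/" for a closing tag.
# IGNORECASE mirrors the /i flag in the PHP pattern.
TAG = re.compile(r"\[(/?)spoiler\]", re.IGNORECASE)


def find_matching_close(text: str, start: int) -> Optional[int]:
    """Assuming text[start:] begins with [spoiler], return the index just
    past the [/spoiler] that closes it, or None if the tags never balance."""
    depth = 0
    for m in TAG.finditer(text, start):
        if m.group(1):   # closing tag: leave one nesting level
            depth -= 1
            if depth == 0:
                return m.end()
        else:            # opening tag: enter one nesting level
            depth += 1
    return None


s = "[spoiler]a[spoiler]b[/spoiler]c[/spoiler]d"
print(s[:find_matching_close(s, 0)])  # the whole outer spoiler, without the trailing "d"
```

The inner [/spoiler] only brings the depth back to 1, so the counter skips it and stops at the outer close, which is the same pairing the (?R) pattern produces.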

Now, while I was looking over the message parser, the bbcode_firstpass class seemed needlessly complicated to me. Why do you need a function for a tag like when the thing could be parsed by a much simpler and more straightforward regex? It then occurred to me that it was designed for a changing BBCode (or at least, that's the only reason I can fathom). However, the bbcode_quote function is worse! It seems designed entirely to parse a quote declaration when a recursive regex would work just as well. Just have PHP compare the result of preg_replace with the text from before each iteration, and if it's different, try again. You can even enforce the nesting maximum by counting the iterations! Is there any reason this isn't done?
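The "replace until nothing changes, capped at the nesting maximum" loop can be sketched as follows. Because Python's re lacks (?R), this version swaps in a pattern that only matches spoilers with no opening tag inside them, so each pass handles the innermost level and the loop works outward (the PHP approach described above would work from the outside in instead). OPEN_HTML, CLOSE_HTML, parse_spoilers and the default limit of 3 are illustrative assumptions, not phpBB's names or settings.

```python
import re

# Innermost-first stand-in for the recursive PCRE pattern: the body may not
# contain another [spoiler], so only spoilers with no unparsed spoiler inside match.
INNER_SPOILER = re.compile(
    r"\[spoiler\]((?:(?!\[spoiler\]).)*?)\[/spoiler\]",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical markup; not phpBB's actual spoiler HTML.
OPEN_HTML = '<div class="spoiler">'
CLOSE_HTML = "</div>"


def parse_spoilers(text: str, max_depth: int = 3) -> str:
    """Run the replacement repeatedly until the text stops changing,
    doing at most max_depth passes (one pass per nesting level)."""
    for _ in range(max_depth):
        new_text = INNER_SPOILER.sub(lambda m: OPEN_HTML + m.group(1) + CLOSE_HTML, text)
        if new_text == text:  # fixed point: nothing left to parse
            break
        text = new_text
    return text


print(parse_spoilers("[spoiler]a[spoiler]b[/spoiler][/spoiler]"))
```

With max_depth=1, only the innermost spoiler is converted and the outer tags stay as literal text, which is one way a nesting limit could behave.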

Oh, and one last thing. If I decide to make this tag a MOD, I will obviously need to include the code changes required to make it work. But which would be better: packaging it all as one MOD, or packaging the recursion MOD and the ID-incrementing MOD separately, and then adding instructions for adding the tag, perhaps as a third MOD?

Martin Blank
Registered User
Posts: 687
Joined: Sun May 11, 2003 11:17 am

Re: Recursive regular expression use - an easier way?

Post by Martin Blank »

There are several solutions posted on this very board for spoiler BBCode, from the very simple (black on black text) to complex (hiding and unhiding the whole text block). If it can be done in HTML/CSS, it can probably be done in BBCode.

Perhaps the issue here is a misunderstanding of the custom BBCode functionality?

walkingdead
Registered User
Posts: 8
Joined: Fri May 04, 2007 1:04 pm

Re: Recursive regular expression use - an easier way?

Post by walkingdead »

coppro wrote: Hi.

I'm new here. I just downloaded phpBB3 Beta 5, and I was looking through bbcode.php and message_parser.php because I'm the kind of guy who wants to see how it all works.


You might also want to try the latest CVS, which has a lot of the bugs that were in Beta 5 worked out.

DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: Recursive regular expression use - an easier way?

Post by DavidMJ »

I don't have time right now to answer all of this (I'm doing some non-phpBB-related stuff at the moment ;)) but I will try to answer some things.

If I remember correctly, the phpBB3 system did not use recursive expressions because each recursive expression can only handle something like 15 levels before PCRE must resort to malloc calls (which are not fast). The malloc might also fail, which would leave some of the data unparsed.

Klors
Registered User
Posts: 95
Joined: Fri Sep 19, 2003 2:08 pm

Re: Recursive regular expression use - an easier way?

Post by Klors »

Besides, why would anyone want a MOD when, as has been mentioned, the standard custom BBCode feature can handle it just fine? E.g.:

BBCode usage:

Code:

[spoiler]{TEXT}[/spoiler]
HTML replacement:

Code:

<blockquote><div style="cursor:pointer;cursor:hand;" onclick="if (this.getElementsByTagName('div')[0].style.display != 'block') { this.getElementsByTagName('div')[0].style.display = 'block';} else { this.getElementsByTagName('div')[0].style.display = 'none'; }"><cite>Spoiler: (click to reveal/hide)</cite><div style="display: none;">{TEXT}</div></div></blockquote>

coppro
Registered User
Posts: 2
Joined: Thu May 17, 2007 1:43 am

Re: Recursive regular expression use - an easier way?

Post by coppro »

Klors wrote:

Code:

<blockquote><div style="cursor:pointer;cursor:hand;" onclick="if (this.getElementsByTagName('div')[0].style.display != 'block') { this.getElementsByTagName('div')[0].style.display = 'block';} else { this.getElementsByTagName('div')[0].style.display = 'none'; }"><cite>Spoiler: (click to reveal/hide)</cite><div style="display: none;">{TEXT}</div></div></blockquote>
That's helpful (I wasn't aware of the trick of putting the JavaScript on an outer <div> element instead of giving each spoiler its own numbered ID), but it doesn't solve the recursion problem. The idea is that if you were to use

Code:

[spoiler]Maybe you'll want to see:[spoiler]THIS![/spoiler][/spoiler]
then it wouldn't end up being parsed as

Code:

{OPEN_SPOILER}Maybe you'll want to see:[spoiler]THIS!{CLOSE_SPOILER}[/spoiler]
.
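The mis-pairing described above is easy to reproduce with a single non-recursive, lazy pattern. This sketch assumes a {TEXT}-style custom BBCode behaves like \[spoiler\](.*?)\[/spoiler\]; the exact regex phpBB generates may differ, and {OPEN_SPOILER}/{CLOSE_SPOILER} are just placeholders.

```python
import re

# Assumed single-pass pattern: the lazy (.*?) stops at the FIRST [/spoiler],
# so an outer opening tag gets paired with the inner closing tag.
SINGLE_PASS = re.compile(r"\[spoiler\](.*?)\[/spoiler\]", re.IGNORECASE | re.DOTALL)

post = "[spoiler]Maybe you'll want to see:[spoiler]THIS![/spoiler][/spoiler]"
result = SINGLE_PASS.sub(r"{OPEN_SPOILER}\1{CLOSE_SPOILER}", post)
print(result)
```

The output leaves the inner [spoiler] and the outer [/spoiler] as literal text, exactly the broken result shown above.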

As for the memory limit, if I read the documentation correctly, that only applies to a regex with more than 15 capturing groups. The quote pattern doesn't have that many. And the ACP page for BBCodes could easily carry a warning: "Warning: {RECURSE} tokens can cause the BBCode to process slowly in a long tag, especially if it uses a large number of total tokens (at least 4)".

Also, I guess that my question about MODs could apply generally: what is the best way to package a MOD that could be divided up into two smaller MODs?

el noobe
Registered User
Posts: 22
Joined: Thu Jan 05, 2006 11:54 am
Location: Hanover, Germany

Re: Recursive regular expression use - an easier way?

Post by el noobe »

Do you mean something like this?
BBCode usage:

Code:

[spoiler={TITLE}]{TEXT}[/spoiler]
HTML replacement:

Code:

<blockquote><div style="cursor:pointer;cursor:hand;" onclick="if (this.getElementsByTagName('div')[0].style.display != 'block') { this.getElementsByTagName('div')[0].style.display = 'block';} else { this.getElementsByTagName('div')[0].style.display = 'none'; }"><cite>{TITLE}: (click to reveal/hide)</cite><div style="display: none;">{TEXT}</div></div></blockquote>
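To illustrate how a usage string with tokens like {TITLE} and {TEXT} can be turned into a pattern plus an HTML replacement, here is a small sketch. The token patterns and the compile_bbcode name are assumptions for illustration; phpBB's own token definitions and its escaping differ. Like any single lazy pattern, it still pairs an opening tag with the first closing tag, so it does not address nesting.

```python
import re
from typing import Callable, Dict, List

# Hypothetical token patterns (phpBB defines its own, stricter ones).
TOKEN_PATTERNS: Dict[str, str] = {
    "TEXT": r"(.*?)",      # lazy: stops at the first closing tag
    "TITLE": r"([^\]]*)",  # anything up to the ']' that ends the opening tag
}
TOKEN = re.compile(r"\{([A-Z]+)\}")


def compile_bbcode(usage: str, template: str) -> Callable[[str], str]:
    """Turn e.g. '[spoiler={TITLE}]{TEXT}[/spoiler]' into a function that
    replaces each match with `template`, filling in the same tokens.
    Assumes the input text has already been HTML-escaped."""
    names: List[str] = []
    parts: List[str] = []
    pos = 0
    for m in TOKEN.finditer(usage):
        parts.append(re.escape(usage[pos:m.start()]))  # literal text between tokens
        parts.append(TOKEN_PATTERNS[m.group(1)])       # KeyError for unknown tokens
        names.append(m.group(1))
        pos = m.end()
    parts.append(re.escape(usage[pos:]))
    pattern = re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

    def render(match: "re.Match[str]") -> str:
        values = dict(zip(names, match.groups()))
        # Fill the template in one pass so a value containing "{TEXT}" is not re-expanded.
        return TOKEN.sub(lambda t: values.get(t.group(1), t.group(0)), template)

    def apply(text: str) -> str:
        return pattern.sub(render, text)

    return apply


spoiler = compile_bbcode(
    "[spoiler={TITLE}]{TEXT}[/spoiler]",
    '<cite>{TITLE}: (click to reveal/hide)</cite><div style="display: none;">{TEXT}</div>',
)
print(spoiler("[spoiler=Ending]He wins.[/spoiler]"))
```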
There is a solution for almost every text formatting problem with Custom BBCodes... ;)
el noobe
