PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Discuss requests for comments/changes posted in the Issue Tracker for the development of phpBB. Current releases are 3.2/Rhea and 3.3/Proteus.
Post Reply
User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Post by JoshyPHP »

There are some differences between the BBCode tokens from 3.1 and those supported by s9e\TextFormatter in 3.2.

The {URL} token is different from 3.1. Its definition is:
3.1 wrote:A valid URL using any protocol (http, ftp, etc… cannot be used for javascript exploits). If none is given, “http://” is prefixed to the string.
In s9e\TextFormatter, protocols have to be whitelisted so the part about using "any protocol" is a lie. I'd like to add an interface for the admin to configure which protocols are allowed but adding anything to the ACP takes so much code. :? I'm thinking about adding the list of allowed protocols as a text input in acp_board with a default value of http,https,ftp.

What's this about adding "http://"? It's not currently in 3.2. Is it worth adding now or should it be scrapped altogether?

There's a bunch of new tokens in 3.2. I think that trying to list them all in acp_bbcodes would confuse users more than it would help them though.

New phpBB-specific tokens that were added to support the default BBCodes
  • FLASHHEIGHT accepts a number between 0 and max_img_height.
  • FLASHWIDTH accepts a number between 0 and max_img_width.
  • FONTSIZE accepts a number between 1 and max_font_size.
  • IMAGEURL accepts a URL and checks it against max_img_height/max_img_width.
Filters from s9e\TextFormatter that are automatically supported as tokens in 3.2
  • ALNUM accepts anything that matches /^[0-9A-Za-z]+$/D.
  • HASHMAP maps strings to their replacement in the form {HASHMAP=string1:replacement1,string2:replacement2}. Case-sensitive. Preserves unknown values by default. It's used for the list BBCode to map 1 to decimal and so on, so forth.
  • IP, IPPORT, IPV4 and IPV6 validates their value using ext/filter's FILTER_VALIDATE_IP.
  • INT validates its value using ext/filter's FILTER_VALIDATE_INT.
  • MAP is similar to HASHMAP except that it is case-insensitive by default.[/c]
  • RANGE accepts an integer between given range, e.g. {RANGE=-10,42}
  • REGEXP validates its value against given regexp, e.g. {REGEXP=/^foo\w+bar$/}
  • UINT validates its value using ext/filter's FILTER_VALIDATE_INT, must be 0 or greater.
Some related docs from s9e\TextFormatter: Same doc, better UI:
Last edited by JoshyPHP on Mon Jun 08, 2015 6:08 pm, edited 2 times in total.

User avatar
AmigoJack
Registered User
Posts: 110
Joined: Wed May 04, 2011 7:47 pm
Location: グリーン ヒル ゾーン
Contact:

Re: PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Post by AmigoJack »

First of all: how on earth does the string

_http://_

(yes - all of that is needed: the protocol, colon, two slashes and enclosing non-word characters) alone trigger make_clickable to partially convert it into a link? That's clearly a bug on whatever version is used right here.


JoshyPHP wrote:
3.1 wrote:A valid URL using any protocol (http, ftp, etc… cannot be used for javascript exploits). If none is given, “http://” is prefixed to the string.
Then also one should change protocol into scheme (unless majority's understanding outweights correctness).


JoshyPHP wrote:a default value of http,https,ftp
What about about, cvs, file, git, irc, ldap, magnet, mailto, mms, news, rtmp, sftp, skype, ssh, svn, torrent and udp, which are widely used aswell?


JoshyPHP wrote:Is it worth adding now or should it be scrapped altogether?
If it's not added then URIs start to get relative, and the scheme of the source is used. Here, every external URI without scheme would turn into https: and nobody knows if the target actually supports this.


JoshyPHP wrote:
  • FLASHHEIGHT accepts a number between 0 and max_img_height.
  • FLASHWIDTH accepts a number between 0 and max_img_width.
Is it really bound to Flash only - can't it be more generic? I'm asking because Flash is more or less a hype and once it is not used by most people anymore these tokens would dwell in darkness because of their names, although being applicable to other things aswell. Why not i.e. OBJECTHEIGHT?


JoshyPHP wrote:
  • FONTSIZE accepts a number between 1 and max_font_size.
I lost overview on what's currently used and supported in the future, but please keep in mind that this approach is outdated, since it doesn't honor a measurement. phpBB 3.0 interpreted it as percentage, other BBSes interpret it as pt. Also what you formulated does not tell anything about decimals: can I use 3.5 or not? A versatile and future-proof approach should be to also allow measurements, i.e. 17px, 150%, 2em.



In case you're interested: I also needed to render support for a CSS token:

Code: Select all

'CSS'=> array( '!([a-zA-Z0-9-+.,_ :;#%]+)!'=> '$1' );
So I could add custom BBCodes that were a lot more customizable than using BBCodes themselves. As you see parenthesis are not allowed, so something like url(http://badsite.ext/exec.php) is not possible. It also disallows any other function, sadly.

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Post by JoshyPHP »

About the incomplete URL, it's just the way URLs are matched and trailing punctuation is removed. I fixed it in the library.
AmigoJack wrote: What about about, cvs, file, git, irc, ldap, magnet, mailto, mms, news, rtmp, sftp, skype, ssh, svn, torrent and udp, which are widely used aswell?
I have a related PR that makes it configurable.
AmigoJack wrote:Is it really bound to Flash only
Yes, because that's a token used for backward compatibility with the Flash-only settings. I don't expect anyone to use it or even suspect its existence.

Same for {FONTSIZE}, it exists for backward compability. I'm considering adding some kind of {CSSSIZE} token/filter in the library but I haven't gotten to it.
AmigoJack wrote: In case you're interested: I also needed to render support for a CSS token:

Code: Select all

'CSS'=> array( '!([a-zA-Z0-9-+.,_ :;#%]+)!'=> '$1' );
You can use the regexp directly in the definition. Here's an example of a definition and two functionally equivalent replacements:

Code: Select all

[div style={REGEXP=/^[a-zA-Z0-9-+.,_ :;#%]+$/}]{TEXT}[/div]

Code: Select all

<div style="{REGEXP}">{TEXT}</div>

Code: Select all

<div style="{@style}"><xsl:apply-templates/></div>
An extension could define its own tokens. Grab the configurator instance during the core.text_formatter_s9e_configure_before event and create an attribute filter. For example, if I create a filter named #foo, I can use {FOO} tokens in BBCode definitions and their value will go through given callback. See http://s9etextformatter.readthedocs.org ... m_filters/

Code: Select all

$configurator->attributeFilters->add('#foo', 'mycallback');

function mycallback($attrValue)
{
	return ($attrValue === 'bad') ? false : $attrValue;
}
Theorically you could create a #css filter and plug some kind of CSS parser/validator/purifier to allow a safe subset of CSS in {CSS} tokens.

User avatar
Oyabun1
Former Team Member
Posts: 20
Joined: Thu Mar 31, 2011 9:48 am

Re: PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Post by Oyabun1 »

JoshyPHP wrote: Mon Jun 08, 2015 6:08 pm There's a bunch of new tokens in 3.2. I think that trying to list them all in acp_bbcodes would confuse users more than it would help them though.
If they aren't listed how does an admin know they exist or when to use them?

User avatar
AmigoJack
Registered User
Posts: 110
Joined: Wed May 04, 2011 7:47 pm
Location: グリーン ヒル ゾーン
Contact:

Re: PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Post by AmigoJack »

JoshyPHP wrote: Mon Jun 08, 2015 6:08 pmYou can use the regexp directly in the definition. Here's an example of a definition and two functionally equivalent replacements:

Code: Select all

[div style={REGEXP=/^[a-zA-Z0-9-+.,_ :;#%]+$/}]{TEXT}[/div]

Code: Select all

<div style="{REGEXP}">{TEXT}</div>
Looks great for flexibility. And I hope it's easy to also allow subpatterns. I.e.:

Code: Select all

[div param={REGEXP=/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/}]{TEXT}[/div]

Code: Select all

<div param="{REGEXP.$2}/{REGEXP.$3}/{REGEXP.$1}" title="{REGEXP.$0}">{TEXT}</div>

User avatar
JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: PHPBB3-13933 - Update tokens' definitions in acp_bbcodes

Post by JoshyPHP »

Oyabun1 wrote: Sun Nov 01, 2015 11:48 am If they aren't listed how does an admin know they exist or when to use them?
Assuming this isn't rhetorical, the answer is they don't know they exist. Someone would need to add a description somewhere, or link to the description from the first post.
AmigoJack wrote: Sun Nov 01, 2015 9:19 pm I hope it's easy to also allow subpatterns. I.e.:

Code: Select all

[div param={REGEXP=/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/}]{TEXT}[/div]

Code: Select all

<div param="{REGEXP.$2}/{REGEXP.$3}/{REGEXP.$1}" title="{REGEXP.$0}">{TEXT}</div>
You can use this syntax:

Code: Select all

[div param={PARSE=/^(?'y'[0-9]{4})-(?'m'[0-9]{2})-(?'d'[0-9]{2})$/}]{TEXT}[/div]

Code: Select all

<div param="{@m}/{@d}/{@y}" title="{@param}">{TEXT}</div>

Post Reply