PHPBB3-14985 - Plain text is stored as HTML and not decoded before usage

Discuss requests for comments/changes posted in the Issue Tracker for the development of phpBB. Current releases are 3.2/Rhea and 3.3/Proteus.

Post by JoshyPHP

Some reports about issues with HTML special characters in 3.2 have appeared recently:
The problem is the same for both: admin-submitted text is encoded and stored as HTML, and is then used in contexts where plain text is expected. This is all new to me, but judging from the commit log it has been that way since 2011, so I guess I just never noticed it.
If every text field is HTML-encoded for storage, then phpbb\textformatter\data_access should probably be modified to decode them back to plain text automatically.
Is everything submitted via the ACP stored as HTML, or does that concern only some fields? If it's everything, then all of it should be decoded before use, because everything in the phpbb\textformatter namespace expects to receive exactly what the user typed in. This area of phpBB is completely foreign to me.
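To make the failure mode concrete, here is a minimal sketch in Python, assuming storage escapes text the way PHP's htmlspecialchars() does and that the plain-text consumer escapes it again when rendering. Python's html.escape() and html.unescape() stand in for htmlspecialchars() and htmlspecialchars_decode(); the three helper functions are hypothetical and are not phpBB's actual API.

```python
import html

# Hypothetical: the ACP stores what the admin typed as HTML-encoded text.
def store_from_acp(typed: str) -> str:
    return html.escape(typed, quote=True)

# Hypothetical consumer that expects plain text and escapes it itself on output.
def render_plain_text(text: str) -> str:
    return html.escape(text, quote=True)

# The proposed fix: decode stored values back to plain text before use.
def decode_for_formatter(stored: str) -> str:
    return html.unescape(stored)

typed = "Tom & Jerry <3"
stored = store_from_acp(typed)
broken = render_plain_text(stored)                        # encoded twice
fixed = render_plain_text(decode_for_formatter(stored))   # encoded once

print(stored)  # Tom &amp; Jerry &lt;3
print(broken)  # Tom &amp;amp; Jerry &amp;lt;3
print(fixed)   # Tom &amp; Jerry &lt;3
```

In the broken case, the browser shows the literal entities "&amp;" and "&lt;" instead of "&" and "<". Decoding once on read, which is what a data_access-level fix would do, restores the text the admin actually typed.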


Post by JoshyPHP

I've posted a PR. I'd appreciate it if someone with a fresh pair of eyes could take a look: https://github.com/phpbb/phpbb/pull/4631

Some test fixtures I created in previous PRs may be incorrect, in that their entities may not be encoded. Most of the time I copy/paste data from phpMyAdmin into the XML file by hand, and I may have forgotten to use a CDATA section in some places, which means some tests may run on incorrect data. If that's the case, the resulting bugs would be things that stop working when they contain HTML special characters, as in the two topics above.

On second thought, those test fixtures simulate the return values of the data_access\get_* functions, so their content should not be encoded.
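The CDATA distinction matters because an XML parser decodes entities in ordinary element text but leaves CDATA content verbatim. The sketch below uses Python's xml.etree.ElementTree. The `<row>`/`<value>` layout is a hypothetical stand-in for a DbUnit-style fixture, not necessarily phpBB's exact schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixture rows: the same characters, with and without CDATA.
plain = "<row><value>Tom &amp; Jerry</value></row>"
cdata = "<row><value><![CDATA[Tom &amp; Jerry]]></value></row>"

# Ordinary text: the parser decodes &amp; to &, so the test sees plain text.
plain_text = ET.fromstring(plain).find("value").text
# CDATA: the content is kept verbatim, i.e. still HTML-encoded as in the DB.
cdata_text = ET.fromstring(cdata).find("value").text

print(plain_text)  # Tom & Jerry
print(cdata_text)  # Tom &amp; Jerry
```

So a fixture that stands in for a database row needs CDATA (or double-encoded entities) to keep the HTML-encoded form. A fixture that stands in for a data_access\get_* return value should yield plain text, which the non-CDATA form already does.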
