Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Note: We are moving the topics of this forum and it will be deleted at some point

Publish your own request for comments/change or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.2/Rhea and 3.3.
bantu
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany

Post by bantu »

Background
MySQL's plain "utf8" character set stores at most 3 bytes per character, which prevents the use of some characters such as Emoji (http://en.wikipedia.org/wiki/Emoji). The utf8mb4 character set, which stores up to 4 bytes per character, was introduced in MySQL 5.5.3.
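To illustrate the byte widths involved, here is a small Python sketch (Python is used only for illustration; phpBB itself is written in PHP) that prints how many bytes a few sample characters need in UTF-8. Only the last one, an Emoji outside the Basic Multilingual Plane, needs 4 bytes and therefore does not fit in a 3-byte utf8 column.

```python
# Print how many bytes each sample character needs when encoded as UTF-8.
# MySQL's 3-byte "utf8" can store widths 1-3; width 4 requires utf8mb4.
SAMPLES = [
    ("A", "Latin letter"),
    ("\u00e9", "e with acute accent"),
    ("\u20ac", "Euro sign"),
    ("\U0001F600", "Emoji (grinning face)"),
]


def utf8_width(ch: str) -> int:
    """Return the number of bytes ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch, name in SAMPLES:
        print(f"{name}: {utf8_width(ch)} byte(s)")
```

This prints 1, 2, 3 and 4 bytes for the four samples respectively.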

Proposal
I propose converting all utf8 columns to the utf8mb4 character set and requiring MySQL 5.5.3 (or MariaDB 5.5), so that utf8mb4 is always available and we do not have to support utf8 and utf8mb4 at the same time.
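A rough sketch of what the proposal implies for an installer or updater, written in Python for illustration only: check that the server reports at least version 5.5.3, then convert each table with MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET` statement. The helper names, the table list and the `utf8mb4_unicode_ci` collation are assumptions, not phpBB's actual code. The version check also assumes the version string starts with major.minor.patch, so a MariaDB string such as `10.0.12-MariaDB` compares as 10.0.12.

```python
import re

# Minimum MySQL version proposed in this RFC (first release with utf8mb4).
MIN_VERSION = (5, 5, 3)


def parse_version(version_string: str) -> tuple:
    """Extract (major, minor, patch) from strings like '5.5.40-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version_string: str) -> bool:
    """True if the reported server version is at least MIN_VERSION."""
    return parse_version(version_string) >= MIN_VERSION


def conversion_statements(tables, collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE statement per table that converts the table's
    default character set and all of its character columns to utf8mb4."""
    return [
        f"ALTER TABLE {table} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        for table in tables
    ]


if __name__ == "__main__":
    for version in ("5.1.73", "5.5.40-log", "10.0.12-MariaDB"):
        print(version, supports_utf8mb4(version))
    # Hypothetical table list using phpBB's default "phpbb_" prefix.
    for statement in conversion_statements(["phpbb_posts", "phpbb_users"]):
        print(statement)
```

On InnoDB, such a conversion can still fail for tables with indexed VARCHAR(255) columns because of the index key length limit, so those columns would need attention first.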

Links
http://dev.mysql.com/doc/refman/5.5/en/ ... f8mb4.html
http://tracker.phpbb.com/browse/PHPBB3-11711

MichaelC
Development Team
Posts: 889
Joined: Thu Jan 28, 2010 6:29 pm

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by MichaelC »

Could you post some usage stats please?
Formerly known as Unknown Bliss

bantu
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by bantu »

MySQL 5.0 is no longer maintained, and MySQL 5.1 will probably no longer be maintained by the time phpBB 3.2 is released, so this should be fine to do for phpBB 3.2 or 3.3.

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by naderman »

I agree that we should eventually make that move, and that we shouldn't make it depend on the MySQL version, but only change it once we are confident that we no longer need to support older MySQL versions. So I would say we should look at this again in about half a year and make a decision for 3.2 then.

bantu
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by bantu »

For future reference:
An INDEX on a utf8 varchar(255) column needs up to 3 * 255 = 765 bytes per key.
An INDEX on a utf8mb4 varchar(255) column needs up to 4 * 255 = 1020 bytes per key.
InnoDB's maximum index key length for a single column is 767 bytes by default.
As a result, varchar(255) columns may have to be reduced to varchar(190).
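The arithmetic above as a small Python sketch, assuming the 767-byte limit and worst-case widths of 3 bytes (utf8) and 4 bytes (utf8mb4) per character. It also shows that 191 is the longest fully indexable utf8mb4 VARCHAR under that limit, so VARCHAR(190) fits with a few bytes to spare.

```python
# Default InnoDB limit on the length of a single index key, in bytes.
MAX_KEY_BYTES = 767


def index_key_bytes(varchar_length: int, bytes_per_char: int) -> int:
    """Worst-case key size of an index covering a whole VARCHAR(n) column."""
    return varchar_length * bytes_per_char


def max_indexable_length(bytes_per_char: int, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """Longest VARCHAR(n) that can be fully indexed within max_key_bytes."""
    return max_key_bytes // bytes_per_char


if __name__ == "__main__":
    print(index_key_bytes(255, 3))   # 765: utf8 VARCHAR(255) fits
    print(index_key_bytes(255, 4))   # 1020: utf8mb4 VARCHAR(255) is too long
    print(max_indexable_length(4))   # 191
    print(index_key_bytes(190, 4))   # 760: VARCHAR(190) fits
```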

sajaki
Registered User
Posts: 86
Joined: Mon Jun 21, 2010 8:28 pm

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by sajaki »

-1

I'm generally not in favor of using UTF-8 fields in indexes. It's bad DB design. For a longer explanation, see http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql

Un1matr1x
Registered User
Posts: 48
Joined: Mon Sep 07, 2009 10:18 pm

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by Un1matr1x »

sajaki wrote: -1

I'm generally not in favor of using utf-8 fields in indexes. it's bad db design. for a longer explanation, see this.
1) http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql wrote:
Conclusion:
MySQL and it’s internal working can be insanely complex. It’s important to never assume anything and test everything. Don’t convert everything to UTF-8 just because.. but make sure you have good reasons NOT to use a single-byte encoding like latin1. If you need to use the UTF-8 encoding, then make sure that you use the correct sizes. Don’t make everything VARCHAR(255) so at least you can store really long names. The penalties for “disrespecting” the database can and will be severe.. :)
I'd say that bulletin board software which is used not only in countries with Latin-based scripts is a good reason to use UTF-8; Arabic and Japanese, for example, were named in the blog post you linked.

2) As far as I understand, this is not about changing all DB fields to utf8mb4, only the existing utf8 columns, so that they can really store all possible characters.


By the way, the 4-byte problem is also mentioned in the blog post you linked:
http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql wrote:
You just have to realize that MySQL only uses a maximum of 3 bytes for UTF-8, which means not ALL utf-8 characters can be stored in MySQL, but most of the UTF-8 characters possible aren’t used anyway.. That’s why it might get confusing when reading upon UTF-8 that uses 4 bytes, and the 3 bytes that MySQL uses.
So this RFC fixes one of the shortcomings criticized there.

Elsensee
Former Team Member
Posts: 42
Joined: Sun Mar 16, 2014 1:08 pm
Location: Hamburg, Germany

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by Elsensee »

At the moment the problem is "solved" by blocking Emoji entirely, which I don't think is OK; it's the wrong approach. So are there any updates on this? (Of course not for Ascraeus, but for Arsia... or Rhea... whatever you want to call it... it should be possible, right? :D)
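For comparison, the blocking workaround amounts to a check like the following Python sketch (illustrative only; it is not phpBB's actual PHP code, and the function names are hypothetical). Every code point above U+FFFF needs 4 bytes in UTF-8, so the post is rejected instead of stored. With utf8mb4 columns such a check would not be needed.

```python
def contains_4byte_chars(text: str) -> bool:
    """True if text has a code point above U+FFFF (e.g. most Emoji),
    which needs 4 bytes in UTF-8 and cannot be stored in a 3-byte utf8 column."""
    return any(ord(ch) > 0xFFFF for ch in text)


def validate_post(text: str) -> str:
    """Hypothetical validator that blocks such characters outright."""
    if contains_4byte_chars(text):
        raise ValueError("The post contains characters that cannot be stored.")
    return text


if __name__ == "__main__":
    print(validate_post("Price: 5 \u20ac"))  # accepted: the Euro sign needs 3 bytes
    try:
        validate_post("Nice! \U0001F600")
    except ValueError as err:
        print("rejected:", err)
```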

bantu
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany

Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5

Post by bantu »

This RFC describes a proper solution for the problem at hand and should still be implemented. Possibly for phpBB 3.2 or whenever it is deemed okay to require MySQL 5.5.3 and someone actually implements it.
