Background
MySQL's "default" utf8 character set only supports characters of up to 3 bytes, which prevents the use of some characters such as emoji (http://en.wikipedia.org/wiki/Emoji). A 4-byte UTF-8 character set, utf8mb4, was introduced in MySQL 5.5.3.
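To make the 3-byte limit concrete, here is a minimal Python sketch (not phpBB code) that counts the UTF-8 bytes of a few sample characters. The helper names `utf8_len` and `fits_in_mysql_utf8` and the sample characters are illustrative assumptions.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode a single character as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes,
    i.e. the text can be stored in MySQL's 3-byte utf8 character set."""
    return all(utf8_len(ch) <= 3 for ch in text)


# Sample characters (chosen for illustration only).
samples = {
    "latin a": "a",
    "e with acute": "\u00e9",
    "euro sign": "\u20ac",
    "grinning face emoji": "\U0001F600",
}

for name, ch in samples.items():
    print(f"{name}: {utf8_len(ch)} byte(s)")
```

The first three samples need 1, 2 and 3 bytes respectively and fit into utf8; the emoji needs 4 bytes and therefore requires utf8mb4.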
Proposal
I propose upgrading all utf8 column character sets to utf8mb4 and requiring MySQL 5.5.3, so that utf8mb4 is available everywhere and we do not have to support utf8 and utf8mb4 at the same time.
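As a rough sketch of what such an upgrade could involve, the Python snippet below builds MySQL `ALTER TABLE ... CONVERT TO CHARACTER SET` statements. This is not the actual phpBB migration: the function name, the table names and the `utf8mb4_unicode_ci` collation are assumptions made for illustration. Note that `CONVERT TO` changes every character column in a table, so a real migration would need to check that only utf8 columns are affected.

```python
def convert_table_sql(table: str, collation: str = "utf8mb4_unicode_ci") -> str:
    """Build an ALTER TABLE statement that converts a table to utf8mb4.

    The collation default is an assumption; a real migration would pick
    the collation matching the existing schema.
    """
    if "`" in table:
        # Refuse identifiers that would break out of the backtick quoting.
        raise ValueError("table name must not contain backticks")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
    )


# Hypothetical table names, used only to show the generated statements.
for table in ("phpbb_posts", "phpbb_topics"):
    print(convert_table_sql(table))
```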
Links
http://dev.mysql.com/doc/refman/5.5/en/ ... f8mb4.html
http://tracker.phpbb.com/browse/PHPBB3-11711
Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
Could you post some usage stats please?
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
MySQL 5.0 is no longer maintained, and MySQL 5.1 will probably no longer be maintained by the time phpBB 3.2 is released, so this should be fine to do for phpBB 3.2 or 3.3.
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
I agree that we should eventually make that move, and that we shouldn't make it conditional on the MySQL version but only change it once we are confident we no longer need to support lower MySQL versions. So I would say we should look at this again in about half a year and make a decision for 3.2 then.
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
For future reference:
An INDEX on a utf8 varchar(255) column has a size of 3*255 = 765 bytes.
An INDEX on a utf8mb4 varchar(255) column will likely have a size of 4*255 = 1020 bytes.
The maximum key size for a single column is 767 bytes by default.
As a result, indexed varchar(255) columns may have to be reduced to varchar(190) (4*190 = 760 bytes).
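The arithmetic above can be checked with a short Python sketch. It assumes the simple model used in the post (index size = column length times the maximum bytes per character) and ignores any additional per-key overhead; the function names are illustrative.

```python
MAX_KEY_BYTES = 767  # default per-column key limit quoted in the post


def index_bytes(length: int, bytes_per_char: int) -> int:
    """Worst-case index size for a varchar(length) column."""
    return length * bytes_per_char


def max_indexable_length(bytes_per_char: int, limit: int = MAX_KEY_BYTES) -> int:
    """Largest varchar length whose worst-case index size fits within the limit."""
    return limit // bytes_per_char


print(index_bytes(255, 3))      # utf8:    765 bytes, fits within 767
print(index_bytes(255, 4))      # utf8mb4: 1020 bytes, exceeds 767
print(index_bytes(190, 4))      # utf8mb4: 760 bytes, fits within 767
print(max_indexable_length(4))  # 191
```

Under this model varchar(191) is the longest utf8mb4 column that still fits within the limit, so the varchar(190) suggested above leaves a small margin.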
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
-1
I'm generally not in favor of using UTF-8 fields in indexes. It's bad DB design. For a longer explanation, see this: http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
sajaki wrote:
-1
I'm generally not in favor of using utf-8 fields in indexes. it's bad db design. for a longer explanation, see this.
1) I guess there is a good reason for UTF-8: board software should be usable not only in countries with Latin-based scripts. Arabic and Japanese, for example, were named in the blog post you linked.
http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql wrote:
Conclusion:
MySQL and it’s internal working can be insanely complex. It’s important to never assume anything and test everything. Don’t convert everything to UTF-8 just because.. but make sure you have good reasons NOT to use a single-byte encoding like latin1. If you need to use the UTF-8 encoding, then make sure that you use the correct sizes. Don’t make everything VARCHAR(255) so at least you can store really long names. The penalties for “disrespecting” the database can and will be severe..
2) As far as I understand, this is not about changing all DB fields to utf8mb4, only about converting existing utf8 columns to utf8mb4 so that all possible characters can really be stored.
By the way, the 4-byte problem is also mentioned in the blog post you linked:
http://www.adayinthelifeof.nl/2010/12/04/about-using-utf-8-fields-in-mysql wrote:
You just have to realize that MySQL only uses a maximum of 3 bytes for UTF-8, which means not ALL utf-8 characters can be stored in MySQL, but most of the UTF-8 characters possible aren’t used anyway.. That’s why it might get confusing when reading upon UTF-8 that uses 4 bytes, and the 3 bytes that MySQL uses.
So this RFC solves part of what was criticized there.
- Elsensee
- Former Team Member
- Posts: 42
- Joined: Sun Mar 16, 2014 1:08 pm
- Location: Hamburg, Germany
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
The problem is currently solved by blocking emoji altogether, which I don't think is OK. It's the wrong approach. So are there any updates on this? (Of course not for Ascraeus, but for Arsia, or Rhea, or whatever you want to call it, it should be possible, right?)
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
Re: Use utf8mb4 and require MySQL 5.5.3 / MariaDB 5.5
This RFC describes a proper solution for the problem at hand and should still be implemented, possibly for phpBB 3.2 or whenever it is deemed acceptable to require MySQL 5.5.3 and someone actually implements it.