UTF-8 native support?
Forum rules
Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here. Feature requests are closed.
UTF-8 native support?
That feature would enable the creation of truly multilingual boards. For that to work, all language files would have to be converted to UTF-8 encoding too. Sorry if this looks like a feature request; it isn't one. Since search is disabled for me, I just want to know: has this feature been discussed already?
-
- Registered User
- Posts: 219
- Joined: Tue Jan 25, 2005 11:31 pm
Re: UTF-8 native support?
viewtopic.php?t=12397&start=0
You can set Google to search only one domain, so that's what I did here.
Re: UTF-8 native support?
Thanks a lot for the Google hint, I really didn't know that works.
So, the last topic I found here is quite obsolete:
viewtopic.php?t=12397&start=0
There is not much in it after the year 2003...
- A_Jelly_Doughnut
- Registered User
- Posts: 1780
- Joined: Wed Jun 04, 2003 4:23 pm
Re: UTF-8 native support?
I don't think that topic is obsolete. No real new versions have been released since then, and MySQL and PHP still have the limitations psoTFX described there.
A_Jelly_Doughnut
Re: UTF-8 native support?
Greetings!
Dear A_Jelly_Doughnut,
Let me put it this way: as far as the importance of topic lengths is concerned, that's one man's opinion. In fact, in some cases topics in native languages get automatically converted to #nnnn form anyway. From my point of view as a user, this is ugly, and it causes long topics to be cut. But that still is not a complete disaster...
Database department:
Both leading databases, MySQL 4.1 and PostgreSQL 8, natively support UTF-8 now. If some other database does not support UTF-8, why should everyone suffer?
And as far as PHP is concerned: don't use strlen on UTF-8 strings, try mb_strlen instead.
Here is a full list of the multibyte string functions:
http://ca3.php.net/manual/en/ref.mbstring.php
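To illustrate the strlen versus mb_strlen point, here is a minimal sketch in Python rather than PHP, so it is an analogy and not phpBB code. Counting the encoded bytes plays the role of a byte-oriented strlen, and counting the characters plays the role of mb_strlen. The sample string is a username from this thread; the variable names are made up for the example.

```python
# Illustrative only: Python stand-in for the PHP strlen / mb_strlen difference.
text = "Հայկ"  # four Armenian letters

utf8_bytes = text.encode("utf-8")

# Byte-oriented length, comparable to what a byte-counting strlen reports.
byte_length = len(utf8_bytes)

# Character-oriented length, comparable to what mb_strlen reports for UTF-8.
char_length = len(text)

# Cutting at a byte offset can split a character in half. Decoding the
# damaged tail with errors="replace" yields U+FFFD instead of the letter.
cut_at_three_bytes = utf8_bytes[:3].decode("utf-8", errors="replace")

print(byte_length, char_length, cut_at_three_bytes)
```

Each Armenian letter takes two bytes in UTF-8, so the byte count is twice the character count, and any length limit or truncation that works on bytes can cut a title in the middle of a character.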
In fact, I have seen mod development in that direction for phpBB 2.0.x. I have also set up a UTF-8 phpbb22 on my site. It seems to work as it is... I haven't stress-tested it, I agree, but for the sake of innovation and the general pride of "having something no one else has", that definitely does the trick...
In any case, the language files will have to be converted to UTF-8, and at least some coordinated effort is needed for that. I will participate myself, and that's a big job. Leave the rest to modders - I'm not asking the developers to modify the almost finished product...
Don't work too hard! Leave time for family and computer games
Best regards,
Svyatozar
Re: UTF-8 native support?
Look at this multilingual board. It's the phpBB package which I've modified for unicode/utf-8 support.
This modification of the original phpBB (at this moment - version 2.0.11) almost completely (within some of the limitations of PHP/MySQL/PostgreSQL) supports unicode/utf-8.
Հայկ
-
- Registered User
- Posts: 3
- Joined: Thu Feb 10, 2005 4:18 pm
Re: UTF-8 native support?
hayk wrote: Look at this multilingual board. It's the phpBB package which I've modified for unicode/utf-8 support.
This modification of the original phpBB (at this moment - version 2.0.11) almost completely (within some of the limitations of PHP/MySQL/PostgreSQL) supports unicode/utf-8.
I'm currently investigating utf-8, you can read about some of my findings here.
Re: UTF-8 native support?
pichirichi wrote: I'm currently investigating utf-8, you can read about some of my findings here.
Have you looked at my sources?
Հայկ
-
- Registered User
- Posts: 3
- Joined: Thu Feb 10, 2005 4:18 pm
Re: UTF-8 native support?
hayk wrote: Have you looked at my sources?
Not yet. I'll do that soon.
Re: UTF-8 native support?
Hi guys,
Sorry if I sound totally dumb. I don't really know much about character encoding. However, wouldn't text stored in &#nnnn; form work?
I'm just curious. I've used vBulletin before and had its charset set to ISO-8859-1. I noticed that whenever I create a post using Chinese or Japanese characters, those characters get stored in the DB in &#nnnn; form, unlike how phpBB stores them. I also do not have any language packs installed.
When it comes to display, they work perfectly. They are displayed exactly as they were entered.
The only thing that gave me problems was searching. By default, searching for Chinese/Japanese text is not possible in vBulletin. However, there is a way to overcome this by enabling Fulltext search in vBulletin. Some alterations also need to be made to the DB, such as adding indices and changing the table types for posts and threads to MyISAM. This solution was actually provided within vBulletin itself.
There are limitations though: the searched text does not get highlighted in the results, and the results are sometimes wrong. The wrong results happen because the search works in an "OR" manner, i.e. if any one of the characters is found (e.g. 2 <- entered in &#nnnn; form), that post is returned as a result. If you put quotes around the search string, you get the correct results for what you are really looking for. Also, I noticed that standard characters like numbers get changed to &#nnnn; form too (as in the example above). Of course, another issue is that this form requires more space in the DB.
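As a rough illustration of the &#nnnn; storage described above, here is a small Python sketch. It is not vBulletin's or phpBB's actual code; the helper names to_ncr and from_ncr are invented for the example. It converts non-ASCII characters to decimal numeric character references, restores them for display, and compares the stored size with plain UTF-8.

```python
import html

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#nnnn; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def from_ncr(stored: str) -> str:
    """Turn character references back into characters.

    Note: html.unescape also resolves named entities such as &amp;.
    """
    return html.unescape(stored)

original = "日本語 text"
stored = to_ncr(original)
restored = from_ncr(stored)

# Storage cost: characters needed for the reference form versus bytes
# needed for the same text encoded as UTF-8.
ncr_size = len(stored)
utf8_size = len(original.encode("utf-8"))

print(stored)               # &#26085;&#26412;&#35486; text
print(restored == original)
print(ncr_size, utf8_size)
```

In this sample each Japanese character costs eight ASCII characters as a reference but three bytes in UTF-8, which matches the extra space noted above and also explains why a length-limited field fills up faster with reference-encoded text.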
Some other info about my text input method:
1. I'm using Windows XP Pro with English (US) as the native OS language setting.
2. I use the IME that comes with Windows XP to enter the text.