UTF-8 native support?

Discuss features as they are added to the new version. Give us your feedback. Don't post bug reports, feature requests, support questions or suggestions here.
svyatozar
Registered User
Posts: 3
Joined: Thu Feb 17, 2005 5:04 pm

UTF-8 native support?

Post by svyatozar »

Native UTF-8 support would make it possible to create truly multilingual boards. For that to work, all language files would also have to be converted to UTF-8. This is not meant as a feature request. Search is disabled for me, so I just want to know: has this been discussed already?

Cap'n Refsmmat
Registered User
Posts: 219
Joined: Tue Jan 25, 2005 11:31 pm

Re: UTF-8 native support?

Post by Cap'n Refsmmat »

viewtopic.php?t=12397&start=0

You can set Google to search only one domain, so that's what I did here.

svyatozar
Registered User
Posts: 3
Joined: Thu Feb 17, 2005 5:04 pm

Re: UTF-8 native support?

Post by svyatozar »

Thanks a lot for the Google tip; I really didn't know that worked.

However, the latest topic I found here is rather out of date:
viewtopic.php?t=12397&start=0
There is not much in it after 2003...

A_Jelly_Doughnut
Registered User
Posts: 1780
Joined: Wed Jun 04, 2003 4:23 pm

Re: UTF-8 native support?

Post by A_Jelly_Doughnut »

I don't think that topic is obsolete. No real new versions have been released since then, and MySQL and PHP still have the limitations psoTFX described there.
A_Jelly_Doughnut

svyatozar
Registered User
Posts: 3
Joined: Thu Feb 17, 2005 5:04 pm

Re: UTF-8 native support?

Post by svyatozar »

Greetings!

Dear A_Jelly_Doughnut,
Let me put it this way: the importance of topic title length is one person's opinion. In fact, in some cases titles in native languages get automatically converted to the &#nnnn; form anyway. From my point of view as a user, this is ugly and causes long titles to be cut off. But it is still not a complete disaster...

On the database side:
Both leading databases, MySQL 4.1 and PostgreSQL 8, now support UTF-8 natively. If some other database does not support UTF-8, why should everyone suffer?

As for PHP: don't use strlen() on UTF-8 strings; use mb_strlen() instead.
Here is the full list of multibyte string functions:
http://ca3.php.net/manual/en/ref.mbstring.php
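To illustrate the strlen()/mb_strlen() point (and why long titles get cut badly), here is a small Python sketch rather than PHP. The helper names below are hypothetical; they only mimic the behaviour described: PHP's strlen() counts bytes, mb_strlen($s, 'UTF-8') counts characters, and cutting a UTF-8 string by bytes can split a character in half.

```python
# Python stand-ins for the PHP behaviour discussed above (hypothetical helpers).

def byte_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding (what PHP's strlen() counts)."""
    return len(text.encode("utf-8"))


def char_length(text: str) -> int:
    """Number of characters (code points), like mb_strlen($s, 'UTF-8')."""
    return len(text)


def cut_by_bytes(text: str, limit: int) -> bytes:
    """Byte-based truncation, like substr() on UTF-8: may split a character."""
    return text.encode("utf-8")[:limit]


def cut_by_chars(text: str, limit: int) -> str:
    """Character-based truncation, like mb_substr(): never splits a character."""
    return text[:limit]


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    name = "Հայկ"  # four Armenian letters, two bytes each in UTF-8
    print(byte_length(name))  # 8
    print(char_length(name))  # 4
    print(is_valid_utf8(cut_by_bytes(name, 3)))  # False: last character split
    print(cut_by_chars(name, 3))  # Հայ
```

A byte-based limit on a title field behaves like cut_by_bytes, which is one way a multibyte title can end up broken.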

In fact, I have seen MOD development in that direction for phpBB 2.0.x. I have also set up a UTF-8 phpBB 2.2 on my site, and it seems to work as it is... I haven't stress-tested it, I admit, but for the sake of innovation and the pride of "having something no one else has", it definitely does the trick... :lol:

In any case, the language files will have to be converted to UTF-8. That takes at least some coordinated effort (I will take part myself), and it is a big job. Leave the rest to the modders; I'm not asking the developers to modify an almost finished product...
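Mechanically, converting one language file is just decoding it from its old charset and re-encoding it as UTF-8. A minimal Python sketch, assuming you know each file's original encoding (cp1251 below is only an example) and with convert_file as a hypothetical helper:

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a legacy charset (e.g. 'cp1251') into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(path: Path, source_encoding: str) -> None:
    """Rewrite a file in place as UTF-8 (hypothetical helper; back up first)."""
    path.write_bytes(to_utf8(path.read_bytes(), source_encoding))


if __name__ == "__main__":
    legacy = "Привет".encode("cp1251")  # one byte per Cyrillic letter
    converted = to_utf8(legacy, "cp1251")
    print(len(legacy), len(converted))  # 6 12
```

If a language pack also declares its own charset somewhere, that declaration would need updating by hand. The sketch does not touch it, and it still takes a translator to check the result.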

Don't work too hard! Leave time for family and computer games ;)

Best regards,

Svyatozar

hayk
Registered User
Posts: 10
Joined: Thu Jun 19, 2003 3:30 pm

Re: UTF-8 native support?

Post by hayk »

Look at this multilingual board. It runs a phpBB package that I've modified for Unicode/UTF-8 support.
This modification of the original phpBB (currently version 2.0.11) supports Unicode/UTF-8 almost completely, within some of the limitations of PHP/MySQL/PostgreSQL.
Հայկ

pichirichi
Registered User
Posts: 3
Joined: Thu Feb 10, 2005 4:18 pm

Re: UTF-8 native support?

Post by pichirichi »

hayk wrote: Look at this multilingual board. It's the phpBB package which i've modified for unicode/utf-8 support.
This modification of the original phpBB (at this moment - version 2.0.11) almost completely (within some of the limitations of PHP/MySQL/PostgresSQL) supports unicode/utf-8.

I'm currently investigating UTF-8; you can read about some of my findings here.

hayk
Registered User
Posts: 10
Joined: Thu Jun 19, 2003 3:30 pm

Re: UTF-8 native support?

Post by hayk »

pichirichi wrote:
hayk wrote: Look at this multilingual board. It's the phpBB package which i've modified for unicode/utf-8 support.
This modification of the original phpBB (at this moment - version 2.0.11) almost completely (within some of the limitations of PHP/MySQL/PostgresSQL) supports unicode/utf-8.
I'm currently investigating utf-8, you can read about some of my findings here.

Have you looked at my sources?
Հայկ

pichirichi
Registered User
Posts: 3
Joined: Thu Feb 10, 2005 4:18 pm

Re: UTF-8 native support?

Post by pichirichi »

hayk wrote:
Have you looked my sources?


Not yet. I'll do that soon.

birdfoot
Registered User
Posts: 9
Joined: Wed Jun 22, 2005 10:30 pm

Re: UTF-8 native support?

Post by birdfoot »

Hi guys,

Sorry if this sounds dumb; I don't know much about character encodings. But wouldn't storing text in &#nnnn; form work?

I'm just curious. I've used vBulletin before, with its charset set to ISO-8859-1. I noticed that whenever I created a post containing Chinese or Japanese characters, those characters were stored in the DB in &#nnnn; form, as opposed to the way phpBB stores them. I also did not have any language packs installed.

When it comes to display, they work perfectly: they are shown exactly as they were entered.
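For readers unfamiliar with the &#nnnn; form: it is a decimal numeric character reference, which browsers typically substitute for characters the page's charset cannot represent. A minimal Python sketch of the round trip, using only the standard library (to_ncr and from_ncr are hypothetical names, not vBulletin or phpBB functions):

```python
import html


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#nnnn; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_ncr(text: str) -> str:
    """Turn &#nnnn; references (and any other HTML entities) back into characters."""
    return html.unescape(text)


if __name__ == "__main__":
    original = "中文"
    stored = to_ncr(original)
    print(stored)                        # &#20013;&#25991;
    print(from_ncr(stored) == original)  # True
    print(len(stored), len(original.encode("utf-8")))  # 16 6
```

Display works because the browser renders the references, but each CJK character takes 8 bytes here instead of 3 in UTF-8. That is the extra DB space mentioned below.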

The only thing that gave me problems was searching. By default, vBulletin cannot search for Chinese/Japanese text. However, you can work around this by enabling fulltext search in vBulletin. This also requires some changes to the DB, such as adding indexes and changing the table type of the post and thread tables to MyISAM. This solution is actually provided within vBulletin itself.

There are limitations, though: search terms are not highlighted in the results, and the results are sometimes wrong. The wrong results happen because the search works in an "OR" manner, i.e. if any one of the characters is found (e.g. a 2 entered in &#nnnn; form), that post is returned as a result. If you put quotes around the search string, you get the correct results for what you actually want to find. I also noticed that ordinary characters such as digits get converted to &#nnnn; form too (as in the example just given). Of course, another issue is that this requires more space in the DB.
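The "OR" versus quoted-phrase behaviour can be modelled with a toy search over text stored in &#nnnn; form. This is a simplified Python sketch with made-up posts. It is not vBulletin's code, and MySQL's real FULLTEXT tokenizing works differently. It only shows why matching any single character gives false positives while a phrase match does not:

```python
import re

# Hypothetical posts, stored in &#nnnn; form.
POSTS = {
    1: "&#20013;&#25991;",  # 中文
    2: "&#20013;&#22269;",  # 中国
    3: "&#25991;&#23398;",  # 文学
}

NCR = re.compile(r"&#\d+;")


def as_ncr(text: str) -> str:
    """Encode a plain-text query the same way the posts were stored."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def tokens(stored: str) -> set[str]:
    """One token per &#nnnn; reference, i.e. one per character."""
    return set(NCR.findall(stored))


def search_or(query: str) -> list[int]:
    """Return posts containing ANY of the query's characters (false positives)."""
    wanted = tokens(as_ncr(query))
    return sorted(pid for pid, text in POSTS.items() if wanted & tokens(text))


def search_phrase(query: str) -> list[int]:
    """Return posts containing the whole query as one contiguous sequence."""
    needle = as_ncr(query)
    return sorted(pid for pid, text in POSTS.items() if needle in text)


if __name__ == "__main__":
    print(search_or("中文"))      # [1, 2, 3]
    print(search_phrase("中文"))  # [1]
```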

Some other details about my text input setup:
1. I'm using Windows XP Pro with English (US) as the OS language setting.
2. I use the IME that comes with Windows XP to enter the text.
