unicode support

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
Augustin
Registered User
Posts: 19
Joined: Mon Feb 23, 2004 3:48 am

unicode support

Post by Augustin »

I remember seeing that there would be Unicode support in the road map. I cannot see it anymore.

Is this planned? Implemented?

How do we deal with multi-language forums?

When is phpBB 2.2 scheduled for? I have read the FAQ, but give me an idea: are we years, months, weeks, or only days away from the release date?
http://www.reuniting.info Healing with sexual relationships

Alternative Voting (Condorcet, Approval Voting) phpBB Mod at
http://www.masquilier.org (currently unmaintained)
SHS`
Registered User
Posts: 1628
Joined: Wed Jul 04, 2001 9:13 am
Location: The Boonies, Hong Kong

Re: unicode support

Post by SHS` »

Augustin wrote: I remember seeing that there would be Unicode support in the road map. I cannot see it anymore.

Is this planned? Implemented?

How do we deal with multi-language forums?
phpBB 2.2 can be multilingual now. For example:
Wikipedia wrote: Wikipedias with over 10,000 articles:
Dansk (Danish) – Deutsch (German) – Esperanto – Español (Spanish) – Français (French) – Italiano (Italian) – 日本語 (Japanese) – Nederlands (Dutch) – Polska (Polish) – Português (Portuguese) – Svenska (Swedish) – 简体中文 (Chinese, simplified) – 繁體中文 (Chinese, traditional)

Wikipedias with over 1000 articles:
Afrikaans – العربية (Arabic) – Asturianu (Asturian) – Български (Bulgarian) – Català (Catalan) – Česká (Czech) – Cymraeg (Welsh) – Ελληνικά (Greek) – Eesti (Estonian) – Euskara (Basque) – Suomeksi (Finnish) – Frysk (Western Frisian) – Gallego (Galician) – עברית (Hebrew) – Hrvatski (Croatian) – Magyar (Hungarian) – Interlingua – Bahasa Indonesia (Indonesian) – Íslenska (Icelandic) – 한국어 (Korean) – Latina (Latin) – Bahasa Melayu (Malay) – Norsk (Norwegian) – Română (Romanian) – Русский (Russian) – Simple English – Slovenščina (Slovenian) – Српски (Serbian) – Türkçe (Turkish) – Українська (Ukrainian) – Walon (Walloon)

Wikipedias with over 100 articles:
Беларуская (Belarusian) – Bosanski (Bosnian) – Kaszëbsczi (Kashubian) – فارسی (Persian) – Gaeilge (Irish) – हिन्दी (Hindi) – Ido – Bahasa Jawa (Javanese) – Kurdî (Kurdish) – Lëtzebuergesch (Luxembourgish) – Lietuvių (Lithuanian) – Latviešu (Latvian) – Hō-ló-oē (Southern Min) – Plattdüütsch (Low Saxon) – Langue d'Oc (Occitan) – संस्कृत (Sanskrit) – Slovenčina (Slovak) – Basa Sunda (Sundanese) – தமிழ் (Tamil) – ไทย (Thai) – toki pona – Tatarça (Tatar) – اردو (Urdu) – Tiếng Việt (Vietnamese)

Wikipedias with under 100 articles:
Elsässisch (Alsatian) – Aragonés (Aragonese) – Azərbaycan (Azeri) – Bislama – বাংলা (Bengali) – Brezhoneg (Breton) – ᏣᎳᎩ (Cherokee) – Corsu (Corsican) – Føroyskt (Faroese) – Gàidhlig (Scottish Gaelic) – Guarani – Gujarati – Lojban – ქართული (Georgian) – ភាសាខ្មែរ (Khmer) – кыргызча (Kyrgyz) – Malagasy – Māori – Македонски (Macedonian) – Malayalam – Монгол (Mongolian) – मराठी (Marathi) – Nauri (Nauruan) – Nahuatl – Diné Bizaad (Navajo) – ਪੰਜਾਬੀ / پنجابی (Punjabi) – Armâneashti (Aromanian) – Sardu (Sardinian) – Srpskohrvatski (Serbo-Croatian) – Shqip (Albanian) – Kiswahili (Swahili) – Тоҷикӣ (Tajik) – Tagalog – tlhIngan Hol (Klingon) – Tok Pisin – Volapük – ייִדיש (Yiddish)
... it's just that characters outside the charset in use need to be entitised (converted to HTML entities), something you don't need to do when using Unicode...
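
As a rough illustration of what entitising means, here is a minimal sketch in Python (not PHP, and not phpBB's actual code). The sample text and the choice of ISO-8859-1 as the "native" charset are assumptions made only for the example.

```python
# Illustration only: Python stands in for the forum software here.
text = "phpBB in 日本語"

# A single-byte charset such as ISO-8859-1 has no slot for the Japanese
# characters, so a lossy conversion turns each of them into "?".
lossy = text.encode("iso-8859-1", errors="replace")

# Entitising replaces each unrepresentable character with a numeric
# HTML entity, which a browser renders as the original character.
entitised = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# In UTF-8 every character is representable, so no entities are needed.
utf8 = text.encode("utf-8")

print(lossy)                 # b'phpBB in ???'
print(entitised)             # b'phpBB in &#26085;&#26412;&#35486;'
print(utf8.decode("utf-8"))  # phpBB in 日本語
```

The first result is also what a board shows when it stores text in a charset that cannot hold the submitted characters and does not entitise them.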

However, the lack of Unicode support within PHP itself, even with multibyte support enabled, means that string-manipulation functions "break".
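
To make "break" concrete, here is a small sketch in Python, where a bytes object plays the role of a byte-oriented string function. It is an analogy under that assumption, not a reproduction of any particular PHP function.

```python
# Illustration only: bytes operations stand in for byte-oriented
# string functions; str operations stand in for Unicode-aware ones.
word = "Español"
raw = word.encode("utf-8")   # "ñ" occupies two bytes in UTF-8

byte_length = len(raw)       # counts bytes
char_length = len(word)      # counts characters (code points)

# Cutting after 5 bytes splits "ñ" in half and leaves invalid UTF-8.
cut = raw[:5]
try:
    cut.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

# Byte-wise upper-casing only touches ASCII letters, so "ñ" is skipped.
byte_upper = raw.upper().decode("utf-8")
char_upper = word.upper()

print(byte_length, char_length)  # 8 7
print(cut_is_valid)              # False
print(byte_upper, char_upper)    # ESPAñOL ESPAÑOL
```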

I can't recall which implementations the devs have tried in order to get decent Unicode support and fix such deficiencies, but they decided that those impact performance too much.

I have seen this, which effectively rewrites the affected functions so that they become Unicode compatible: http://www.randomchaos.com/document.php ... nd_unicode

However, I have no idea how scalable this implementation is either.

In the end though, I believe the current plan is for language files to be offered in at least two encodings, one "native", and another in Unicode, for those who have a multilingual requirement.
Augustin wrote: When is phpBB 2.2 scheduled for? I have read the FAQ, but give me an idea: are we years, months, weeks, or only days away from the release date?
http://www.phpbb.com/phpBB/viewtopic.ph ... 11#1227611

It's done when it's done, though it is neither the first of your guesses (years) nor the last (days).
Jonathan “SHS`” Stanley • 史德信
phpBB™ 3.1.x, Bug/Security trackers
phpBB™ Bertie Bear 3.0 - prosilver Edition!
Asking Questions The Smart Way
Augustin
Registered User
Posts: 19
Joined: Mon Feb 23, 2004 3:48 am

Re: unicode support

Post by Augustin »

Thank you very much for all those comments. They are very informative. Thank you for the links, too.

Augustin
Augustin
Registered User
Posts: 19
Joined: Mon Feb 23, 2004 3:48 am

Re: unicode support

Post by Augustin »

????

é"'(àç_à=
Augustin
Registered User
Posts: 19
Joined: Mon Feb 23, 2004 3:48 am

Re: unicode support

Post by Augustin »

How do you do it?

In your quote, there is Chinese, but if I input Chinese myself, it comes back as ????
{o}
Registered User
Posts: 90
Joined: Wed Mar 31, 2004 1:26 pm

Re: unicode support

Post by {o} »

合氣道
Works... :roll:
akazik
Registered User
Posts: 2
Joined: Tue Sep 28, 2004 10:37 am

Re: unicode support

Post by akazik »

Hi!

Unicode support may be a little difficult, but it's really cool.

I've written some Unicode functions for my own project.
They now work quite well (at least for me),
but the project is not ready yet.

They can do the following:
UTF-8 from/to an array of Unicode characters (as ints, up to 21 bits),
so each character (represented as 1-4 bytes of UTF-8) becomes one int.
This makes it possible to use the array_* functions for cutting, searching, ...

There is another function which takes UTF-8 as input and returns:
- UTF-8 with HTML special characters (<, >, &, ...) replaced by &lt; and so on
- ASCII (7-bit), with replacements (ä => ae, and also some Greek, Japanese and other replacements);
I use this for searching and sorting
- and a few more, less interesting ones
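
The functions described above might look roughly like the following sketch. It is written in Python rather than PHP, and every name in it (utf8_to_codepoints, codepoints_to_utf8, to_ascii_key, REPLACEMENTS) is hypothetical. The replacement table is a tiny assumed sample, not the actual table from the project.

```python
import html
import unicodedata

def utf8_to_codepoints(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of code points, one int per character."""
    return [ord(ch) for ch in data.decode("utf-8")]

def codepoints_to_utf8(codepoints: list) -> bytes:
    """Encode a list of code points back into UTF-8 bytes."""
    return "".join(chr(cp) for cp in codepoints).encode("utf-8")

# Hypothetical sample of language-specific replacements (ä => ae, ...).
REPLACEMENTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def to_ascii_key(data: bytes) -> str:
    """Reduce UTF-8 text to a 7-bit ASCII key for searching and sorting."""
    text = data.decode("utf-8").lower()
    text = "".join(REPLACEMENTS.get(ch, ch) for ch in text)
    # Split remaining accented letters into base letter + combining mark,
    # then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

raw = "Bär 合氣道".encode("utf-8")
cps = utf8_to_codepoints(raw)

# With one int per character, ordinary list slicing cuts safely.
first_word = codepoints_to_utf8(cps[:3]).decode("utf-8")

escaped = html.escape("a < b & c")
words = sorted(["Zebra", "Äpfel", "Apfel"],
               key=lambda w: to_ascii_key(w.encode("utf-8")))

print(len(raw), len(cps))  # 14 7
print(first_word)          # Bär
print(escaped)             # a &lt; b &amp; c
print(words)               # ['Äpfel', 'Apfel', 'Zebra']
```

Characters with no ASCII equivalent in the table (such as the Chinese ones here) simply drop out of the key in this sketch; a fuller table would map them to something searchable.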

If anyone is interested, I'll send it to you.
And I will publish it on my homepage as soon as I have enough time to do so.

Ciao, ALeX.
psoTFX
Registered User
Posts: 1984
Joined: Tue Jul 03, 2001 8:50 pm

Re: unicode support

Post by psoTFX »

The major problem is with regular expressions ... we can somewhat cope with normal string operations, with appropriate DB support and the appropriate search engine module we can even handle searching. However regexp support is problematical.
mm2
Registered User
Posts: 1
Joined: Thu Jan 06, 2005 11:20 am

Re: unicode support

Post by mm2 »

ąčęėįšųūž Lithuanian Latvian

works
hayk
Registered User
Posts: 10
Joined: Thu Jun 19, 2003 3:30 pm

Re: unicode support

Post by hayk »

psoTFX wrote: The major problem is with regular expressions ... we can somewhat cope with normal string operations, with appropriate DB support and the appropriate search engine module we can even handle searching. However regexp support is problematical.
You can use PCRE with the "u" modifier or the mb_ereg*() functions from "Multibyte String Functions".
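
The difference between byte-oriented and Unicode-aware matching can be sketched as follows. This is Python, not PHP: a bytes pattern is used here as a stand-in for a regexp without Unicode support, and a str pattern as a stand-in for one with it. That mapping is an assumption made for illustration only.

```python
import re

text = "Lietuvių kalba works"
raw = text.encode("utf-8")

# Byte-oriented: \w only knows ASCII, so the word is cut off before "ų".
byte_words = re.findall(rb"\w+", raw)

# Unicode-aware: \w matches letters of any script.
char_words = re.findall(r"\w+", text)

# "." means one byte in the first case and one character in the second.
e_acute = "\u00e9"  # "é", two bytes in UTF-8
dot_matches_bytes = re.fullmatch(rb".", e_acute.encode("utf-8")) is not None
dot_matches_chars = re.fullmatch(r".", e_acute) is not None

print(byte_words)  # [b'Lietuvi', b'kalba', b'works']
print(char_words)  # ['Lietuvių', 'kalba', 'works']
print(dot_matches_bytes, dot_matches_chars)  # False True
```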
Հայկ