What is it exactly?

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
muhaidib
Registered User
Posts: 57
Joined: Tue Jun 28, 2005 7:33 am

What is it exactly?

Post by muhaidib »

Hey there, I keep hearing about UTF-8 everywhere since Beta3 was released...

I'm on the web design side, so I don't know much about the coding side. Could someone please explain what exactly UTF-8 is and how it benefits phpBB3?

Thank you

Thatbitextra
Registered User
Posts: 72
Joined: Sun May 22, 2005 3:06 am
Location: A place where something is or could be located; a site.

Re: What is it exactly?

Post by Thatbitextra »

Style: subBlack (Now updated to phpBB 2.0.22 and 5 new color schemes!)

Eelke
Registered User
Posts: 606
Joined: Thu Dec 20, 2001 8:00 am
Location: Bussum, NL

Re: What is it exactly?

Post by Eelke »

Basically, every character that makes up the HTML you create as a designer has to be represented in zeroes and ones (computers are digital, remember? ;)). Of course, for everyone to understand what you mean when you send them your ones and zeroes (you may not realise it, but that's what you have been doing whenever you send someone a computer file), there needs to be an agreement on which sequence of ones and zeroes represents which character. This agreement is called an encoding.

Probably the best-known encoding, and one of the oldest (I think people didn't even call it an encoding back then), is ASCII (American Standard Code for Information Interchange), and you don't get much more basic than that. ASCII just describes character-to-binary mappings for upper- and lowercase a-z, the digits 0-9, some control characters (newline, carriage return, etc.) and a handful of special characters (dots, commas, exclamation and question marks, tilde, etc.). You may have seen an ASCII table, where each character is mapped to a number. Each number is just a compact way of writing down a pattern of ones and zeroes. The numbers are usually written in decimal (e.g., A is represented by the value 65), but that's just because decimal numbers are shorter to write; converting them to binary numbers (ones and zeroes) is pretty trivial if you know how (and usually that's handled at a lower level than a programmer has to bother with, so the computer takes care of the conversion).
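To make the ASCII table idea concrete, here is a small sketch in Python (chosen only for illustration; phpBB itself is written in PHP). The helper name `ascii_codes` is made up for this example. For each character it returns the decimal value an ASCII table lists and the 7-bit pattern of ones and zeroes behind it.

```python
def ascii_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, decimal code, 7-bit binary pattern) per character.

    str.encode("ascii") raises UnicodeEncodeError for any character
    outside the 128 ASCII codes, such as "ë".
    """
    data = text.encode("ascii")  # exactly one byte per ASCII character
    return [(ch, code, format(code, "07b")) for ch, code in zip(text, data)]


for row in ascii_codes("A?"):
    print(row)
# ('A', 65, '1000001')
# ('?', 63, '0111111')
```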

In the years following ASCII, people needed more characters than ASCII described (even many Western languages that use the ASCII letters put accents on them, ë for example, which ASCII didn't provide), so various extensions were devised. Nowadays, a very common encoding on the web for Western-language sites is ISO-8859-1. However, we haven't even mentioned languages that use completely different characters, such as Japanese and Chinese; they had (and still have) their own encodings to represent their characters as sequences of ones and zeroes. phpBB used to use these older encodings because, especially when 2.0 was created, they were still pretty much the standard. Translators, and operators of sites targeting audiences whose languages aren't covered by ISO-8859-1 (the encoding phpBB 2.0 uses by default), had to fool around with changing the encoding.
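A minimal Python sketch of why a single-byte encoding only works when sender and reader agree on it: the byte ISO-8859-1 uses for ë stands for a different letter under the Cyrillic (ISO-8859-5) and Greek (ISO-8859-7) encodings, and characters outside an encoding's repertoire can't be stored at all. The helper `can_encode` is hypothetical; the codec names are Python's.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character of `text` exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# In ISO-8859-1, "ë" is the single byte 0xEB.
raw = "ë".encode("iso-8859-1")
print(raw)  # b'\xeb'

# The same byte means something else under other single-byte encodings.
for enc in ("iso-8859-1", "iso-8859-5", "iso-8859-7"):
    print(enc, raw.decode(enc))
# iso-8859-1 ë
# iso-8859-5 ы
# iso-8859-7 λ

print(can_encode("ë", "ascii"))         # False: ASCII has no accented letters
print(can_encode("日本", "iso-8859-1"))  # False: no Japanese in ISO-8859-1
print(can_encode("日本", "shift_jis"))   # True: a Japanese-specific encoding
```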

UTF-8 is basically also "just" an encoding, but it is very flexible, as that Wikipedia article will tell you: it can represent every character defined by Unicode, so you won't need any other encoding to put characters of any language on your site. This also means that no one will ever have to fool around with the encoding their site uses, because UTF-8 can handle any language you can throw at it. The web has been shifting more and more towards UTF-8, and for internationally oriented sites it is quickly becoming a necessity.
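To illustrate how one encoding can cover every language, here is a hand-written UTF-8 encoder in Python, checked against Python's built-in codec. It's a sketch for illustration only (the function name is hypothetical, and it skips one validity check, as noted in the docstring). ASCII characters keep their single byte; everything else takes two, three, or four bytes, with marker bits telling the reader how long each sequence is.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit layout.

    1 byte : 0xxxxxxx                             (U+0000..U+007F, same as ASCII)
    2 bytes: 110xxxxx 10xxxxxx                    (U+0080..U+07FF)
    3 bytes: 1110xxxx 10xxxxxx 10xxxxxx           (U+0800..U+FFFF)
    4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  (U+10000..U+10FFFF)
    Simplification: surrogate code points (U+D800..U+DFFF) are not rejected.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aëش日😀":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's own codec
    print(ch, f"U+{ord(ch):04X}", len(encoded), encoded.hex(" "))
# A U+0041 1 41
# ë U+00EB 2 c3 ab
# ش U+0634 2 d8 b4
# 日 U+65E5 3 e6 97 a5
# 😀 U+1F600 4 f0 9f 98 80
```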

wintermute
Registered User
Posts: 53
Joined: Fri Sep 03, 2004 11:58 pm
Location: Istanbul

Re: What is it exactly?

Post by wintermute »

Eelke, many thanks for the nice explanation.
Greetings to everyone...

muhaidib
Registered User
Posts: 57
Joined: Tue Jun 28, 2005 7:33 am

Re: What is it exactly?

Post by muhaidib »

Ohhh, I see, thanks for the information :)

I just went and checked in my browser, and the encoding is set to UTF-8.
Attachment: Picture 1.png (137.91 KiB)

muhaidib
Registered User
Posts: 57
Joined: Tue Jun 28, 2005 7:33 am

Re: What is it exactly?

Post by muhaidib »

Does vBulletin support UTF-8?

I ask because I was checking the encodings of different sites. Most sites use UTF-8, but when I checked an Arabic forum powered by vBulletin, the encoding was automatically set to "Arabic (Windows-1256)". When I switched it to UTF-8, everything turned into question marks (except the English letters, which showed up normally). Cool, phpBB is better LOL
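What muhaidib saw can be reproduced with a small Python sketch (the helper `misread` and the sample text are my own, chosen for illustration). The Arabic letters are stored as Windows-1256 bytes; read back as UTF-8, those bytes form invalid sequences and get replaced by U+FFFD, which browsers typically draw as a question mark. The English letters survive because their bytes are plain ASCII and mean the same thing in both encodings.

```python
def misread(text: str, real_encoding: str, assumed_encoding: str) -> str:
    """Encode text with one encoding, then decode it as if it were another.

    errors="replace" swaps byte sequences that are invalid in the assumed
    encoding for U+FFFD, the Unicode replacement character.
    """
    return text.encode(real_encoding).decode(assumed_encoding, errors="replace")


page = "phpBB مرحبا"  # "مرحبا" is Arabic for "hello"

print(misread(page, "cp1256", "cp1256"))  # correct guess: text comes back intact
print(misread(page, "cp1256", "utf-8"))   # wrong guess: "phpBB " then U+FFFD chars
```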

Eelke
Registered User
Posts: 606
Joined: Thu Dec 20, 2001 8:00 am
Location: Bussum, NL

Re: What is it exactly?

Post by Eelke »

I couldn't find any "conclusive evidence" ;), but I did find some indications that vBulletin does support UTF-8. Maybe the site you found is using an old version of vBulletin, or vBulletin lets you use any encoding you wish so that older translation files stay compatible.

Anyway, phpBB is better, so it doesn't change a thing ;)

DanoruX
Registered User
Posts: 156
Joined: Fri Mar 18, 2005 11:47 pm

Re: What is it exactly?

Post by DanoruX »

Can someone explain what is done differently in PHP code when programming something to handle UTF-8?

Spiros-
Registered User
Posts: 29
Joined: Sun Sep 17, 2006 9:52 am

Re: What is it exactly?

Post by Spiros- »

DanoruX wrote: Can someone explain what is done differently in PHP code when programming something to handle UTF-8?


Sending a different header to the browser and using a different encoding in the database: those are the differences I am aware of.
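As a sketch of the header part of that answer, here is a minimal Python WSGI application standing in for a PHP page (WSGI is Python's web-server interface, used here only for illustration; the sample text is made up, and the database side isn't covered). It encodes the page as UTF-8 and declares `charset=UTF-8` in the `Content-Type` header. Note that `Content-Length` counts bytes, which for non-ASCII text is more than the number of characters.

```python
def app(environ, start_response):
    """Minimal WSGI app that serves UTF-8 text and says so in its header."""
    html = "<p>Grüße, مرحبا, 日本語</p>"
    body = html.encode("utf-8")  # the bytes sent must really be UTF-8...
    start_response("200 OK", [
        # ...and the header tells the browser how to decode them.
        ("Content-Type", "text/html; charset=UTF-8"),
        # Content-Length counts bytes, not characters.
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly with a stub start_response instead of a real server.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app({}, start_response))
print(captured["headers"]["Content-Type"])  # text/html; charset=UTF-8
print(len(body.decode("utf-8")), "characters,", len(body), "bytes")  # 24 characters, 37 bytes
```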

DavidMJ
Registered User
Posts: 932
Joined: Thu Jun 16, 2005 1:14 am
Location: Great Neck, NY

Re: What is it exactly?

Post by DavidMJ »

Spiros- wrote:
DanoruX wrote: Can someone explain what is done differently in PHP code when programming something to handle UTF-8?

Sending a different header to the browser and using a different encoding in the database: those are the differences I am aware of.

It goes quite far beyond that. We must reimplement all the string functions, because UTF-8 is a multibyte encoding (a single character can take up to four bytes) and PHP's string functions only work on individual bytes. There are also issues with handling upper and lower case: we must write functions that do this for us, because PHP's functions don't handle all the characters (hardly any at all). We also have to do Unicode normalization and case folding to ensure that user input is consistent. All in PHP. As far as UTF-8 implementations go, phpBB is far ahead of any competitor. I doubt anybody else has a normalizer built into their product ;)
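A rough Python illustration of the byte-versus-character problem (Python `bytes` stand in for PHP's byte-oriented strings; this is not phpBB's code, and `utf8_truncate` is a hypothetical helper that assumes its input is valid UTF-8 and `max_bytes >= 0`). Cutting encoded text at an arbitrary byte can split a multibyte character, so a UTF-8-aware substring has to back up to a character boundary.

```python
def utf8_truncate(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 bytes to at most max_bytes without splitting a character.

    Continuation bytes look like 10xxxxxx (0x80..0xBF). If the first byte
    to be dropped is one of them, the cut falls inside a character, so step
    back to that character's lead byte and drop the whole character.
    """
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


text = "Grüße"
data = text.encode("utf-8")
print(len(text), len(data))  # 5 7  (characters vs. bytes)

naive = data[:3]  # what a byte-oriented substring does
print(naive.decode("utf-8", errors="replace"))  # 'Gr' plus a replacement char
print(utf8_truncate(data, 3).decode("utf-8"))   # Gr
print(utf8_truncate(data, 4).decode("utf-8"))   # Grü
```

Scanning backwards works because UTF-8 marks every continuation byte, so a character boundary can be found from any position without decoding the string from the start.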

We provide all these features in pure PHP; we require no extensions.
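For the case handling and normalization points, here is a Python sketch (not phpBB's implementation; the helper names are hypothetical). `byte_lower` mimics a byte-oriented lowercase function that only knows A-Z. `same_username` compares names using the Unicode "canonical caseless match" (NFD, then case folding, then NFD again), so "Straße" matches "STRASSE" and a precomposed é matches e plus a combining accent. Which normalization form phpBB actually applies to user input is not something this sketch claims.

```python
import unicodedata


def byte_lower(text: str) -> str:
    """Lowercase like a byte-oriented function: only ASCII A-Z change."""
    # bytes.lower() leaves every byte outside A-Z alone, including UTF-8 sequences.
    return text.encode("utf-8").lower().decode("utf-8")


def caseless_key(text: str) -> str:
    """Canonical caseless key: NFD(casefold(NFD(text)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", text).casefold())


def same_username(a: str, b: str) -> bool:
    """True if two names differ only in case or in Unicode representation."""
    return caseless_key(a) == caseless_key(b)


print(byte_lower("ÉCOLE"))  # École  (É untouched)
print("ÉCOLE".lower())      # école  (Unicode-aware)

# Lowercasing alone is not enough for comparisons; case folding maps ß to ss.
print("Straße".lower() == "STRASSE".lower())  # False
print(same_username("Straße", "STRASSE"))     # True

# Precomposed é (U+00E9) versus e + combining acute accent (U+0301):
print("Jos\u00e9" == "Jose\u0301")               # False, though they look alike
print(same_username("Jos\u00e9", "JOSE\u0301"))  # True
```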
Freedom from fear
