PHP Code $UTF8_UPPER_TO_LOWER Questions

Discussion of general topics related to the new version and its place in the world. Don't discuss new features, report bugs, ask for support, et cetera. Don't use this to spam for other boards or attack those boards!
APTX
Registered User
Posts: 680
Joined: Thu Apr 24, 2003 12:07 pm

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by APTX »

To save disk space... not that this argument would work here. You usually want to do it the other way around.
Don't give me my freedom out of pity!

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by code reader »

DavidMJ wrote: The array_flip gives us nothing but another function call, the ten or so lines don't really take up that much space and it is very unlikely we will be changing either array. Why compute something constant when you can precompute it?
it seems i am being very argumentative today, and somehow half my arguments are with you, david, sorry about it.
when i was a young programmer, some 18-odd years ago, a colleague of mine told me once: a person with two watches never knows what time it is.
the point of this little story is this: generally, it is wrong to maintain in the code two separate tables that contain the same data.
if the tables will have to be changed in the future, it is better to have a single one. if the second one can be computed (rather cheaply) from the first, computing it is generally preferable.
it is also an aesthetic flaw to have the same data twice in the code.

the performance cost of compiling two arrays vs. one additional function call is completely insignificant, but i would still be curious to know which of the two costs more (if i had to guess, i would bet compiling is slightly more expensive than flipping. this may change with an optimizer)

true, it's a very minor point, but i agree with the op that it is better to avoid those additional 42 lines of code.
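For illustration, here is a minimal sketch of the single-table approach argued for above. The three entries are a hypothetical excerpt, not the real table, and the sketch assumes every mapping is one character to one character, which is exactly the assumption questioned later in this thread.

```php
<?php
// Hypothetical three-entry excerpt; the real table is much larger.
$UTF8_UPPER_TO_LOWER = array(
    'A' => 'a',
    'Ä' => 'ä',
    'É' => 'é',
);

// Derive the reverse table once instead of maintaining a second literal.
// This is only safe while every value is unique and maps back cleanly.
$UTF8_LOWER_TO_UPPER = array_flip($UTF8_UPPER_TO_LOWER);

// strtr() with an array replaces each key it finds with its value.
echo strtr('ÄA', $UTF8_UPPER_TO_LOWER), "\n"; // äa
echo strtr('é', $UTF8_LOWER_TO_UPPER), "\n";  // É
```

The trade-off is one extra function call per request against a second hand-maintained literal.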

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by naderman »

There is a very simple reason why this doesn't work. lower case ß maps to upper case SS, but upper case SS does not map to lower case ß. (if this conversion is in there at all, haven't checked)
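A small sketch of why flipping breaks once such a mapping is present. The two-entry table is hypothetical and only stands in for a table that does contain the ß conversion.

```php
<?php
// Hypothetical excerpt that includes the multi-character mapping.
$lower_to_upper = array(
    's' => 'S',
    'ß' => 'SS',
);

// Flipping yields array('S' => 's', 'SS' => 'ß').
$flipped = array_flip($lower_to_upper);

// strtr() tries the longest key first, so every "SS" turns into "ß",
// even in words that were never spelled with ß.
echo strtr('MASSE', $flipped), "\n"; // MAßE

// A hand-written upper-to-lower table can simply leave "SS" alone.
$upper_to_lower = array('S' => 's');
echo strtr('MASSE', $upper_to_lower), "\n"; // MAssE
```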

Acyd Burn
Posts: 1838
Joined: Tue Oct 08, 2002 5:18 pm
Location: Behind You

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by Acyd Burn »

Indeed. There are several reasons why it is how it is currently. 1) For performance reasons - the added size (one array) is negligible compared to the increased speed we get by just checking with isset() - the string functions need to be as performant as possible. 2) As Nils said, there are (and might be) characters that cannot be matched one-to-one in a key/value comparison; for example ß -> SS and SS -> ss. 3) Due to number (2) we do not mistakenly introduce wrong mappings *if* the arrays get changed at all.
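As a rough sketch of the isset() lookup mentioned in point 1 (not the actual phpBB implementation): the function name utf8_lower_sketch and the two-entry table are made up for this example, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical excerpt of the precomputed table.
$UTF8_UPPER_TO_LOWER = array('A' => 'a', 'Ä' => 'ä');

// Hypothetical helper: lowercase a string one UTF-8 character at a time.
function utf8_lower_sketch($string, array $table)
{
    $out = '';
    // The 'u' modifier makes the empty pattern split between characters,
    // not between bytes.
    foreach (preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // isset() on a ready-made array is a plain hash lookup; characters
        // without an entry pass through unchanged.
        $out .= isset($table[$char]) ? $table[$char] : $char;
    }
    return $out;
}

echo utf8_lower_sketch('ÄbA!', $UTF8_UPPER_TO_LOWER), "\n"; // äba!
```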


bad-dj
Posts: 173
Joined: Sat Aug 26, 2006 11:15 am
Location: Australia

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by bad-dj »

Can someone tell me what UTF8 means? I will not know unless someone tells me :roll:

agent00shoe

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by agent00shoe »

bad-dj wrote: Can someone tell me what UTF8 means? I will not know unless someone tells me :roll:

I think it was a Weird Al movie in the 80s.

jojobarjo32
Registered User
Posts: 164
Joined: Wed Jun 22, 2005 7:38 pm
Location: France

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by jojobarjo32 »

bad-dj wrote: Can someone tell me what UTF8 means? I will not know unless someone tells me :roll:

http://en.wikipedia.org/wiki/UTF-8 :roll:
You could search yourself... ;)

code reader
Registered User
Posts: 653
Joined: Wed Sep 21, 2005 3:01 pm

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by code reader »

naderman wrote: There is a very simple reason why this doesn't work. lower case ß maps to upper case SS, but upper case SS does not map to lower case ß. (if this conversion is in there at all, haven't checked)

you are absolutely right.
the funny thing though, is that array_flip(upper-to-lower) === lower-to-upper. (i checked).
i guess that is a mistake of some kind, and one or both of these arrays is not what it should be.
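For anyone who wants to repeat that kind of check, a hedged sketch with made-up two-entry tables. One thing worth knowing: on PHP arrays, === also compares element order, while == only compares the key/value pairs.

```php
<?php
// Hypothetical tables holding the same pairs in a different order.
$upper_to_lower = array('A' => 'a', 'B' => 'b');
$lower_to_upper = array('b' => 'B', 'a' => 'A');

$flipped = array_flip($upper_to_lower);

var_dump($flipped == $lower_to_upper);  // bool(true): same key/value pairs
var_dump($flipped === $lower_to_upper); // bool(false): order differs
```

So a strict match means the two literals agree on order as well as content.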

naderman
Consultant
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Berlin, Germany

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by naderman »

code reader wrote:
naderman wrote: There is a very simple reason why this doesn't work. lower case ß maps to upper case SS, but upper case SS does not map to lower case ß. (if this conversion is in there at all, haven't checked)

you are absolutely right.
the funny thing though, is that array_flip(upper-to-lower) === lower-to-upper. (i checked).
i guess that is a mistake of some kind, and one or both of these arrays is not what it should be.

No, it wasn't a mistake. It appears that the current array was a simple byte character mapping (one byte character to one byte character) which is supposed to be used for a simple UTF-8 strtolower/upper implementation. So the arrays are in fact equal. We will also implement Unicode case folding, which includes mappings such as the one I described above. Nonetheless, having both arrays is faster, so it will stay as it is.

VxJasonxV
Registered User
Posts: 341
Joined: Sun Mar 02, 2003 2:51 pm
Location: Castle Rock, CO

Re: PHP Code $UTF8_UPPER_TO_LOWER Questions

Post by VxJasonxV »

agent00shoe wrote:
bad-dj wrote: Can someone tell me what UTF8 means? I will not know unless someone tells me :roll:
I think it was a Weird Al movie in the 80s.
I'm fairly sure you weren't being serious, but that was UHF you're referencing.
"If You Support It, They Will Come."
"Construction"
