username_clean must throw any marks

General discussion of development ideas and the approaches taken in the 3.x branch of phpBB. The current feature release of phpBB 3 is 3.3/Proteus.
hubaishan
Registered User
Posts: 16
Joined: Fri Oct 21, 2011 11:29 am


Post by hubaishan »

I noticed that the username_clean field in the users table stores letter marks as-is. This is wrong, because it allows duplicate usernames that differ only in their marks.

In my language, Arabic, any letter can take one or two marks. So if the username محمد is registered, other users can register the same name just by adding marks to it. Arabic users may not notice that a mark was added, which allows impersonation.

For example, all of the following are pronounced the same as محمد and differ only by added marks:
مُحمد
محَمد
محمّد
محمدُ
مُحَمد
....
All of these usernames can be registered on the same forum as different users.


The same applies to Kashida (also called Tatweel, ـ). It is not pronounced and is not a real letter; it can be inserted between letters, as in مـحمد or محـمد. It has the same effect as the marks and must be cleaned too.

Also, the Hamza forms (ء, ئ, ؤ) must be replaced with a single form (ء), because they are all the same letter: an Arabic user may type the word شؤون as شئون, or رءوف as رؤوف.
Likewise, the Alef forms (ا, أ, إ, آ) must be replaced with a single form (ا).
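A minimal sketch of the normalization proposed above, written in Python for illustration rather than in the PHP phpBB uses. The helper name clean_arabic is hypothetical, and treating "marks" as the harakat U+064B to U+0652 (fathatan through sukun) is an assumption; a real change to utf8_clean_string() might need a wider set of marks.

```python
import unicodedata

# Hypothetical helper sketching the proposal; not part of phpBB.
# Assumption: "marks" means the Arabic harakat U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
_MARKS = {cp: None for cp in range(0x064B, 0x0653)}

_ARABIC_FOLD = str.maketrans({
    **_MARKS,
    0x0640: None,    # Kashida / Tatweel: drop it
    0x0623: 0x0627,  # alef with hamza above -> alef
    0x0625: 0x0627,  # alef with hamza below -> alef
    0x0622: 0x0627,  # alef with madda above -> alef
    0x0624: 0x0621,  # waw with hamza above -> hamza
    0x0626: 0x0621,  # yeh with hamza above -> hamza
})


def clean_arabic(name: str) -> str:
    """Strip marks and Tatweel, then fold Alef and Hamza variants."""
    # Compose first so that alef + combining hamza (U+0627 U+0654)
    # becomes U+0623 and is folded exactly like the precomposed form.
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(_ARABIC_FOLD)
```

With this sketch, every variant of محمد listed above (and مـحمد) folds to محمد, and شؤون and شئون fold to the same string, so they would collide in username_clean.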

This applies to Arabic; I do not know how marks work in other languages.

I suggest modifying the function utf8_clean_string() to handle these cases, or adding them to the confusables.php file, which is hard to modify and is not written according to the "phpBB Coding Guidelines".

DavidIQ
Customisations Team Leader
Posts: 1904
Joined: Thu Mar 02, 2006 4:29 pm
Location: Earth

Re: username_clean must throw any marks

Post by DavidIQ »

Isn't this the same as me saying that DavidIQ and DavidlQ (has an "L") are the same so therefore it shouldn't be allowed? Or DavidIQ and David.IQ are the same because the extra punctuation shouldn't make them different? If so then that doesn't sound like a correct assertion to me. And what is confusables.php?

hubaishan
Registered User
Posts: 16
Joined: Fri Oct 21, 2011 11:29 am

Re: username_clean must throw any marks

Post by hubaishan »

DavidIQ wrote: Sat Mar 19, 2016 6:45 pm the extra punctuation shouldn't make them different?
I agree that all punctuation must be cleaned.
DavidIQ wrote: Sat Mar 19, 2016 6:45 pm And what is confusables.php?
"includes/utf/data/confusables.php" (58,190 bytes on a single line!)
This file returns an array of characters that are replaced with other characters by the function utf8_clean_string() (in includes/utf/utf_tools.php), which is used to generate the username_clean field in the users table.
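For readers unfamiliar with the mechanism, here is a rough Python approximation of the kind of pipeline described: case folding with compatibility normalization, a confusables lookup, removal of control characters, and space collapsing. The step order, the helper name clean_string and the three-entry sample table are assumptions for illustration, not a port of phpBB's code; Python's casefold() plus NFKC also only approximates phpBB's own case-folding routine.

```python
import re
import unicodedata
from typing import Mapping

# Illustrative sample entries only (Cyrillic lookalikes -> Latin);
# the real table lives in includes/utf/data/confusables.php.
SAMPLE_HOMOGRAPHS = {"\u0430": "a", "\u0435": "e", "\u043e": "o"}


def clean_string(text: str, homographs: Mapping[str, str]) -> str:
    """Rough approximation of a utf8_clean_string()-style pipeline."""
    # 1. Case folding plus compatibility normalization (approximation).
    text = unicodedata.normalize("NFKC", text.casefold())
    # 2. Replace confusable characters: longest keys first and no
    #    re-scanning of replaced text, like PHP's strtr() with an array.
    if homographs:
        keys = sorted(homographs, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        text = pattern.sub(lambda m: homographs[m.group(0)], text)
    # 3. Drop C0/C1 control characters.
    text = re.sub(r"[\x00-\x1f\x7f-\x9f]+", "", text)
    # 4. Collapse runs of spaces and trim.
    text = re.sub(r" {2,}", " ", text)
    return text.strip()
```

Nothing in this pipeline removes U+064F (damma), so مُحمد and محمد still produce different cleaned values, which is the behaviour reported at the start of the thread.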

JoshyPHP
Registered User
Posts: 381
Joined: Fri Jul 08, 2011 9:43 pm

Re: username_clean must throw any marks

Post by JoshyPHP »

confusables.php is generated automatically by develop/generate_utf_confusables.php using http://unicode.org/reports/tr39/data/confusables.txt and http://unicode.org/Public/UNIDATA/CaseFolding.txt

None of those files should be edited by hand.
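As an illustration of what such a generator has to read, here is a minimal Python parser for the semicolon-separated lines of confusables.txt (source code point; target code points; type; then a # comment). It is a hypothetical sketch, not develop/generate_utf_confusables.php, and it ignores CaseFolding.txt, which the real script also uses.

```python
from typing import Dict, Iterable


def parse_confusables(lines: Iterable[str]) -> Dict[str, str]:
    """Map each source character to its prototype string.

    Expected line shape, per UTS #39 confusables.txt:
        <source hex> ; <target hex sequence> ; <type> # comment
    """
    table: Dict[str, str] = {}
    for raw in lines:
        # Drop a possible leading BOM and everything after '#'.
        line = raw.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        source = chr(int(fields[0], 16))
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table
```

Feeding it an open file handle (opened with encoding="utf-8") yields a dictionary of the same shape as the array confusables.php returns, which is one reason regenerating the file beats hand-editing it.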
