I noticed that the username_clean field in the users table stores letter marks as-is. This is wrong, because it allows duplicate usernames that differ only in their marks.
In my language, Arabic, any letter can carry one or two marks. So if the username (محمد) is registered, another user can register the same name by adding marks to (محمد). An Arabic reader may not notice that a mark has been added, which allows impersonation.
For example, all of these words are pronounced the same as (محمد) and differ only by added marks:
مُحمد
محَمد
محمّد
محمدُ
مُحَمد
....
All of these usernames can be registered on the same forum as different users.
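As a minimal sketch (in Python rather than phpBB's PHP, and not phpBB code), stripping the Arabic harakat before comparing names would make the variants above collide. The helper name `strip_arabic_marks` is hypothetical, and the range U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun) is an assumption that covers the marks in these examples; other marks such as U+0670 might also need handling.

```python
import re

# Assumed range of Arabic harakat: U+064B (fathatan) .. U+0652 (sukun),
# which includes fatha U+064E, damma U+064F and shadda U+0651.
ARABIC_MARKS = re.compile(r"[\u064B-\u0652]")


def strip_arabic_marks(name: str) -> str:
    """Hypothetical helper: drop Arabic diacritic marks so marked variants compare equal."""
    return ARABIC_MARKS.sub("", name)


for variant in ["مُحمد", "محَمد", "محمّد", "محمدُ", "مُحَمد"]:
    print(strip_arabic_marks(variant))
```

Applied to each of the listed variants, the helper should return the bare (محمد).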
The same applies to Kashida (also called Tatweel) (ـ). It is an unpronounced character, not a real letter, and it can be inserted between letters, as in (مـحمد) or (محـمد). It has the same effect as marks and must be cleaned too.
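Kashida is a single code point, U+0640 ARABIC TATWEEL, so removing it is a plain deletion. A minimal sketch; `strip_tatweel` is a hypothetical helper, not part of phpBB:

```python
# U+0640 ARABIC TATWEEL (kashida): only stretches the joining line between letters.
TATWEEL = "\u0640"


def strip_tatweel(name: str) -> str:
    """Hypothetical helper: remove every kashida from a username."""
    return name.replace(TATWEEL, "")


for variant in ["مـحمد", "محـمد"]:
    print(strip_tatweel(variant))
```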
Also, the Hamza forms (ء, ئ, ؤ) must be replaced with the single form (ء), because they are all the same letter: an Arabic user may type the word (شؤون) as (شئون), and (رءوف) may be typed as (رؤوف).
Likewise, the Alef forms (ا أ إ آ) must be replaced with the single form (ا).
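The Hamza and Alef proposals amount to a one-to-one character table. A minimal sketch with Python's `str.translate`, assuming precomposed input (ئ U+0626, ؤ U+0624, أ U+0623, إ U+0625, آ U+0622); the table and `fold_arabic_letters` are hypothetical and simply encode the mapping proposed in this post:

```python
# Hypothetical folding table following the proposal above:
#   Hamza forms ئ (U+0626) and ؤ (U+0624) -> ء (U+0621)
#   Alef forms  أ (U+0623), إ (U+0625), آ (U+0622) -> ا (U+0627)
ARABIC_FOLD = str.maketrans({
    "\u0626": "\u0621",
    "\u0624": "\u0621",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
})


def fold_arabic_letters(name: str) -> str:
    """Hypothetical helper: map Hamza and Alef variants to one form each."""
    return name.translate(ARABIC_FOLD)


print(fold_arabic_letters("شؤون") == fold_arabic_letters("شئون"))  # True
print(fold_arabic_letters("رءوف") == fold_arabic_letters("رؤوف"))  # True
```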
This applies to Arabic; I do not know how marks behave in other languages.
I suggest modifying the function utf8_clean_string() to handle these cases, or perhaps adding them to the confusables.php file, which is hard to modify and is not written according to the "phpBB Coding Guidelines".
username_clean must throw any marks
Forum rules
Please do not post support questions regarding installing, updating, or upgrading phpBB 3.3.x. If you need support for phpBB 3.3.x please visit the 3.3.x Support Forum on phpbb.com.
If you have questions regarding writing extensions please post in Extension Writers Discussion to receive proper guidance from our staff and community.
- DavidIQ
- Customisations Team Leader
- Posts: 1904
- Joined: Thu Mar 02, 2006 4:29 pm
- Location: Earth
Re: username_clean must throw any marks
Isn't this the same as me saying that DavidIQ and DavidlQ (with a lowercase "l") are the same and therefore the second shouldn't be allowed? Or that DavidIQ and David.IQ are the same because the extra punctuation shouldn't make them different? If so, that doesn't sound like a correct assertion to me. And what is confusables.php?
Re: username_clean must throw any marks
I agree that all punctuation must be cleaned.
It is "includes/utf/data/confusables.php" (58,190 bytes on a single line!).
This file returns an array of letters that are replaced with other letters by the function utf8_clean_string() (in includes/utf/utf_tools.php), which is used to generate the username_clean field in the users table.
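How such a replacement array behaves can be sketched as follows, assuming utf8_clean_string() applies it the way PHP's `strtr()` applies an array: one left-to-right pass in which the longest matching key wins and replaced text is not scanned again. This Python `apply_replacements` helper and the two Arabic entries in the usage line are hypothetical, not the contents of confusables.php:

```python
from typing import Dict


def apply_replacements(text: str, table: Dict[str, str]) -> str:
    """Replace keys of `table` in one left-to-right pass, longest key first.

    Replaced output is never rescanned (assumed to mirror PHP strtr() with an array).
    """
    max_len = max((len(key) for key in table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No key matches here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical entries, NOT taken from confusables.php:
# Alef with Hamza above -> Alef, and Tatweel -> nothing.
table = {"\u0623": "\u0627", "\u0640": ""}
print(apply_replacements("\u0623\u0640\u062d\u0645\u062f", table) == "\u0627\u062d\u0645\u062f")  # True
```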
Re: username_clean must throw any marks
confusables.php is generated automatically by develop/generate_utf_confusables.php using http://unicode.org/reports/tr39/data/confusables.txt and http://unicode.org/Public/UNIDATA/CaseFolding.txt
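The Unicode confusables.txt data is line-oriented: a source code point, a target code point sequence and a type, separated by semicolons, with `#` starting a comment. A minimal Python sketch of turning such lines into a replacement map, under that assumption; `parse_confusables` is hypothetical, the sample lines are written in that layout for illustration, and the real develop/generate_utf_confusables.php may additionally filter entries and apply CaseFolding.txt.

```python
from typing import Dict, Iterable


def hex_to_str(field: str) -> str:
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusables(lines: Iterable[str]) -> Dict[str, str]:
    """Build {source: target} from lines shaped like 'SRC ; TARGET ; TYPE # comment'."""
    mapping: Dict[str, str] = {}
    for raw in lines:
        # Drop the trailing comment, surrounding whitespace and a leading byte-order mark.
        line = raw.split("#", 1)[0].strip().lstrip("\ufeff")
        if not line:
            continue  # blank or comment-only line
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue  # malformed line, skip it
        mapping[hex_to_str(fields[0])] = hex_to_str(fields[1])
    return mapping


# Sample lines in the confusables.txt layout (illustrative, not copied from the file).
sample = [
    "\ufeff# confusables.txt",
    "",
    "0049 ;\t006C ;\tMA\t# ( I → l ) LATIN CAPITAL LETTER I → LATIN SMALL LETTER L",
    "0031 ;\t006C ;\tMA\t# ( 1 → l ) DIGIT ONE → LATIN SMALL LETTER L",
]
print(parse_confusables(sample))  # {'I': 'l', '1': 'l'}
```

Under this sample map, "DavidIQ" and "DavidlQ" would reduce to the same string once the map is applied, which is the kind of collision such a table is meant to catch.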
None of those files should be edited by hand.