[RFC] Human Readable URLs

Ger
Registered User
Posts: 270
Joined: Mon Jul 26, 2010 1:55 pm
Location: 192.168.1.100

Re: [RFC] Human Readable URLs

Post by Ger » Mon Aug 01, 2011 1:20 pm

Isn't this all coming down to adding information to the URL? As bantu points out, the forum/topic/post ID will still be required in the URL. Adding some titles to it is always possible, but how that's done is purely cosmetic. There's little difference between

Code: Select all

http://area51.phpbb.com/phpBB/viewtopic.php?title=rfc-human-readable-urls&f=108&t=40965
http://area51.phpbb.com/phpBB/3-2-arsia--rfcs---patches/rfc-human-readable-urls/f108/t40965
http://area51.phpbb.com/phpBB/108_3-2-arsia--rfcs---patches/40965_rfc-human-readable-urls
etc.
Adding this information surely gives the human reader more information about what can be expected behind the link, so I guess it would make the link more readable in a way. However, even with the shortest option, the link would be presented as something like this on many external sites (phpBB-powered or otherwise, since many systems use shorteners):
http://www.example.com/phpBB/108_3-2-ar ... dable-urls

Therefore, only internal links would really benefit from this, I guess.
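
To illustrate the point that the three forms listed above differ only cosmetically, here is a minimal Python sketch (phpBB itself is PHP; the function name `extract_ids` is hypothetical and not part of any phpBB API). It pulls the same forum and topic IDs out of each of the three URL styles and never looks at the title text.

```python
import re
from urllib.parse import urlparse, parse_qs

def extract_ids(url):
    """Return (forum_id, topic_id) from any of the three URL styles quoted above."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    if "f" in query and "t" in query:
        # Style 1: viewtopic.php?title=...&f=108&t=40965
        return int(query["f"][0]), int(query["t"][0])

    segments = [s for s in parts.path.split("/") if s]

    # Style 2: .../forum-title/topic-title/f108/t40965
    found = {}
    for seg in segments:
        m = re.fullmatch(r"([ft])(\d+)", seg)
        if m:
            found[m.group(1)] = int(m.group(2))
    if "f" in found and "t" in found:
        return found["f"], found["t"]

    # Style 3: .../108_forum-title/40965_topic-title
    ids = []
    for seg in segments:
        m = re.match(r"(\d+)_", seg)
        if m:
            ids.append(int(m.group(1)))
    if len(ids) >= 2:
        return ids[-2], ids[-1]

    raise ValueError("no forum/topic IDs found in " + url)
```

In all three cases the title portion could be changed or dropped without affecting which topic is loaded.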
Above message may contain errors in grammar, spelling or wrongly chosen words. This is because I'm not a native speaker. My apologies in advance.

naderman
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany

Re: [RFC] Human Readable URLs

Post by naderman » Fri Aug 05, 2011 12:52 am

bantu wrote:
naderman wrote:Instead you can keep a lookup table which contains multiple entries per topic if a topic has been renamed.
While I do not like the inclusion of t1234 and f123 and friends in the URL, I think that this approach does not work either. It works for CMSes where you can return an error when an URL is already taken and where only a small number of people manage the actual content behind the URLs. But in phpBB different topics can be generated by different people and topics can generally have the same titles, so without the IDs you have nothing that makes the URL unique enough.
You can easily just add a number to the title. So if you come across the same title, it defaults to whatever-the-title-is-2.
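
A minimal sketch of that counter idea, in Python for brevity. The function name `unique_slug` and the in-memory set `taken` are assumptions made for illustration; a real board would check a database table instead.

```python
def unique_slug(base, taken):
    """Return `base` if it is free, otherwise base-2, base-3, and so on.

    `taken` is the set of slugs already in use; the chosen slug is added to it.
    """
    candidate = base
    counter = 2
    while candidate in taken:
        candidate = f"{base}-{counter}"
        counter += 1
    taken.add(candidate)
    return candidate


taken = set()
print(unique_slug("whatever-the-title-is", taken))  # first topic keeps the plain slug
print(unique_slug("whatever-the-title-is", taken))  # second topic gets the -2 suffix
```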

ecwpa
Registered User
Posts: 181
Joined: Mon Jan 24, 2005 2:10 am

Re: [RFC] Human Readable URLs

Post by ecwpa » Sun Aug 07, 2011 2:03 am

naderman wrote:You can easily just add a number to the title. So if you come across the same title, it defaults to whatever-the-title-is-2.
That's too much extra work for no reason whatsoever.
Slightly better English than it was in 2005, still improving :D

AmigoJack
Registered User
Posts: 92
Joined: Wed May 04, 2011 7:47 pm
Location: グリーン ヒル ゾーン

Re: [RFC] Pretty URLs

Post by AmigoJack » Sun Aug 07, 2011 11:27 pm

Sam wrote:Cleaning the topic title would basically strip all odd characters, punctuation, replace whitespace with a single hyphen ( - ), and possibly run a UTF8 strtolower() on the slug as well.
While lowercasing Latin letters is a trivial task, it is a dozen times harder across the whole Unicode range (which will still grow in the future). I wouldn't lowercase it at all: what would be the benefit anyway if it's only used for readability (it's not used as an index anyway)?
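
A small illustration of why lowercasing is less trivial outside ASCII, using Python's built-in Unicode string methods (Python is chosen here only for brevity; the helper `lowercase_report` is a hypothetical name). It is a sketch of two well-known special cases, not a survey of the whole Unicode range.

```python
def lowercase_report(text):
    """Return the lowercased text plus code-point counts before and after."""
    lowered = text.lower()
    return lowered, len(text), len(lowered)

# Plain Latin: one code point in, one code point out.
print(lowercase_report("RFC"))

# Turkish dotted capital I (U+0130) lowercases to "i" plus a combining dot
# (U+0307), so the string grows by one code point.
print(lowercase_report("\u0130"))

# German sharp s (U+00DF): lower() leaves it alone, casefold() expands it to "ss".
print("\u00df".lower() == "\u00df", "\u00df".casefold())
```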


Define:
  1. odd characters (do you mean those with tremas, accents, macrons, ogoneks, rings... in short: diacritics? Stripping those mostly voids the meaning of the word entirely)
  2. punctuation (do you mean those occurring in the ASCII range or all, including mathematical symbols? Beware of topic titles which carry version or product model numbers, which can easily have punctuation)
Also, you haven't said anything about characters which have their own meaning in a URL, like : / # % ? (unless those are the ones you named as odd or punctuation).
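
Both concerns can be shown with a short Python sketch built on the standard library (`unicodedata` and `urllib.parse.quote`). The helper names `strip_diacritics` and `encode_segment` are hypothetical; the first shows how stripping diacritics can change a word, the second shows that characters with their own meaning in a URL have to be percent-encoded if they are to stay in a path segment.

```python
import unicodedata
from urllib.parse import quote

def strip_diacritics(text):
    """Decompose characters and drop the combining marks (accents, tremas, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def encode_segment(text):
    """Percent-encode everything that is not an unreserved URL character."""
    return quote(text, safe="")

# German "schön" (beautiful) turns into "schon" (already): a different word.
print(strip_diacritics("schön"))

# Reserved characters survive only in encoded form.
print(encode_segment("a/b?c#d"))
print(encode_segment("100% done: v1.2"))
```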


URLs should not carry human-readable information at all; that's why link titles exist. Even the W3C can't say it more clearly: cool URIs don't change. Also, nowadays most people don't know how to link properly (to a forum, to a topic or to a post, not to mention anchors). Links with text in them instead of distinctive IDs would lead to even more confusion IMO.

Dragosvr92
Registered User
Posts: 624
Joined: Tue May 31, 2011 12:08 pm
Location: Romania

Re: [RFC] Human Readable URLs

Post by Dragosvr92 » Mon Aug 08, 2011 5:10 am

I was wondering when phpBB will make some pretty URLs available.

I think I prefer:

Code: Select all

/f12-forum-name/                    Forum ID 12
/f12-forum-name/t4-my-topic/        Topic id 4

Or.......

Code: Select all

/f12-forum-name/                    Forum ID 12
/f12-t4-p5-forum-name/my-topic/        Topic id 4
Instead of:

Code: Select all

/forum-name-f12/                    Forum ID 12
/forum-name-f12/my-topic-t4/        Topic id 4
Previous user: TheKiller
Avatar on Memberlist 1.0.3

naderman
Product Manager
Posts: 1727
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany

Re: [RFC] Human Readable URLs

Post by naderman » Mon Aug 08, 2011 5:26 pm

ecwpa wrote:
naderman wrote:You can easily just add a number to the title. So if you come across the same title, it defaults to whatever-the-title-is-2.
That's too much extra work for no reason whatsoever.
This would obviously be automated. So there is no extra work at all.

ecwpa
Registered User
Posts: 181
Joined: Mon Jan 24, 2005 2:10 am

Re: [RFC] Human Readable URLs

Post by ecwpa » Tue Aug 16, 2011 5:51 pm

TheKiller wrote: Instead of:

Code: Select all

/forum-name-f12/                    Forum ID 12
/forum-name-f12/my-topic-t4/        Topic id 4
He did it that way because most search engines only index the first 25 characters of the URL, so it's important to give priority to the titles.

Dragosvr92
Registered User
Posts: 624
Joined: Tue May 31, 2011 12:08 pm
Location: Romania

Re: [RFC] Human Readable URLs

Post by Dragosvr92 » Tue Aug 16, 2011 6:09 pm

Umm... all right then. But then I think they should look like this:

Code: Select all

/forum-name-f12/                    Forum ID 12
/forum-name/my-topic-f12-t4-p5/        Topic id 4
....... Whatever. I think that the IDs should be very close together, like there.
I'd like them to come first, though.

ecwpa
Registered User
Posts: 181
Joined: Mon Jan 24, 2005 2:10 am

Re: [RFC] Human Readable URLs

Post by ecwpa » Tue Aug 16, 2011 9:58 pm

I agree, although it would be inconsistent.

My take:

Code: Select all

/F12/forum_name/                    Forum ID 12
/F12/T4/P5/forum_name/my_topic/        Topic id 4
I do care about SEO, but I also enjoy nice URLs; that's why I would like to use underscores, which are easier for the user to read. I know people use underscores in function names, but I don't manage any programming forum. All IDs to the left; the rest is optional.
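
A rough Python sketch of what "all IDs to the left, the rest is optional" could mean in practice. The names `build_path`, `parse_path` and the `ROUTE` pattern are hypothetical, and the sketch assumes a forum name never consists of exactly `T` or `P` followed by digits.

```python
import re

# IDs come first; any trailing text segments are ignored when routing.
ROUTE = re.compile(r"^/F(?P<f>\d+)(?:/T(?P<t>\d+))?(?:/P(?P<p>\d+))?(?:/.*)?$")

def build_path(forum_id, forum_name, topic_id=None, post_id=None, topic_title=None):
    """Build a path in the /F12/T4/P5/forum_name/my_topic/ style shown above."""
    parts = [f"F{forum_id}"]
    if topic_id is not None:
        parts.append(f"T{topic_id}")
    if post_id is not None:
        parts.append(f"P{post_id}")
    parts.append(forum_name)
    if topic_title is not None:
        parts.append(topic_title)
    return "/" + "/".join(parts) + "/"

def parse_path(path):
    """Return the IDs found at the left of the path, ignoring the readable part."""
    match = ROUTE.match(path)
    if match is None:
        raise ValueError("not a forum path: " + path)
    return {key: int(value) for key, value in match.groupdict().items() if value is not None}
```

With this layout, `/F12/T4/P5/forum_name/my_topic/` and a bare `/F12/T4/P5` resolve to the same forum, topic and post.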

sooskriszta
Registered User
Posts: 85
Joined: Wed Dec 29, 2010 7:23 pm

I am changing the post title

Post by sooskriszta » Fri Aug 26, 2011 4:00 pm

I recommend this schema

Code: Select all

home - domain.com/forums/
board - domain.com/forums/forum-title
topic - domain.com/forums/topic/topic-title
topic - domain.com/forums/topic/topic-title/page
post - domain.com/forums/post/post-title
SERP - domain.com/forums/search/keywords
Examples

Code: Select all

phpbb.com/community/
phpbb.com/community/3-2-arsia-rfcs-patches
phpbb.com/community/topic/rfc-human-readable-urls
phpbb.com/community/topic/rfc-human-readable-urls/2
phpbb.com/community/post/i-am-changing-the-post-title
phpbb.com/community/search/seo-urls
While Sam's original suggestion is very good, the above model has three advantages:
  • URLs don't break/change when a post is moved from one forum/board to another.
  • URLs are shorter, and thereby (presumably) more user-friendly.
  • These become, in effect, canonical URLs.
I can almost hear a suggestion that the ID be included in the URL, but I believe that would reduce some of the value of the human readable URL...in that while the URL will be intelligible, it would not be much more replicable than current ugly URLs. That being said, while I prefer keeping IDs out of the URL, if I were told that the only way to get SEO URLs was to have the ID included in them, I would take it.

Don't blast me for taking a cue from WordPress (http://codex.wordpress.org/Using_Permalinks), but what I suggest is that when it encounters duplicate URLs, phpBB add a counter to the end of the title-based URL, e.g.

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls-2
This has something of a similar negative effect as including the ID in the URL, without any of the performance positives. So why do I prefer this over IDs?
  • In using IDs, ALL URLs will have IDs. In using counters for duplicates, fewer URLs will have these numbers
  • There is an unexpected benefit of counters: duplicate content becomes somewhat easier to spot (if the same words are used in the title). It doesn't help with intentional abuse, but if as a user I am creating a topic [RFC] Human Readable URLs and I see that the URL created is phpbb.com/community/topic/rfc-human-readable-urls-2, then I am more likely to check out phpbb.com/community/topic/rfc-human-readable-urls, even if I am prone to not searching before posting. It is even truer of people reading the post: if I were to come across a post at phpbb.com/community/topic/rfc-human-readable-urls-2, then I would likely check out phpbb.com/community/topic/rfc-human-readable-urls to see if the discussions are different, and so duplicate topics would, many times, be organically discarded.
Admittedly, this applies to topics, not posts and there would be a HUGE number of posts with similar URLs....but I think that is not a problem because the URLs that we are most concerned with are usually topic URLs; post URLs are RELATIVELY rarely shared/distributed/clicked directly.

When a topic is split, one part should continue to have the same URL, while each other part should get the additional incremental counter.
Splitting

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
should create

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
phpbb.com/community/topic/rfc-human-readable-urls-2
phpbb.com/community/topic/rfc-human-readable-urls-3
etc

When 2 topics are merged, the admin should get an option to choose which URL shall be used in linking to the new topic, but both URLs should point to the new topic, i.e.
If admin merges

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
and

Code: Select all

phpbb.com/community/topic/rfc-seo-urls
and says the url for merged topic should be

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
then all parts of the site linking to the topic shall use the url

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
but if someone types (or clicks in an old email)

Code: Select all

phpbb.com/community/topic/rfc-seo-urls
then that should go to the topic as well
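
The merge behaviour described above can be sketched as a small lookup table, here in Python. The class `SlugTable` and its methods are hypothetical, and the sketch assumes the merged topic keeps the ID of the topic whose URL the admin chose.

```python
class SlugTable:
    """Maps URL slugs to topic IDs and remembers one canonical slug per topic."""

    def __init__(self):
        self._topic_for_slug = {}  # slug -> topic ID
        self._canonical = {}       # topic ID -> slug used when generating links

    def add(self, slug, topic_id):
        self._topic_for_slug[slug] = topic_id
        self._canonical.setdefault(topic_id, slug)

    def merge(self, keep_slug, other_slug):
        """Merge the topic behind other_slug into the one behind keep_slug."""
        target = self._topic_for_slug[keep_slug]
        source = self._topic_for_slug[other_slug]
        # Every slug that pointed at the old topic now points at the merged one.
        for slug, topic_id in list(self._topic_for_slug.items()):
            if topic_id == source:
                self._topic_for_slug[slug] = target
        self._canonical.pop(source, None)
        self._canonical[target] = keep_slug

    def resolve(self, slug):
        """Return (topic ID, canonical slug) for any slug that ever pointed at it."""
        topic_id = self._topic_for_slug[slug]
        return topic_id, self._canonical[topic_id]


table = SlugTable()
table.add("rfc-human-readable-urls", 1)
table.add("rfc-seo-urls", 2)
table.merge("rfc-human-readable-urls", "rfc-seo-urls")
print(table.resolve("rfc-seo-urls"))  # the old link still finds the merged topic
```

Whether the old URL should redirect to the canonical one or simply serve the topic is left open here, as it is in the post.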

If the title of the topic is changed, the URL should remain the same as it was before the change. If the title of a topic is changed by an admin, then the admin should have the option to manually rewrite the title part of the URL. If the URL thus entered by the admin already exists (is a duplicate), then the method defined above for duplicate URLs applies, and a counter is quietly added by the system to the URL.
If admin changes name of a topic from
[RFC] Pretty URLs
(which has URL

Code: Select all

phpbb.com/community/topic/rfc-pretty-urls
)
to
[RFC] Human Readable URLs
then the URL of the topic remains

Code: Select all

phpbb.com/community/topic/rfc-pretty-urls
but admin has the option to manually edit the rfc-pretty-urls part
If admin changes rfc-pretty-urls to rfc-human-readable-urls

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
and

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls
already exists then automatically the URL should change from

Code: Select all

phpbb.com/community/topic/rfc-pretty-urls
to

Code: Select all

phpbb.com/community/topic/rfc-human-readable-urls-2
Cleaning the URLs:
  • Unicode characters should not be messed with. All major browsers understand them and are able to use them. It's good if there are Russian or Hungarian or German or Greek or Hindi characters in the URL (here I depart from WordPress, which forces the smaller ASCII set for Latin-script languages, so, for instance, ä in the title becomes a in the URL. I don't agree with this; ä should remain ä)
  • <space>, _, &, /, \, |, *, +, =, (, ), {, }, [, ], <, >, !, ?, @, ", # should each be converted to - (dash)
  • I don't see any burning need to downcase letters. This was a security concern that some had a couple of years ago (URLs can be spoofed, as lowercase L looks like uppercase I etc., and so can, theoretically, be used for phishing), but it's more relevant to domain names than to bulletin board software URLs
  • I haven't made up my mind about whether or not to remove common words like "the", "or", "a", "an", "and", "to", etc. On one hand, removing these would increase keyword density, while on the other hand the list of words may need to be maintained separately for each language.
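
The cleaning rules above could be sketched roughly like this in Python. The function name `clean_title` is hypothetical. Collapsing repeated dashes and trimming them from the ends is an extra assumption that the list does not state, and the sketch follows the bullets in leaving case and non-ASCII letters untouched, so its output is not lowercased like the example URLs earlier in the post.

```python
import re

# Whitespace plus the characters the list above says to convert to a dash.
_TO_DASH = re.compile(r"""[\s_&/\\|*+=(){}\[\]<>!?@"#]""")

def clean_title(title):
    """Turn a topic title into a URL slug without lowercasing or ASCII-folding."""
    slug = _TO_DASH.sub("-", title)
    # Assumption (not in the list): collapse runs of dashes and trim the ends.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return slug


print(clean_title("[RFC] Human Readable URLs"))
print(clean_title("Käse & Brot"))
```

Characters that are not on the list, such as : or %, pass through unchanged in this sketch, so a real implementation would still need a decision on those.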
OC2PS
formerly known as sooskriszta
