phpBB

Development Discussion Board

phpBB's testing ground of bleeding edge code

[RFC] Human Readable URLs

Publish your own request for comments or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.1/Ascraeus and 3.2/Arsia.

Re: [RFC] Human Readable URLs

Postby Ger » Mon Aug 01, 2011 1:20 pm

Isn't this all coming down to adding information to the URL? As bantu pointed out, the forum/topic/post ID will still be required in the URL. Adding titles to it is always possible, but how that's done is purely cosmetic. There's little difference between
Code:
http://area51.phpbb.com/phpBB/viewtopic.php?title=rfc-human-readable-urls&f=108&t=40965
http://area51.phpbb.com/phpBB/3-2-arsia--rfcs---patches/rfc-human-readable-urls/f108/t40965
http://area51.phpbb.com/phpBB/108_3-2-arsia--rfcs---patches/40965_rfc-human-readable-urls
etc.
Adding this information surely tells the human reader more about what to expect behind the link, so I guess it would make the link more readable in a way. However, even with the shortest option, the link would be displayed as something like this on many external sites (phpBB-powered or otherwise, since many systems shorten long URLs for display):
http://www.example.com/phpBB/108_3-2-ar ... dable-urls

Therefore, I guess only internal links would really benefit from this.
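The truncation shown above can be reproduced with a minimal Python sketch. It assumes a display rule that keeps the first 39 and the last 10 characters of any URL longer than 55 characters. Those thresholds are an assumption chosen to match the example in this post, and `shorten_for_display` is a hypothetical name, not a phpBB function.

```python
def shorten_for_display(url: str, max_len: int = 55, head: int = 39, tail: int = 10) -> str:
    """Return the link text a board might show for a long URL.

    Assumed rule (not confirmed in this thread): URLs longer than max_len
    keep their first `head` and last `tail` characters around " ... ".
    """
    if len(url) <= max_len:
        return url
    return url[:head] + " ... " + url[-tail:]


long_url = "http://www.example.com/phpBB/108_3-2-arsia--rfcs---patches/40965_rfc-human-readable-urls"
short_url = "http://www.example.com/phpBB/"

print(shorten_for_display(long_url))
print(shorten_for_display(short_url))
```

Under such a rule most of the title text is cut from the visible link text, which is the point made above: the readable part mostly helps where the full URL is shown.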
Ger
Registered User
 
Posts: 176
Joined: Mon Jul 26, 2010 1:55 pm
Location: 192.168.1.100

Re: [RFC] Human Readable URLs

Postby naderman » Fri Aug 05, 2011 12:52 am

bantu wrote:
naderman wrote:Instead you can keep a lookup table which contains multiple entries per topic if a topic has been renamed.

While I do not like the inclusion of t1234 and f123 and friends in the URL, I think that this approach does not work either. It works for CMSes, where you can return an error when a URL is already taken and where only a small number of people manage the actual content behind the URLs. But in phpBB, different topics can be created by different people and topics can generally have the same titles, so without the IDs you have nothing that makes the URL unique enough.

You can easily just add a number to the title. So if you come across the same title, it defaults to whatever-the-title-is-2.
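A minimal Python sketch of the "add a number to the title" idea follows. The function name `unique_slug` and the in-memory set are assumptions for illustration; a real board would presumably check a database table with a unique index instead.

```python
def unique_slug(base: str, taken: set) -> str:
    """Return `base` if it is free, otherwise `base-2`, `base-3`, and so on."""
    if base not in taken:
        return base
    counter = 2
    while f"{base}-{counter}" in taken:
        counter += 1
    return f"{base}-{counter}"


# Three topics created with the same title get three distinct slugs.
taken = set()
created = []
for _ in range(3):
    slug = unique_slug("whatever-the-title-is", taken)
    taken.add(slug)
    created.append(slug)

print(created)
```

The first topic keeps the plain slug, so only later duplicates carry a number.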
naderman
Development Team Leader
 
Posts: 1649
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany

Re: [RFC] Human Readable URLs

Postby ecwpa » Sun Aug 07, 2011 2:03 am

naderman wrote:You can easily just add a number to the title. So if you come across the same title, it defaults to whatever-the-title-is-2.


That's too much extra work for no reason whatsoever.
ecwpa
Registered User
 
Posts: 169
Joined: Mon Jan 24, 2005 2:10 am

Re: [RFC] Pretty URLs

Postby AmigoJack » Sun Aug 07, 2011 11:27 pm

Sam wrote:Cleaning the topic title would basically strip all odd characters, punctuation, replace whitespace with a single hyphen ( - ), and possibly run a UTF8 strtolower() on the slug as well.
While lowercasing Latin letters is a trivial task, it is a dozen times harder across the whole Unicode range (which will keep growing in the future). I wouldn't lowercase it at all: what would be the benefit if it's only used for readability and not as an index anyway?
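To illustrate why lowercasing is less trivial outside ASCII, here is a small Python sketch. The sample strings and the helper name `case_report` are chosen for illustration only; the thread itself is about PHP's UTF-8 handling, which this does not reproduce.

```python
samples = ["RFC", "Straße", "\u0130stanbul"]  # "\u0130" is the dotted capital I


def case_report(text):
    """Compare simple lowercasing with Unicode case folding for one string."""
    lowered = text.lower()
    folded = text.casefold()
    return {
        "lower": lowered,
        "casefold": folded,
        # False when a case mapping changed the number of code points.
        "same_length": len(lowered) == len(text) == len(folded),
    }


for text in samples:
    print(text, case_report(text))
```

For the ASCII sample both operations agree and preserve length. For the other two, the result depends on which mapping is used and the string length can change, so "just lowercase it" is not a single well-defined operation.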


Define:
  1. odd characters (do you mean those with tremas, accents, macrons, ogoneks, rings... in short: diacritics? Stripping those often voids the meaning of the word entirely)
  2. punctuation (do you mean the characters in the ASCII range or all of them, including mathematical symbols? Beware of topic titles that carry version or product model numbers, which can easily contain punctuation)
Also, you haven't said anything about characters that have their own meaning in a URL, like : / # % ? (unless those are the ones you called odd or punctuation).


URLs should not carry human-readable information at all; that's what link titles are for. Even the W3C couldn't say it more clearly: cool URIs don't change. Also, nowadays most people don't know how to link properly (to a forum, to a topic or to a post, not to mention anchors). Links with text in them instead of distinctive IDs would lead to even more confusion IMO.
AmigoJack
Registered User
 
Posts: 59
Joined: Wed May 04, 2011 7:47 pm
Location: グリーン ヒル ゾーン

Re: [RFC] Human Readable URLs

Postby Dragosvr92 » Mon Aug 08, 2011 5:10 am

I was wondering when phpBB will make pretty URLs available.

I think I prefer:
Code:
/f12-forum-name/                    Forum ID 12
/f12-forum-name/t4-my-topic/        Topic id 4



Or:
Code:
/f12-forum-name/                    Forum ID 12
/f12-t4-p5-forum-name/my-topic/        Topic id 4


Instead of:
Code:
/forum-name-f12/                    Forum ID 12
/forum-name-f12/my-topic-t4/        Topic id 4
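With the ID-first layout preferred above, the server only needs the numbers and can ignore the title text. A minimal Python sketch, assuming paths of the form `/f<id>-<slug>/` optionally followed by `t<id>-<slug>/`; `parse_ids` and the pattern are illustrative, not part of phpBB.

```python
import re

# /f<forum id>-<any slug>/ optionally followed by t<topic id>-<any slug>/
URL_PATTERN = re.compile(r"^/f(\d+)(?:-[^/]*)?/(?:t(\d+)(?:-[^/]*)?/)?$")


def parse_ids(path):
    """Return (forum_id, topic_id) for an ID-first path, or None if it does not match."""
    match = URL_PATTERN.match(path)
    if match is None:
        return None
    forum_id = int(match.group(1))
    topic_id = int(match.group(2)) if match.group(2) else None
    return forum_id, topic_id


for path in ["/f12-forum-name/", "/f12-forum-name/t4-my-topic/", "/f12-renamed/t4-new-title/"]:
    print(path, parse_ids(path))
```

Because the slug is not part of the lookup, a renamed forum or topic still resolves through its old links in this sketch.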
Previous username: TheKiller
Dragosvr92
Registered User
 
Posts: 358
Joined: Tue May 31, 2011 12:08 pm
Location: Romania

Re: [RFC] Human Readable URLs

Postby naderman » Mon Aug 08, 2011 5:26 pm

ecwpa wrote:
naderman wrote:You can easily just add a number to the title. So if you come across the same title, it defaults to whatever-the-title-is-2.


That's too much extra work for no reason whatsoever.

This would obviously be automated. So there is no extra work at all.

Re: [RFC] Human Readable URLs

Postby ecwpa » Tue Aug 16, 2011 5:51 pm

TheKiller wrote:Instead of:
Code:
/forum-name-f12/                    Forum ID 12
/forum-name-f12/my-topic-t4/        Topic id 4


He did it that way because most search engines only index the first 25 characters of the URL, so it's important to give priority to the titles.

Re: [RFC] Human Readable URLs

Postby Dragosvr92 » Tue Aug 16, 2011 6:09 pm

Umm... all right then. But then I think they should look like this:

Code:
/forum-name-f12/                    Forum ID 12
/forum-name/my-topic-f12-t4-p5/        Topic id 4


....... Whatever. I think that the IDs should be very close together, like there.
I'd still like them to come first, though.

Re: [RFC] Human Readable URLs

Postby ecwpa » Tue Aug 16, 2011 9:58 pm

I agree, although it would be inconsistent.

My take:

Code:
/F12/forum_name/                    Forum ID 12
/F12/T4/P5/forum_name/my_topic/        Topic id 4


I do care about SEO, but I also enjoy nice URLs. That's why I would like to use underscores: they're easier for the user to read. I know people use underscores for function names, but I don't manage any programming forum. All IDs go to the left; the rest is optional.

I am changing the post title

Postby sooskriszta » Fri Aug 26, 2011 4:00 pm

I recommend this schema:
Code:
home - domain.com/forums/
board - domain.com/forums/forum-title
topic - domain.com/forums/topic/topic-title
topic - domain.com/forums/topic/topic-title/page
post - domain.com/forums/post/post-title
SERP - domain.com/forums/search/keywords


Examples:
Code:
phpbb.com/community/
phpbb.com/community/3-2-arsia-rfcs-patches
phpbb.com/community/topic/rfc-human-readable-urls
phpbb.com/community/topic/rfc-human-readable-urls/2
phpbb.com/community/post/i-am-changing-the-post-title
phpbb.com/community/search/seo-urls
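The schema above could be dispatched with a small route table. This is only a Python sketch under assumptions: paths are taken relative to the board root (for example `/community`), and the names `ROUTES` and `resolve` are hypothetical.

```python
import re

# One pattern per URL type in the schema; "topic", "post" and "search"
# act as fixed first segments, anything else with one segment is a board.
ROUTES = [
    ("home", re.compile(r"^/$")),
    ("topic", re.compile(r"^/topic/(?P<slug>[^/]+)(?:/(?P<page>\d+))?$")),
    ("post", re.compile(r"^/post/(?P<slug>[^/]+)$")),
    ("search", re.compile(r"^/search/(?P<keywords>[^/]+)$")),
    ("board", re.compile(r"^/(?P<slug>[^/]+)$")),
]


def resolve(path):
    """Return (route name, parameters) for a path, or None if nothing matches."""
    for name, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            params = {k: v for k, v in match.groupdict().items() if v is not None}
            return name, params
    return None


for path in ["/", "/3-2-arsia-rfcs-patches", "/topic/rfc-human-readable-urls/2", "/search/seo-urls"]:
    print(path, resolve(path))
```

Each resolved slug would still need a lookup from title to internal ID, which is where the uniqueness question discussed in this thread comes in.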


While Sam's original suggestion is very good, the above model has 3 advantages:
  • URLs don't break/change when a post is moved from one forum/board to another.
  • URLs are shorter, and thereby (presumptively) more user friendly.
  • These become, in effect, canonical URLs

I can almost hear a suggestion that the ID be included in the URL, but I believe that would reduce some of the value of the human readable URL...in that while the URL will be intelligible, it would not be much more replicable than current ugly URLs. That being said, while I prefer keeping IDs out of the URL, if I were told that the only way to get SEO URLs was to have the ID included in them, I would take it.

Don't blast me for taking a cue from WordPress:
http://codex.wordpress.org/Using_Permalinks

What I suggest is that when phpBB encounters duplicate URLs, it adds a counter to the end of the title-based URL, e.g.
Code:
phpbb.com/community/topic/rfc-human-readable-urls-2


This has something of a similar negative effect to including the ID in the URL, without any of the performance positives. So why do I prefer this over IDs?
  • With IDs, ALL URLs will have IDs. With counters for duplicates, fewer URLs will carry these numbers.
  • There is an unexpected benefit of counters: duplicate content becomes somewhat easier to spot (if the same words are used in the title). It doesn't help with intentional abuse, but if as a user I am creating a topic [RFC] Human Readable URLs and I see that the URL created is phpbb.com/community/topic/rfc-human-readable-urls-2, then I am more likely to check out phpbb.com/community/topic/rfc-human-readable-urls, even if I am prone to not searching before posting. This is even truer of people reading the topic: if I came across phpbb.com/community/topic/rfc-human-readable-urls-2, I would likely check out phpbb.com/community/topic/rfc-human-readable-urls to see whether the discussions are different. So duplicate topics would, many times, be organically discarded.

Admittedly, this applies to topics, not posts, and there would be a HUGE number of posts with similar URLs. But I think that is not a problem, because the URLs we are most concerned with are usually topic URLs; post URLs are RELATIVELY rarely shared/distributed/clicked directly.

When a topic is split, one part should keep the original URL, while each other part should get the next incremental counter.
Splitting
Code:
phpbb.com/community/topic/rfc-human-readable-urls
should create
Code:
phpbb.com/community/topic/rfc-human-readable-urls
phpbb.com/community/topic/rfc-human-readable-urls-2
phpbb.com/community/topic/rfc-human-readable-urls-3

etc.

When 2 topics are merged, the admin should get an option to choose which URL shall be used for linking to the merged topic, but both URLs should point to it. For example, if the admin merges
Code:
phpbb.com/community/topic/rfc-human-readable-urls
and
Code:
phpbb.com/community/topic/rfc-seo-urls

and says the URL for the merged topic should be
Code:
phpbb.com/community/topic/rfc-human-readable-urls

then all parts of the site linking to the topic shall use that URL, but if someone types (or clicks in an old email)
Code:
phpbb.com/community/topic/rfc-seo-urls
then that should go to the merged topic as well.
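The merge behaviour described above can be sketched as a slug table that keeps old slugs as aliases. This is a minimal Python illustration; the class `SlugTable`, its method names and the topic IDs 1 and 2 are assumptions, not an existing phpBB structure.

```python
class SlugTable:
    """Maps every slug a topic has ever had to its topic id."""

    def __init__(self):
        self.topic_by_slug = {}   # slug -> topic id (old aliases included)
        self.canonical_slug = {}  # topic id -> slug used when generating links

    def add_topic(self, topic_id, slug):
        self.topic_by_slug[slug] = topic_id
        self.canonical_slug[topic_id] = slug

    def merge(self, source_id, target_id, keep_slug):
        """Merge source into target; keep_slug becomes the URL used in links."""
        # Repoint every slug of the source topic at the target topic.
        for slug, topic_id in list(self.topic_by_slug.items()):
            if topic_id == source_id:
                self.topic_by_slug[slug] = target_id
        del self.canonical_slug[source_id]
        if self.topic_by_slug.get(keep_slug) != target_id:
            raise ValueError("keep_slug must belong to one of the merged topics")
        self.canonical_slug[target_id] = keep_slug

    def resolve(self, slug):
        """Return (topic id, canonical slug), or None for an unknown slug."""
        topic_id = self.topic_by_slug.get(slug)
        if topic_id is None:
            return None
        return topic_id, self.canonical_slug[topic_id]


table = SlugTable()
table.add_topic(1, "rfc-human-readable-urls")
table.add_topic(2, "rfc-seo-urls")
table.merge(source_id=2, target_id=1, keep_slug="rfc-human-readable-urls")

print(table.resolve("rfc-seo-urls"))
print(table.resolve("rfc-human-readable-urls"))
```

A request for an old slug can then be answered with a redirect to the canonical one, so links in old emails keep working.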

If the title of a topic is changed, the URL should remain the same as it was before the change. If the title is changed by an admin, the admin should have the option to manually rewrite the title part of the URL. If the URL thus entered already exists (is a duplicate), the method defined above for duplicate URLs applies, and the system quietly adds a counter to the URL.
For example, if the admin changes the name of a topic from [RFC] Pretty URLs, which has the URL
Code:
phpbb.com/community/topic/rfc-pretty-urls
to [RFC] Human Readable URLs, then the URL of the topic remains
Code:
phpbb.com/community/topic/rfc-pretty-urls

but the admin has the option to manually edit the rfc-pretty-urls part. If the admin changes rfc-pretty-urls to rfc-human-readable-urls and
Code:
phpbb.com/community/topic/rfc-human-readable-urls
already exists, then the URL should automatically change from
Code:
phpbb.com/community/topic/rfc-pretty-urls

to
Code:
phpbb.com/community/topic/rfc-human-readable-urls-2
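A minimal Python sketch of the rename rule just described. `rename_topic` and the dictionary of slugs are hypothetical. The sketch drops the old slug after a manual change; whether the old URL should keep working as an alias is not specified here and would be a separate design decision.

```python
def rename_topic(slugs, topic_id, new_slug=None):
    """Return the slug a topic has after a title change.

    slugs: dict mapping slug -> topic id for all existing topics.
    new_slug: optional slug entered by an admin; None keeps the old URL.
    """
    old_slug = next(s for s, t in slugs.items() if t == topic_id)
    if new_slug is None or new_slug == old_slug:
        return old_slug  # a title change alone never changes the URL
    candidate, counter = new_slug, 2
    while candidate in slugs:  # slug already taken: quietly append a counter
        candidate = f"{new_slug}-{counter}"
        counter += 1
    del slugs[old_slug]
    slugs[candidate] = topic_id
    return candidate


slugs = {"rfc-human-readable-urls": 1, "rfc-pretty-urls": 2}
after_title_change = rename_topic(slugs, 2)
after_manual_edit = rename_topic(slugs, 2, "rfc-human-readable-urls")
print(after_title_change, after_manual_edit)
```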


Cleaning the URLs:
  • Unicode characters should not be messed with. All major browsers understand them and are able to use them. It's good if there are Russian or Hungarian or German or Greek or Hindi characters in the URL. (Here I depart from WordPress, which forces a smaller ASCII set for Latin-script languages, so that, for instance, ä in the title becomes a in the URL. I don't agree with this: ä should remain ä.)
  • <space>, _, &, /, \, |, *, +, =, (, ), {, }, [, ], <, >, !, ?, @, ", # should each be converted to - (dash)
  • I don't see any burning need to downcase letters. This was a security concern that some had a couple of years ago (URLs can be spoofed, since lowercase L looks like uppercase I etc., and so can, theoretically, be used for phishing), but it's more relevant to domain names than to the URLs of bulletin board software.
  • I haven't made up my mind about whether or not to remove common words like "the", "or", "a", "an", "and", "to", etc. On one hand, removing these would increase keyword density; on the other hand, the list of words may need to be maintained separately for each language.
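The cleaning rules above can be sketched in a few lines of Python. The function name `clean_title` is hypothetical, and collapsing runs of dashes and trimming them from the ends is an added assumption that is not in the list; case and non-ASCII letters are left alone, as proposed.

```python
import re

# Characters the list above says to convert to a dash (space included).
TO_DASH = ' _&/\\|*+=(){}[]<>!?@"#'
DASH_TABLE = str.maketrans({ch: "-" for ch in TO_DASH})


def clean_title(title):
    """Turn a topic title into a URL slug without lowercasing or stripping Unicode."""
    slug = title.translate(DASH_TABLE)
    # Assumption, not in the list above: collapse repeated dashes and trim the ends.
    return re.sub(r"-{2,}", "-", slug).strip("-")


for title in ["[RFC] Human Readable URLs", "Käse & Brot", "phpBB 3.1: what's new?"]:
    print(repr(title), "->", repr(clean_title(title)))
```

Characters that are not in the list, such as : and %, pass through unchanged in this sketch, so the earlier question about characters with their own meaning in a URL would still need an answer.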
sooskriszta
Registered User
 
Posts: 85
Joined: Wed Dec 29, 2010 7:23 pm
