[RFC] Human Readable URLs

Post by Sam » Sat Jul 02, 2011 8:34 pm

Please note this is a Request For Comments topic, not a general discussion. It is not meant for debating whether Pretty URLs are useful, but for presenting my suggested implementation and gathering comments and suggestions on that specific implementation.

Related Discussions:
viewtopic.php?f=105&t=35616
http://www.phpbb.com/community/viewtopi ... &t=2100309

A URL is the gateway to your website and the means by which web pages are accessed. The general trend on the web is to move towards a RESTful way of interfacing with web applications.

Benefits
  • User can be clued into where they are going by just looking at the URL
  • Keywords in the URL
  • Could pave the way for more seamless integration with APIs introduced later on
Drawbacks
  • Potential release of sensitive information via referrers
  • Rewriting URLs can cause additional load on the webserver
The implementation sample below is what I have worked out with the following in mind:
  • Avoid all URL collisions without looking up topics
  • Allow for old URLs (?f=12&t=34) to work and be redirected
  • Accommodate changing topic names
  • Language neutrality

/                                   Index
/forum-name-f12/                    Forum ID 12
/forum-name-f12/my-topic-t4/        Topic ID 4
/forum-name-f12/post/               New topic in forum ID 12
/forum-name-f12/my-topic-t4/reply/  Reply in topic 4
/memberlist/                        Memberlist
/memberlist/leaders/                Leaders
/member/sam-m2/                     Member ID 2
/group/group-name-g4/               Group 4
/ucp/{i}/                           Default GET params in the URL
/ucp/{i}/{mode}/                    
/mcp/{i}/                           Default GET params in the URL
/mcp/{i}/{mode}/
/search/                            Just use REQUEST for the rest except for special case
/search/egosearch/                  Ego search

The idea here is to embed the ID (along with a prefix letter showing what kind of ID it is) in the URL. This makes collisions impossible and removes the need for lookups to check for similar URL slugs.
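
To illustrate how the ID can be pulled back out of such a URL without any database lookup, here is a minimal sketch in Python (phpBB itself is PHP; this is only meant to show the idea). The regular expression and the function names are my own assumptions, not part of the proposal. The prefix letters f, t, m and g are taken from the listing above.

```python
import re

# Hypothetical pattern: optional slug text ending in "-", then a type
# letter (f, t, m or g) and a numeric ID at the very end of the segment.
SEGMENT_RE = re.compile(r"^(?:.*-)?(?P<kind>[ftmg])(?P<id>[0-9]+)$")


def parse_segment(segment):
    """Return (kind, id) for a segment like 'my-topic-t4', or None."""
    match = SEGMENT_RE.match(segment)
    if match is None:
        return None
    return match.group("kind"), int(match.group("id"))


def parse_path(path):
    """Collect the IDs found in a path; segments without an ID are skipped."""
    ids = {}
    for segment in path.strip("/").split("/"):
        parsed = parse_segment(segment)
        if parsed is not None:
            kind, number = parsed
            ids[kind] = number
    return ids


print(parse_path("/forum-name-f12/my-topic-t4/reply/"))  # {'f': 12, 't': 4}
```

Because only the trailing letter and digits are read, the descriptive text in front of them can be anything, which is what keeps collisions from mattering.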

When you submit a topic, the title is cleaned and "-t{ID}" is appended to produce the final slug, which is stored in the database. When you visit a page, the topic is looked up by its ID only; the rest of the text is simply dummy text. The page then checks whether the URL slug is correct and redirects if it does not match the one stored in the database. This allows topic title changes to happen seamlessly, and the 301 tells search engines to update their links.
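
The slug check described above could look roughly like the following Python sketch. The dictionary stands in for the database column that would hold the stored slug, and the function name and return shape are hypothetical.

```python
# Hypothetical stand-in for the database: topic ID -> stored slug.
STORED_SLUGS = {4: "my-renamed-topic-t4"}


def check_slug(requested_slug, topic_id, stored=STORED_SLUGS):
    """Decide how to answer a request, looking the topic up by ID only.

    Returns (status, canonical_slug):
      404 if the topic ID is unknown,
      301 if the slug in the URL differs from the stored one,
      200 if the URL is already canonical.
    """
    canonical = stored.get(topic_id)
    if canonical is None:
        return 404, None
    if requested_slug != canonical:
        return 301, canonical
    return 200, canonical


# A link created before the topic was renamed still resolves via the ID:
print(check_slug("my-topic-t4", 4))          # (301, 'my-renamed-topic-t4')
print(check_slug("my-renamed-topic-t4", 4))  # (200, 'my-renamed-topic-t4')
```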

An example of a redirect:
  1. User clicks an old url:
    http://www.phpbb.com/community/viewtopic.php?f=14&t=2133523
  2. .htaccess (which is not aware of anything on the DB side) will redirect here:
    http://www.phpbb.com/community/f14/t2133523/
  3. User lands on the page, which detects that the slugs do not match. It will then redirect them here:
    http://www.phpbb.com/community/announcements-f14/phpbb-at-oscon-july-26-28-t2133523/
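
Step 2 of the example above needs no database access, since it only rearranges the query parameters. In the proposal this is done by .htaccess; the Python sketch below just mirrors that mapping for illustration, and the function name is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def legacy_to_short_path(url):
    """Map an old viewtopic/viewforum style URL to the ID-only path.

    Only the f and t parameters are considered; returns None when
    neither is present, so the caller can leave the URL alone.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    segments = []
    for key in ("f", "t"):
        values = query.get(key)
        if values and values[0].isascii() and values[0].isdigit():
            segments.append(f"{key}{values[0]}")
    if not segments:
        return None
    # Keep the directory the board is installed in, e.g. "/community".
    base = parts.path.rsplit("/", 1)[0]
    return f"{base}/" + "/".join(segments) + "/"


print(legacy_to_short_path(
    "http://www.phpbb.com/community/viewtopic.php?f=14&t=2133523"
))  # /community/f14/t2133523/
```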
Comments/Suggestions?

Post by Meis2M » Sun Jul 03, 2011 3:39 am

Wow... it's a very good idea. I congratulate Sam.

Post by naderman » Sun Jul 03, 2011 4:57 am

Can you explain more precisely how the cleaning of the topic title would work? What are your thoughts on handling Unicode? Are there compatibility problems with webservers and Unicode in paths?

Should it be u<i> or m<i> for user/member? We have a member list, but typically refer to users.

Post by Sam » Sun Jul 03, 2011 5:39 am

naderman wrote: Can you explain more precisely how the cleaning of the topic title would work? What are your thoughts on handling Unicode? Are there compatibility problems with webservers and Unicode in paths?

Should it be u<i> or m<i> for user/member? We have a member list, but typically refer to users.

Cleaning the topic title would basically strip all odd characters and punctuation, replace whitespace with a single hyphen ( - ), and possibly run a UTF-8 strtolower() on the slug as well. That should give us a nice pretty URL. Unicode should and can be preserved, though we may have to tweak the text parsing engine a little because of the issue below. I have not checked specifically whether anything other than Apache supports Unicode paths, but here is a simple example implementation on my test server:

http://temp.websyntax.net/test/общая-дискуссия-f34/
http://temp.websyntax.net/test/общая-дискуссия-f34/привет-мир-t242/

Currently, phpBB does not seem to parse these as URLs when they are posted.
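
As a rough sketch of the cleaning step described above (in Python rather than PHP, purely for illustration), something like the following would do it. The function names and the handling of an empty result are my own assumptions. Note that \w in a Python 3 string pattern matches Unicode letters and digits, so non-Latin titles are preserved.

```python
import re


def clean_title(title, lowercase=True):
    """Strip punctuation and collapse whitespace into single hyphens."""
    # Drop everything that is not a letter, digit, underscore,
    # whitespace or hyphen.
    text = re.sub(r"[^\w\s-]", "", title)
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    text = re.sub(r"[\s_-]+", "-", text).strip("-")
    return text.lower() if lowercase else text


def make_slug(title, topic_id, lowercase=True):
    """Append the "-t{ID}" suffix; fall back to "t{ID}" for empty titles."""
    text = clean_title(title, lowercase)
    return f"{text}-t{topic_id}" if text else f"t{topic_id}"


print(make_slug("phpBB at OSCON: July 26-28", 2133523))
# phpbb-at-oscon-july-26-28-t2133523
print(make_slug("Привет, мир!", 242))
# привет-мир-t242
```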

I chose "m" for member simply because it is accessed via "memberlist.php". "u" would not be used for anything otherwise, so it could easily be changed if it makes more sense to continue to refer to members as users.

Post by naderman » Sun Jul 03, 2011 5:49 am

Maybe we should simply keep the original titles rather than lowercase them (or, if anything, case fold them? Not necessary, but I am not sure whether the strtolower results are preferable or not).

Support for Punycode as well as unicode in URLs needs to be added either way.
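
For reference, the two encodings involved are different things: Punycode applies to the host name, while Unicode in the path is percent-encoded as UTF-8 bytes on the wire. Here is a small Python sketch using only the standard library; the helper names are made up, and Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat it as an illustration only.

```python
from urllib.parse import quote, unquote


def encode_host(host):
    """Convert a Unicode host name to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


def encode_path(path):
    """Percent-encode a Unicode path as UTF-8, keeping the slashes."""
    return quote(path, safe="/")


print(encode_host("bücher.example"))   # xn--bcher-kva.example
print(encode_path("/привет-t1/"))
# /%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82-t1/
print(unquote(encode_path("/привет-t1/")))  # /привет-t1/
```

A URL parser that wants to recognise such links in posts would need to accept both the readable and the encoded forms.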

Post by Sam » Sun Jul 03, 2011 6:00 am

The lowercasing can very easily be made a configuration option, assuming a UTF-8 implementation of strtolower can be produced.
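
To show what such an option might distinguish, here is a short Python sketch with three hypothetical modes: keep the title as written, lowercase it, or case fold it. The mode names are my own invention. Python's str.lower() and str.casefold() are Unicode-aware, which is the behaviour a UTF-8 strtolower would need.

```python
def apply_case(slug, mode="lower"):
    """Apply the configured case handling to a slug.

    mode is a hypothetical board setting: "keep", "lower" or "fold".
    """
    if mode == "keep":
        return slug
    if mode == "lower":
        return slug.lower()
    if mode == "fold":
        return slug.casefold()
    raise ValueError(f"unknown case mode: {mode!r}")


print(apply_case("Straße", "keep"))   # Straße
print(apply_case("Straße", "lower"))  # straße
print(apply_case("Straße", "fold"))   # strasse
print(apply_case("ОБЩАЯ", "lower"))   # общая
```

Case folding is more aggressive than lowercasing (it rewrites ß as ss here), which is one reason the results may not always be preferable in a URL.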

Post by bantu » Sun Jul 03, 2011 11:39 pm

Such a feature would have to be optional and disabled by default. One reason is the listed drawbacks (especially leaking information), but it probably also requires the webserver to cooperate.

Post by Sam » Mon Jul 04, 2011 6:28 am

bantu wrote: Such a feature would have to be optional and disabled by default. One reason is the listed drawbacks (especially leaking information), but it probably also requires the webserver to cooperate.

A way around that is to have a click-through page (for all external links) that just acts as an intermediary between the page the link is on and the link's destination. It would cause the referrer to be something like domain.com/click.php?url=http://google.com.
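
A rough Python sketch of the link rewriting for such a click-through page follows. The click.php name and the url parameter come from the example above; the function names are hypothetical, and percent-encoding the destination is my own assumption so that its query string survives intact.

```python
from urllib.parse import parse_qs, quote, urlsplit


def click_through_url(target, base="http://domain.com/click.php"):
    """Wrap an external link so the referrer becomes the click page."""
    return f"{base}?url={quote(target, safe='')}"


def target_from_click(url):
    """Recover the destination on the click page, or None if missing."""
    values = parse_qs(urlsplit(url).query).get("url")
    return values[0] if values else None


wrapped = click_through_url("http://google.com")
print(wrapped)  # http://domain.com/click.php?url=http%3A%2F%2Fgoogle.com
print(target_from_click(wrapped))  # http://google.com
```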

Post by Erik Frèrejean » Mon Jul 04, 2011 11:08 am

Sam wrote:
bantu wrote: Such a feature would have to be optional and disabled by default. One reason is the listed drawbacks (especially leaking information), but it probably also requires the webserver to cooperate.
A way around that is to have a click-through page (for all external links) that just acts as an intermediary between the page the link is on and the link's destination. It would cause the referrer to be something like domain.com/click.php?url=http://google.com.

That doesn't protect against links being posted directly. On one of my boards the team has a public and a private forum, and team members will post links to the private section in the public forum.

I would also like to see this made optional, but with all three formats available. The middle one (IDs only, e.g. /f14/t2133523/) provides cleaner URLs than we currently have, without the slug lookups and the potential leaking of information.

Post by bantu » Mon Jul 04, 2011 12:53 pm

Sam wrote:
bantu wrote: Such a feature would have to be optional and disabled by default. One reason is the listed drawbacks (especially leaking information), but it probably also requires the webserver to cooperate.
A way around that is to have a click-through page (for all external links) that just acts as an intermediary between the page the link is on and the link's destination. It would cause the referrer to be something like domain.com/click.php?url=http://google.com.

But then you can no longer copy links directly, which is a trivial thing to do right now.
