phpBB

Development Discussion Board

phpBB's testing ground of bleeding edge code

[RFC] Human Readable URLs

Publish your own request for comments or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.1/Ascraeus and 3.2/Arsia.


Postby Sam » Sat Jul 02, 2011 8:34 pm

Please note this is a Request For Comments topic, not a discussion. It is not meant to serve as a discussion of whether pretty URLs are useful, but rather to present my suggested implementation and to receive comments and suggestions on this specific implementation.

Related Discussions:
http://area51.phpbb.com/phpBB/viewtopic.php?f=105&t=35616
http://www.phpbb.com/community/viewtopic.php?f=64&t=2100309

A URL is the gateway to your website and the way its pages are accessed. The general trend on the web is to move towards a RESTful way of interfacing with web applications.

Benefits
  • Users can tell where they are going just by looking at the URL
  • Keywords in the URL
  • Could pave the way for more seamless integration with APIs introduced later on
Drawbacks
  • Potential leaking of sensitive information (such as forum and topic titles) via referrers
  • Rewriting URLs can cause additional load on the webserver

I worked out the sample implementation below with the following goals in mind:
  • Avoid all URL collisions without looking up topics
  • Allow for old URLs (?f=12&t=34) to work and be redirected
  • Accommodate changing topic names
  • Language neutrality

/                                   Index
/forum-name-f12/                    Forum ID 12
/forum-name-f12/my-topic-t4/        Topic ID 4
/forum-name-f12/post/               New topic in forum ID 12
/forum-name-f12/my-topic-t4/reply/  Reply in topic ID 4
/memberlist/                        Memberlist
/memberlist/leaders/                Leaders
/member/sam-m2/                     Member ID 2
/group/group-name-g4/               Group ID 4
/ucp/{i}/                           Default GET params in the URL
/ucp/{i}/{mode}/
/mcp/{i}/                           Default GET params in the URL
/mcp/{i}/{mode}/
/search/                            Remaining params via REQUEST, except special cases
/search/egosearch/                  Ego search
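A minimal sketch of how a front controller might resolve the paths in the table above, assuming Python and a regex route list. The route names (topic, reply, ...) and the match() helper are hypothetical, not phpBB code. Only the trailing type letter plus ID in each segment is used; any text before it is ignored.

```python
import re

# Hypothetical route table for the URL scheme above (not phpBB code).
# SEG matches the optional, ignored text before a typed ID, e.g. "forum-name-".
SEG = r"(?:[^/]*-)?"

ROUTES = [
    ("index", r"^/$"),
    ("new_topic", rf"^/{SEG}f(?P<f>\d+)/post/$"),
    ("reply", rf"^/{SEG}f(?P<f>\d+)/{SEG}t(?P<t>\d+)/reply/$"),
    ("topic", rf"^/{SEG}f(?P<f>\d+)/{SEG}t(?P<t>\d+)/$"),
    ("forum", rf"^/{SEG}f(?P<f>\d+)/$"),
    ("leaders", r"^/memberlist/leaders/$"),
    ("memberlist", r"^/memberlist/$"),
    ("member", rf"^/member/{SEG}m(?P<m>\d+)/$"),
    ("group", rf"^/group/{SEG}g(?P<g>\d+)/$"),
    ("ucp", r"^/ucp/(?P<i>[^/]+)/(?:(?P<mode>[^/]+)/)?$"),
    ("mcp", r"^/mcp/(?P<i>[^/]+)/(?:(?P<mode>[^/]+)/)?$"),
    ("egosearch", r"^/search/egosearch/$"),
    ("search", r"^/search/$"),
]

NUMERIC = {"f", "t", "m", "g"}  # groups that hold database IDs


def match(path):
    """Return (route_name, params) for the first matching route, or None."""
    for name, pattern in ROUTES:
        hit = re.match(pattern, path)
        if hit:
            # IDs become ints; ucp/mcp module names stay strings.
            params = {k: int(v) if k in NUMERIC else v
                      for k, v in hit.groupdict().items() if v is not None}
            return name, params
    return None
```

Because every slug-bearing segment ends in a typed ID, the router never needs the database to decide which page was requested; /f14/t2133523/ and /forum-name-f14/anything-t2133523/ resolve to the same topic.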


The idea here is to unobtrusively embed the ID (plus a letter indicating what kind of ID it is) in the URL. This makes collisions impossible and avoids lookups to check for similar URL slugs.
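A sketch of the ID-embedding idea, assuming Python; make_segment() and parse_segment() are hypothetical helpers. The words in a segment are cosmetic, so two topics with the same title still get distinct segments and no uniqueness lookup is needed.

```python
import re


def make_segment(text, kind, ident):
    """Build a segment like 'my-topic-t4' (hypothetical helper).

    The words are cosmetic; uniqueness comes only from the type letter
    ('f', 't', 'm', 'g') plus the numeric ID appended at the end.
    """
    words = re.findall(r"\w+", text.lower())
    return "-".join(words + [f"{kind}{ident}"])


def parse_segment(segment):
    """Recover (kind, id) from a segment, ignoring any text before it."""
    hit = re.search(r"(?:^|-)([a-z])(\d+)$", segment)
    if hit is None:
        raise ValueError(f"no typed ID in segment {segment!r}")
    return hit.group(1), int(hit.group(2))
```

For example, two topics both titled "Help!" become help-t4 and help-t5, and parse_segment() reads only the final "-t4" or "-t5".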

When you submit a topic, its title is cleaned and "-t{ID}" is appended to produce the final slug, which is stored in the database. When you visit a page, only the topic ID is used; the rest of the text is simply dummy text. The page checks whether the URL slug is correct and redirects if it does not match the one stored in the database. This allows topic titles to change seamlessly, and the 301 redirect tells search engines to update their links.
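A sketch of the slug check on page load, assuming Python. STORED_SLUGS is a hypothetical in-memory stand-in for the slug column saved with each forum and topic; the page itself is found by ID, and the requested text only decides whether to answer with a 301.

```python
from typing import Optional

# Hypothetical stand-in for the slug column written when each forum or
# topic is saved (or renamed). Keys are (type letter, ID).
STORED_SLUGS = {
    ("f", 14): "announcements-f14",
    ("t", 2133523): "phpbb-at-oscon-july-26-28-t2133523",
}


def canonical_redirect(forum_seg: str, topic_seg: str,
                       forum_id: int, topic_id: int) -> Optional[str]:
    """Return a 301 target if the requested slugs are stale, else None.

    The page is selected by forum_id/topic_id alone; the requested
    segments are only compared against the stored slugs.
    """
    want_forum = STORED_SLUGS[("f", forum_id)]
    want_topic = STORED_SLUGS[("t", topic_id)]
    if forum_seg == want_forum and topic_seg == want_topic:
        return None  # already canonical: render the topic
    return f"/{want_forum}/{want_topic}/"  # caller sends 301 Moved Permanently
```

When a topic is renamed, only its stored slug changes; links using the old text still resolve by ID and are redirected to the new slug.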

An example of a redirect:
  1. User clicks an old URL:
    http://www.phpbb.com/community/viewtopic.php?f=14&t=2133523
  2. .htaccess (which knows nothing about the database) redirects here:
    http://www.phpbb.com/community/f14/t2133523/
  3. User lands on that page, which detects that the slugs do not match and redirects them here:
    http://www.phpbb.com/community/announcements-f14/phpbb-at-oscon-july-26-28-t2133523/
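Step 2 can work without the database because it only moves the IDs from the query string into the path. The Python function below is a hypothetical equivalent of that .htaccess rule (the real thing would be a rewrite directive, not Python); URLs it does not recognise, such as ones without f=, are left alone in this sketch.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def _id(query, key):
    """Return the first value of key if it is all digits, else None."""
    value = query.get(key, [""])[0]
    return value if value.isdigit() else None


def legacy_redirect(url: str) -> Optional[str]:
    """Map an old viewforum/viewtopic URL to the short /f../t../ form.

    Hypothetical Python equivalent of the .htaccess step: no database
    access, the IDs are just moved from the query string into the path.
    """
    parts = urlsplit(url)
    base, _, script = parts.path.rpartition("/")
    query = parse_qs(parts.query)
    f, t = _id(query, "f"), _id(query, "t")
    if script == "viewtopic.php" and f and t:
        new_path = f"{base}/f{f}/t{t}/"
    elif script == "viewforum.php" and f:
        new_path = f"{base}/f{f}/"
    else:
        return None  # not a legacy URL this sketch knows how to map
    return parts._replace(path=new_path, query="").geturl()
```

The page reached afterwards is then responsible for the second redirect (step 3) to the full slug.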

Comments/Suggestions?
Sam
Website Team
Posts: 31
Joined: Fri Jan 23, 2009 10:24 pm

Re: [RFC] Pretty URLs

Postby Meis2M » Sun Jul 03, 2011 3:39 am

Wow, it's a very good idea. I congratulate Sam.
Meis2M
Registered User
Posts: 213
Joined: Fri Apr 23, 2010 10:18 am

Re: [RFC] Pretty URLs

Postby naderman » Sun Jul 03, 2011 4:57 am

Can you explain more precisely how the cleaning of the topic title would work? What are your thoughts on handling Unicode? Are there compatibility problems with webservers and Unicode in paths?

Should it be u<i> or m<i> for user/member? We have a member list, but typically refer to users.
www.naderman.de
Move your forum to Forumatic - we'll take care of maintenance & spam
naderman
Development Team Leader
Posts: 1650
Joined: Sun Jan 11, 2004 2:11 am
Location: Karlsruhe, Germany

Re: [RFC] Pretty URLs

Postby Sam » Sun Jul 03, 2011 5:39 am

naderman wrote: Can you explain more precisely how the cleaning of the topic title would work? What are your thoughts on handling Unicode? Are there compatibility problems with webservers and Unicode in paths?

Should it be u<i> or m<i> for user/member? We have a member list, but typically refer to users.

Cleaning the topic title would basically strip odd characters and punctuation, replace whitespace with a single hyphen (-), and possibly run a UTF-8 strtolower() on the slug as well. That should give us a nice pretty URL. Unicode can and should be preserved, though we may have to tweak the text parsing engine a little because of the issue below. I have not specifically checked whether anything other than Apache supports Unicode paths, but here is a simple example implementation on my test server:

http://temp.websyntax.net/test/общая-дискуссия-f34/
http://temp.websyntax.net/test/общая-дискуссия-f34/привет-мир-t242/

Currently, phpBB does not seem to recognize these as URLs when parsing posts.
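A minimal Python sketch of the cleaning rules described above; clean_title() is a hypothetical name. Treating hyphens and dashes as word breaks is an assumption, chosen so that a title like "phpBB at OSCON, July 26-28" (assumed; the real title is not given) yields the slug phpbb-at-oscon-july-26-28 from the first post. Python's str.lower() is Unicode-aware, so it stands in for a UTF-8 strtolower().

```python
import unicodedata


def clean_title(title: str, lowercase: bool = True) -> str:
    """Turn a title into a URL slug, keeping non-Latin letters (hypothetical).

    Letters and digits are kept, whitespace and dashes become word breaks,
    and all other punctuation is dropped. Runs of breaks collapse into a
    single hyphen. str.lower() is Unicode-aware in Python.
    """
    title = unicodedata.normalize("NFC", title)  # compose e.g. "e" + accent
    chars = []
    for ch in title:
        if ch.isalnum():
            chars.append(ch)
        elif ch.isspace() or unicodedata.category(ch) == "Pd":
            chars.append(" ")  # word break
        # anything else (punctuation, symbols) is dropped
    slug = "-".join("".join(chars).split())
    return slug.lower() if lowercase else slug
```

With these rules "Общая дискуссия" becomes общая-дискуссия, matching the test-server URL. One trade-off of dropping punctuation outright is that "don't" becomes "dont".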

I chose "m" for member simply because the page is accessed via "memberlist.php". "u" is not used for anything else, so it could easily be changed if it makes more sense to keep referring to members as users.

Re: [RFC] Human Readable URLs

Postby naderman » Sun Jul 03, 2011 5:49 am

Maybe we should simply keep the original titles rather than lowercasing them (or, if anything, case-fold them? Not necessary, and I'm not sure whether the strtolower results are preferable or not).

Support for Punycode as well as Unicode in URLs needs to be added either way.
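One possible shape for that support, sketched in Python under the assumption that links are normalized to ASCII before output: host names go to Punycode through Python's built-in "idna" codec (which implements IDNA 2003), and Unicode path segments such as the Cyrillic ones above are percent-encoded as UTF-8. to_ascii_url() is a hypothetical helper, not a phpBB function.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Rewrite a Unicode URL into pure ASCII (hypothetical helper).

    - host: Punycode via Python's built-in "idna" codec (IDNA 2003)
    - path: UTF-8 percent-encoding, keeping "/" and existing "%XX" escapes
    Query and fragment are passed through unchanged, and any user:password
    part is dropped, to keep the sketch short.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

Browsers usually display the decoded form, so the board could keep Unicode slugs in the database and only encode when emitting links or headers such as Location.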

Re: [RFC] Human Readable URLs

Postby Sam » Sun Jul 03, 2011 6:00 am

Lowercasing can very easily be made a configuration option, assuming a UTF-8 implementation of strtolower can be produced.

Re: [RFC] Human Readable URLs

Postby bantu » Sun Jul 03, 2011 11:39 pm

Such a feature would have to be optional and disabled by default, partly because of the listed drawbacks (especially leaking information), but also because it probably requires the webserver to cooperate.
bantu
3.0 Release Manager
Posts: 439
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany

Re: [RFC] Human Readable URLs

Postby Sam » Mon Jul 04, 2011 6:28 am

bantu wrote: Such a feature would have to be optional and disabled by default, partly because of the listed drawbacks (especially leaking information), but also because it probably requires the webserver to cooperate.

A way around that is to have a click-through page for all external links that acts as an intermediary between the page the link is on and the link's destination. The referrer would then be something like domain.com/click.php?url=http://google.com.
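A sketch of the link rewriting such a page implies, assuming Python and a hypothetical wrap_external() helper. It assumes click.php shows an intermediate page the user clicks through rather than sending an HTTP 3xx redirect, because browsers generally carry the original Referer across HTTP redirects. Unlike the example above, the target is percent-encoded so that URLs containing & or ? survive as a single parameter.

```python
from urllib.parse import quote, urlsplit


def wrap_external(url: str, board_host: str,
                  click_page: str = "/click.php") -> str:
    """Send off-site links through an intermediary page (hypothetical).

    Assumes click_page shows a page the user clicks through, so the
    destination sees click_page as the referrer instead of the topic URL.
    """
    host = urlsplit(url).hostname
    if host is None or host == board_host.lower():
        return url  # relative and same-site links are left alone
    # Percent-encode the whole target so "&", "?" and "#" stay inside url=.
    return f"{click_page}?url={quote(url, safe='')}"
```

Internal links are untouched, so the pretty URLs still appear when users navigate within the board.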

Re: [RFC] Human Readable URLs

Postby Erik Frèrejean » Mon Jul 04, 2011 11:08 am

Sam wrote:
bantu wrote: Such a feature would have to be optional and disabled by default, partly because of the listed drawbacks (especially leaking information), but also because it probably requires the webserver to cooperate.

A way around that is to have a click-through page for all external links that acts as an intermediary between the page the link is on and the link's destination. The referrer would then be something like domain.com/click.php?url=http://google.com.

That doesn't protect against links being posted directly. On one of my boards the team has a public and a private forum, and team members will post links to the private section in the public forum.

I'd also like to see this made optional, but with all three formats available:
  • /forum-name-f12/                    Forum ID 12
  • /f12/                               Forum ID 12
  • The current setup
The middle one provides cleaner URLs than we currently have, without the slug lookups and the potential leaking of information.
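The three options could sit behind a single setting. A Python sketch, assuming a hypothetical mode parameter with the values "slug", "id" and "legacy"; defaulting to "legacy" follows bantu's suggestion that the feature be disabled by default.

```python
import re


def forum_url(forum_id: int, forum_name: str, mode: str = "legacy") -> str:
    """Build a forum link in one of the three formats (hypothetical setting).

    "slug"   -> /forum-name-f12/
    "id"     -> /f12/   (no titles in the URL, nothing leaks via referrers)
    "legacy" -> the current viewforum.php?f=12 form (the default here)
    """
    if mode == "legacy":
        return f"/viewforum.php?f={forum_id}"
    if mode == "id":
        return f"/f{forum_id}/"
    if mode == "slug":
        words = re.findall(r"\w+", forum_name.lower())
        return "/" + "-".join(words + [f"f{forum_id}"]) + "/"
    raise ValueError(f"unknown URL mode: {mode!r}")
```

Since both pretty formats carry the same typed ID, a board could switch between "id" and "slug" later and let the slug-check redirect clean up old links.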
Available on .com
Support Toolkit developer
Erik Frèrejean
Registered User
Posts: 207
Joined: Thu Oct 25, 2007 2:25 pm
Location: surfnet

Re: [RFC] Human Readable URLs

Postby bantu » Mon Jul 04, 2011 12:53 pm

Sam wrote:
bantu wrote: Such a feature would have to be optional and disabled by default, partly because of the listed drawbacks (especially leaking information), but also because it probably requires the webserver to cooperate.

A way around that is to have a click-through page for all external links that acts as an intermediary between the page the link is on and the link's destination. The referrer would then be something like domain.com/click.php?url=http://google.com.

But then you can no longer copy links directly, which is a trivial thing to do right now.
