I recommend this URL scheme:
home - domain.com/forums/
board - domain.com/forums/forum-title
topic - domain.com/forums/topic/topic-title
topic - domain.com/forums/topic/topic-title/page
post - domain.com/forums/post/post-title
SERP - domain.com/forums/search/keywords
Examples
phpbb.com/community/
phpbb.com/community/3-2-arsia-rfcs-patches
phpbb.com/community/topic/rfc-human-readable-urls
phpbb.com/community/topic/rfc-human-readable-urls/2
phpbb.com/community/post/i-am-changing-the-post-title
phpbb.com/community/search/seo-urls
While Sam's original suggestion is very good, the above model has 3 advantages:
- URLs don't break/change when a topic is moved from one forum/board to another.
- URLs are shorter, and thereby (presumably) more user friendly.
- These become, in effect, canonical URLs.
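To make the scheme concrete, here is a rough sketch of how a request path could be matched against it. The route table, the `resolve` name and the `/community/` base are my own illustration, not anything phpBB actually ships.

```python
import re

# Hypothetical route table for the proposed scheme; the patterns and
# names are illustrative and are not phpBB's real routing.
ROUTES = [
    ("search", re.compile(r"^search/(?P<keywords>[^/]+)$")),
    ("post", re.compile(r"^post/(?P<slug>[^/]+)$")),
    ("topic", re.compile(r"^topic/(?P<slug>[^/]+)(?:/(?P<page>\d+))?$")),
    ("board", re.compile(r"^(?P<slug>[^/]+)$")),
]


def resolve(path, base="/community/"):
    """Map a request path to (kind, params), or None if nothing matches."""
    if not path.startswith(base):
        return None
    rest = path[len(base):].strip("/")
    if rest == "":
        return ("home", {})
    for kind, pattern in ROUTES:
        match = pattern.match(rest)
        if match:
            # Drop optional groups (such as the page number) that were absent.
            params = {k: v for k, v in match.groupdict().items() if v is not None}
            return (kind, params)
    return None
```

Because the board does not appear in topic or post URLs, moving a topic to another board changes nothing in this lookup.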
I can almost hear a suggestion that the ID be included in the URL, but I believe that would reduce some of the value of the human readable URL: while the URL would be intelligible, it would not be much easier to reproduce from memory than the current ugly URLs. That said, while I prefer keeping IDs out of the URL, if I were told that the only way to get SEO URLs was to include the ID, I would take it.
Don't blast me for taking a cue from WordPress
http://codex.wordpress.org/Using_Permalinks
but what I suggest is that when a duplicate URL is encountered, phpBB add a counter to the end of the title-based URL, e.g.
phpbb.com/community/topic/rfc-human-readable-urls-2
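A minimal sketch of the counter idea, assuming the already-used title parts are available as a set (in practice this would be a database lookup). The `unique_slug` name and the choice to start counting at 2 are my assumptions, following the example URL above.

```python
def unique_slug(base_slug, taken):
    """Return base_slug, or base_slug-N for the smallest N >= 2 not in taken."""
    if base_slug not in taken:
        return base_slug
    counter = 2
    while f"{base_slug}-{counter}" in taken:
        counter += 1
    return f"{base_slug}-{counter}"
```

With `rfc-human-readable-urls` already taken, a second topic with the same title would get `rfc-human-readable-urls-2`, a third `rfc-human-readable-urls-3`, and so on.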
A counter has something of the same negative effect as including the ID in the URL, without any of the performance benefits. So why do I prefer it over IDs?
- With IDs, ALL URLs will have IDs. With counters for duplicates, fewer URLs will have these numbers.
- There is an unexpected benefit of counters: duplicate content becomes somewhat easier to spot (if the same words are used in the title). It doesn't help with intentional abuse, but if as a user I am creating a topic [RFC] Human Readable URLs and I see that the URL created is
phpbb.com/community/topic/rfc-human-readable-urls-2
then I am more likely to check out
phpbb.com/community/topic/rfc-human-readable-urls
even if I am prone to not searching before posting. This is even truer of people reading the post: if I were to come across a topic at
phpbb.com/community/topic/rfc-human-readable-urls-2
then I would likely check out
phpbb.com/community/topic/rfc-human-readable-urls
to see if the discussions are different, and so duplicate topics would often be discarded organically.
Admittedly, this applies to topics, not posts, and there would be a HUGE number of posts with similar URLs. But I think that is not a problem, because the URLs we are most concerned with are usually topic URLs; post URLs are RELATIVELY rarely shared, distributed, or clicked directly.
When a topic is split, one part should continue to have the same URL, while each other part should get the incremental counter.
Splitting
phpbb.com/community/topic/rfc-human-readable-urls
should create
phpbb.com/community/topic/rfc-human-readable-urls
phpbb.com/community/topic/rfc-human-readable-urls-2
phpbb.com/community/topic/rfc-human-readable-urls-3
etc
When 2 topics are merged, the admin should get the option to choose which URL is used when linking to the merged topic, but both URLs should point to the merged topic, i.e.
If admin merges
phpbb.com/community/topic/rfc-human-readable-urls
and
phpbb.com/community/topic/rfc-seo-urls
and says the URL for the merged topic should be
phpbb.com/community/topic/rfc-human-readable-urls
then all parts of the site linking to the topic shall use the URL
phpbb.com/community/topic/rfc-human-readable-urls
but if someone types (or clicks in an old email)
phpbb.com/community/topic/rfc-seo-urls
then that should go to the topic as well.
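One way to picture the merge behaviour is a lookup table in which several title parts can point at one topic, with one of them marked as the URL used for links. This is only a sketch: the `TopicUrls` class, its method names and the in-memory dictionaries are hypothetical stand-ins for whatever storage phpBB would really use.

```python
class TopicUrls:
    """In-memory stand-in for a slug -> topic id lookup table (hypothetical)."""

    def __init__(self):
        self.slug_to_topic = {}  # every slug that should resolve, old ones included
        self.canonical = {}      # topic id -> slug used when generating links

    def add_topic(self, topic_id, slug):
        self.slug_to_topic[slug] = topic_id
        self.canonical[topic_id] = slug

    def merge(self, keep_id, drop_id, chosen_slug):
        """Merge drop_id into keep_id; chosen_slug becomes the linking URL."""
        if self.slug_to_topic.get(chosen_slug) not in (keep_id, drop_id):
            raise ValueError("chosen slug must belong to one of the merged topics")
        # Re-point every slug of the dropped topic at the surviving topic.
        for slug, topic_id in list(self.slug_to_topic.items()):
            if topic_id == drop_id:
                self.slug_to_topic[slug] = keep_id
        del self.canonical[drop_id]
        self.canonical[keep_id] = chosen_slug

    def resolve(self, slug):
        """Return (topic_id, canonical_slug), or None if the slug is unknown."""
        topic_id = self.slug_to_topic.get(slug)
        if topic_id is None:
            return None
        return topic_id, self.canonical[topic_id]
```

After merging the two example topics with `rfc-human-readable-urls` chosen, `resolve("rfc-seo-urls")` still finds the merged topic and also reports the chosen slug. Whether the old URL should then redirect or simply display the topic is left open here.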
If the title of a topic is changed, the URL should remain the same as it was before the change. If the title is changed by an admin, the admin should have the option to manually rewrite the title part of the URL. If the URL thus entered by the admin already exists (is a duplicate), then the method defined above for duplicate URLs applies, and a counter is quietly added to the URL by the system.
If admin changes name of a topic from
[RFC] Pretty URLs
(which has URL
phpbb.com/community/topic/rfc-pretty-urls
)
to
[RFC] Human Readable URLs
then the URL of the topic remains
phpbb.com/community/topic/rfc-pretty-urls
but admin has the option to manually edit the rfc-pretty-urls part.
If admin changes rfc-pretty-urls to rfc-human-readable-urls, and
phpbb.com/community/topic/rfc-human-readable-urls
already exists, then the URL should automatically change from
phpbb.com/community/topic/rfc-pretty-urls
to
phpbb.com/community/topic/rfc-human-readable-urls-2
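The rename rules can be sketched like this. The `Topic` class, `rename_topic` and the `taken` set are hypothetical, and keeping the old title part reserved after an admin override is my own assumption (the rules above don't say what happens to it).

```python
from dataclasses import dataclass


@dataclass
class Topic:
    title: str
    slug: str


def unique_slug(base_slug, taken):
    """Return base_slug, or base_slug-N for the smallest N >= 2 not in taken."""
    if base_slug not in taken:
        return base_slug
    counter = 2
    while f"{base_slug}-{counter}" in taken:
        counter += 1
    return f"{base_slug}-{counter}"


def rename_topic(topic, new_title, taken, admin_slug=None):
    """Change the title; the slug only changes if an admin supplies one."""
    topic.title = new_title
    if admin_slug is not None and admin_slug != topic.slug:
        # Duplicate admin-entered slugs quietly get a counter.
        topic.slug = unique_slug(admin_slug, taken)
        # Assumption: the old slug stays in `taken` so it is never reassigned.
        taken.add(topic.slug)
    return topic
```

Renaming without `admin_slug` leaves `rfc-pretty-urls` untouched; passing `admin_slug="rfc-human-readable-urls"` when that is already in use yields `rfc-human-readable-urls-2`.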
Cleaning the URLs:
- Unicode characters should not be messed with. All major browsers understand them and are able to use them. It's good if there are Russian or Hungarian or German or Greek or Hindi characters in the URL. (Here I depart from WordPress, which forces Latin-script languages into a smaller ASCII set, so that, for instance, ä in the title becomes a in the URL. I don't agree with this: ä should remain ä.)
- <space>, _, &, /, \, |, *, +, =, (, ), {, }, [, ], <, >, !, ?, @, ", and # should each be converted to - (dash)
- I don't see any burning need to downcase letters. This was a security concern that some had a couple of years ago (URLs can be spoofed because lowercase L looks like uppercase I, etc., and so can, theoretically, be used for phishing), but it's more relevant to domain names than to bulletin board software URLs.
- I haven't made up my mind about whether or not to remove common words like "the", "or", "a", "an", "and", "to", etc. On one hand, removing these would increase keyword density; on the other hand, the list of words may need to be maintained separately for each language.
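The cleaning rules above could look roughly like this. The `clean_title` name is made up, and collapsing runs of dashes, trimming dashes at the ends and treating all whitespace like a space are my additions, so that [RFC] Human Readable URLs doesn't come out as -RFC--Human-Readable-URLs. Lowercasing is left as an option, since the example URLs are lowercase but I don't see a strong need for it. Stop-word removal is not attempted.

```python
import re

# The characters listed above, plus any whitespace (an assumption).
_REPLACE = re.compile(r'[\s_&/\\|*+=(){}\[\]<>!?@"#]')


def clean_title(title, lowercase=False):
    """Turn a topic title into the title part of a URL, keeping Unicode as-is."""
    slug = _REPLACE.sub("-", title)
    # Assumption: collapse repeated dashes and trim them from both ends.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return slug.lower() if lowercase else slug
```

For example, `clean_title("[RFC] Human Readable URLs", lowercase=True)` gives `rfc-human-readable-urls`, and `clean_title("Über ä/ö")` gives `Über-ä-ö` with the non-ASCII letters untouched.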