[RFC] Human Readable URLs

ecwpa
Registered User
Posts: 181
Joined: Mon Jan 24, 2005 2:10 am

Re: [RFC] Human Readable URLs

Post by ecwpa » Fri Aug 26, 2011 9:40 pm

Read it all, very interesting points.

Now, the reason a lot of people (probably developers) are in favor of keeping IDs is that you don't have to worry about duplicates: they're simply impossible, because every ID is unique. If you don't use IDs, it gets kind of complicated: lots of extra fields and probably extra table(s).

Also, it's a forum, not a blog. Bloggers control their SEO URLs, but forum admins can't stop users from changing topic and post titles, which means you may need a log of previous names to avoid broken URLs. That bothers me.

What about bumping posts? If the date is part of the ID, will it break the URL?

If we choose to keep using IDs, they will be the one and only way to access topics/posts, and I believe that's the fastest way too. This method requires a tiny amount of extra coding, it won't affect topics and posts created before it's implemented, and it requires no database changes.

Considering that topic and post IDs are basically the core of the forum, in the end it's all about keeping the code as clean as possible vs. making phpBB more versatile but the code way more complicated.
Slightly better English than it was in 2005, still improving :D

sooskriszta
Registered User
Posts: 85
Joined: Wed Dec 29, 2010 7:23 pm

Re: [RFC] Human Readable URLs

Post by sooskriszta » Sat Aug 27, 2011 5:24 pm

In my ideal world, the URL won't have the ID. But if the developers say that including the ID in the URL lets this be delivered much faster than leaving it out, then I'd say go for it, even though it would take away the advantages of memorability (remembering the URL) and, consequently, replicability (typing in URLs). With IDs, my suggested URLs could look like this:

Code: Select all

home - domain.com/forums/
board - domain.com/forums/forum-title/id/
topic - domain.com/forums/topic/topic-title/id/
topic - domain.com/forums/topic/topic-title/page/id/
post - domain.com/forums/post/post-title/id/
SERP - domain.com/forums/search/keywords

[b]Examples[/b]
phpbb.com/community/
phpbb.com/community/3-2-arsia-rfcs-patches/108/
phpbb.com/community/topic/rfc-human-readable-urls/40965/
phpbb.com/community/topic/rfc-human-readable-urls/2/40965/
phpbb.com/community/post/i-am-changing-the-post-title/229197/
phpbb.com/community/search/seo-urls
or like this

Code: Select all

home - domain.com/forums/
board - domain.com/forums/id/forum-title
topic - domain.com/forums/topic/id/topic-title
topic - domain.com/forums/topic/id/topic-title/page
post - domain.com/forums/post/id/post-title
SERP - domain.com/forums/search/keywords

[b]Examples[/b]
phpbb.com/community/
phpbb.com/community/108/3-2-arsia-rfcs-patches
phpbb.com/community/topic/40965/rfc-human-readable-urls
phpbb.com/community/topic/40965/rfc-human-readable-urls/2
phpbb.com/community/post/229197/i-am-changing-the-post-title
phpbb.com/community/search/seo-urls
Personally, I'd have a SLIGHT preference for the first set, though at that point it's probably best to let the developers decide.
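
To illustrate why keeping the ID makes these URLs cheap to resolve, here is a minimal Python sketch for the first set. The `ROUTE` pattern and the `parse_url` helper are hypothetical names made up for this example, not phpBB code, and only the topic and post routes are covered. The assumption is that only the numeric ID is used for the lookup, while the title part is decorative.

```python
import re

# Hypothetical pattern for the first URL set:
#   /<base>/topic/<title>/<id>/
#   /<base>/topic/<title>/<page>/<id>/
#   /<base>/post/<title>/<id>/
ROUTE = re.compile(
    r"^/(?P<base>[^/]+)/(?P<kind>topic|post)/(?P<slug>[^/]+)/"
    r"(?:(?P<page>\d+)/)?(?P<id>\d+)/?$"
)

def parse_url(path):
    """Return kind, id and page for a path, or None if it does not match."""
    m = ROUTE.match(path)
    if m is None:
        return None
    return {
        "kind": m["kind"],
        "id": int(m["id"]),            # the only value needed for the lookup
        "page": int(m["page"] or 1),   # page defaults to 1 when absent
        "slug": m["slug"],             # kept for display or redirects only
    }
```

Under this assumption, a URL whose title part is outdated or mistyped still resolves to the same topic, because the ID has not changed.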
OC2PS
Testfestés, Arcfestés, Csillámfestés

Alapanyagok, Képzések, Ismertetők
Hennafestés
GMAT coaching and MBA Admissions Consulting
formerly known as sooskriszta

callumacrae
Infrastructure Team
Posts: 1046
Joined: Tue Apr 27, 2010 9:37 am
Location: England

Re: [RFC] Human Readable URLs

Post by callumacrae » Wed Aug 31, 2011 7:02 am

What about different languages? That could get extremely messy.
Made by developers, for developers!
My blog

AmigoJack
Registered User
Posts: 92
Joined: Wed May 04, 2011 7:47 pm
Location: グリーン ヒル ゾーン

Re: I am changing the post title

Post by AmigoJack » Wed Aug 31, 2011 8:05 am

sooskriszta wrote:
  • There is an unexpected benefit of counters - duplicate content becomes somewhat easier to spot (if same words are used in title)...it doesn't help with intentional abuse, but if as a user I am creating a topic [RFC] Human Readable URLs and I see that the URL created is

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls-2
    , then I am more likely to check out

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls
    even if I am prone to not searching before posting. It is even truer of people reading the post...if I were to come across a post

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls-2
    then I would likely check out

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls
    to see if the discussions are different...and so duplicate topics would, many times, be organically discarded....
Yeah, you. But not users who aren't aware of the address bar in their browsers at all. Think of browser developers already announcing plans to stop displaying the address bar altogether. Not to mention critical cases where it's not that obvious. Or real-world cases, like a support forum where every 10th topic title just goes "Help", so your counter would run to 2 or 3 digits.
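
For reference, the counter scheme under discussion could be sketched roughly like this in Python. The `unique_slug` helper and the in-memory `taken` set are assumptions for illustration; a real board would check its database instead.

```python
def unique_slug(base, taken):
    """Return base if it is free, otherwise base-2, base-3, and so on."""
    if base not in taken:
        return base
    n = 2
    # Linear probing: walk the counter up until an unused slug is found.
    while f"{base}-{n}" in taken:
        n += 1
    return f"{base}-{n}"
```

On a support forum full of topics titled "Help", the loop walks further on every new topic, which is the 2 or 3 digit counter described above.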
sooskriszta wrote:If the title of the topic is changed, the url should remain the same as it was before changing the title
Think of cases where someone has created a topic and later notices a big spelling mistake. Or just wants to add "[closed]" in front of it. In the first case the wrong topic title would remain in the URL forever - even if it was modified by a moderator (who turned a great title like "Help" into "issues with editor", because he's no administrator).
sooskriszta wrote:but if someone types (or clicks in an old email)

Code: Select all

phpbb.com/community/topic/rfc-seo-urls
then that should go to the topic as well
So internally, every human-readable URL ever used or created for a topic must be stored. Needless to say, this results in a pure text comparison in the database each time a URL is requested, which impacts performance a lot!
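
A minimal sketch of what such a history of URLs might look like, using Python's built-in `sqlite3` module. The `topic_slugs` table, its columns and the helper names are all hypothetical. Declaring the slug as the primary key gives the database an index for the text lookup; whether that is fast enough on a large board is a separate question this sketch does not answer.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    # One row per slug a topic has ever had; slug is indexed as primary key.
    db.execute(
        "CREATE TABLE topic_slugs ("
        " slug TEXT PRIMARY KEY,"
        " topic_id INTEGER NOT NULL,"
        " is_current INTEGER NOT NULL)"
    )
    return db

def set_slug(db, topic_id, slug):
    """Make slug the current one for topic_id, keeping older slugs as history."""
    db.execute("UPDATE topic_slugs SET is_current = 0 WHERE topic_id = ?", (topic_id,))
    db.execute(
        "INSERT OR REPLACE INTO topic_slugs (slug, topic_id, is_current) VALUES (?, ?, 1)",
        (slug, topic_id),
    )

def resolve(db, slug):
    """Return (topic_id, redirect_slug); redirect_slug is None for the current slug."""
    row = db.execute(
        "SELECT topic_id, is_current FROM topic_slugs WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return None
    topic_id, is_current = row
    if is_current:
        return (topic_id, None)
    current = db.execute(
        "SELECT slug FROM topic_slugs WHERE topic_id = ? AND is_current = 1", (topic_id,)
    ).fetchone()[0]
    return (topic_id, current)
```

With this layout, a request for the old `rfc-seo-urls` slug finds the topic and can be redirected to its current slug.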
sooskriszta wrote:
  • <space>, _, &, /, \, |, *, +, =, (, ), {, }, [, ], <, >, !, ?,@, ", #, should each be converted to - (dash)
While this can totally ruin a topic title like "[new] added /etc/hosts & updated C++ sources", you also end up with lots of repeated hyphens. Wherever runs of hyphens appear in the target URL, they should be reduced to a single hyphen, and leading and trailing hyphens should be stripped, so -new--added--etc-hosts---updated-C---sources becomes new-added-etc-hosts-updated-C-sources.
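
The two steps can be sketched in Python as follows. The function names are made up for this example; the character list is the one from the quoted proposal.

```python
import re

# Characters the quoted proposal converts to a dash.
SEPARATORS = ' _&/\\|*+=(){}[]<>!?@"#'

def naive_slug(title):
    """Replace every listed character with a dash, one for one."""
    return "".join("-" if ch in SEPARATORS else ch for ch in title)

def clean_slug(title):
    """Same replacement, then collapse dash runs and trim both ends."""
    slug = naive_slug(title)
    slug = re.sub(r"-{2,}", "-", slug)   # "---" becomes "-"
    return slug.strip("-")               # drop leading and trailing dashes
```

Calling `naive_slug` on the example title reproduces the hyphen-heavy string above, and `clean_slug` yields the cleaned-up form.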
sooskriszta wrote:
  • I haven't made up my mind about whether or not to remove common words like "the", "or", "a", "an", "and", "to", etc....on one hand removing these would increase keyword density, while on the other hand the list of words may need to be maintained separately for each language...
I wouldn't. If the topic title is "The Cell" (the name of a movie), stripping the first word would leave only a single generic word, which makes the topic unfindable.
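
A small Python sketch of the stop-word idea shows the problem. The word list is the English one from the quoted proposal, and `strip_stop_words` is a hypothetical helper.

```python
# English stop words taken from the quoted proposal; other languages
# would each need a list of their own.
STOP_WORDS_EN = {"the", "or", "a", "an", "and", "to"}

def strip_stop_words(title, stop_words=STOP_WORDS_EN):
    """Drop every word that appears in stop_words (case-insensitive)."""
    kept = [w for w in title.split() if w.lower() not in stop_words]
    return " ".join(kept)
```

Here "The Cell" shrinks to "Cell", while a title in another language passes through unchanged because the English list does not apply to it.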

callumacrae wrote:What about different languages? That could get extremely messy.
Do you mean because URLs then become IDNs? Or do you mean because we now use hierarchies like /forums/topic/, which are still forced to remain English words rather than localized ones (/フォーラム/トピック/)?

Re: [RFC] Human Readable URLs

Post by sooskriszta » Wed Aug 31, 2011 8:59 am

callumacrae wrote:What about different languages? That could get extremely messy.
Messy how?

Re: [RFC] Human Readable URLs

Post by callumacrae » Wed Aug 31, 2011 9:07 am

sooskriszta wrote:
callumacrae wrote:What about different languages? That could get extremely messy.
Messy how?
Do you have the URLs (/topic/) in the board's primary language? English? The user's language? What happens if the language changes?

Re: I am changing the post title

Post by sooskriszta » Wed Aug 31, 2011 9:09 am

AmigoJack wrote:
sooskriszta wrote:
  • There is an unexpected benefit of counters - duplicate content becomes somewhat easier to spot (if same words are used in title)...it doesn't help with intentional abuse, but if as a user I am creating a topic [RFC] Human Readable URLs and I see that the URL created is

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls-2
    , then I am more likely to check out

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls
    even if I am prone to not searching before posting. It is even truer of people reading the post...if I were to come across a post

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls-2
    then I would likely check out

    Code: Select all

    phpbb.com/community/topic/rfc-human-readable-urls
    to see if the discussions are different...and so duplicate topics would, many times, be organically discarded....
Yeah, you. But not users who aren't aware of the address bar in their browsers at all. Think of browser developers already announcing plans to stop displaying the address bar altogether. Not to mention critical cases where it's not that obvious. Or real-world cases, like a support forum where every 10th topic title just goes "Help", so your counter would run to 2 or 3 digits.
3 things:
1) I think you are jumping a bit too far into unrealistic territory...a bit like people who say that URL shortening services make SEO URLs moot.
2) Yes, the solution mimics the real world....it does not propose to solve the problem of bad titles....the principle of garbage in garbage out still applies....in the real world, increasing the URL readability of topics titled just "Help" is pointless as is trying to improve their Search Engine appeal. And you are right - there's a lot of those out there. But the way I look at it, the proposed solution does not worsen the situation in any way in this case, while it improves the situation quite a bit in other cases with meaningful titles.
3) As I said, this is an "unexpected benefit" not the raison d'être.
AmigoJack wrote:
sooskriszta wrote:If the title of the topic is changed, the url should remain the same as it was before changing the title
Think of cases where someone has created a topic and later notices a big spelling mistake. Or just wants to add "[closed]" in front of it. In the first case the wrong topic title would remain in the URL forever - even if it was modified by a moderator (who turned a great title like "Help" into "issues with editor", because he's no administrator).
1. For normal users: I think that is acceptable. Compare this to an old topic where someone just wants to add "[closed]". If the URL were changed to accommodate this, all distributed copies of the URL (including search results) would suddenly break. That situation is more undesirable than not letting every Tom, Dick and Harry fix spelling mistakes in URLs.
2. For moderators: I think I should have said moderators instead of administrators. What I really meant was that the board's management team should have the right to manually edit the last part of the URL. The point is that the tradeoff of "accurate & current URL" vs. "breaking links" should be managed by site owners.
AmigoJack wrote:
sooskriszta wrote:
  • <space>, _, &, /, \, |, *, +, =, (, ), {, }, [, ], <, >, !, ?,@, ", #, should each be converted to - (dash)
While this can totally ruin a topic title like "[new] added /etc/hosts & updated C++ sources", you also end up with lots of repeated hyphens. Wherever runs of hyphens appear in the target URL, they should be reduced to a single hyphen, and leading and trailing hyphens should be stripped, so -new--added--etc-hosts---updated-C---sources becomes new-added-etc-hosts-updated-C-sources.
Good point. I agree. Multiple adjacent - should be converted to a single -
AmigoJack wrote:
sooskriszta wrote:
  • I haven't made up my mind about whether or not to remove common words like "the", "or", "a", "an", "and", "to", etc....on one hand removing these would increase keyword density, while on the other hand the list of words may need to be maintained separately for each language...
I wouldn't. If the topic title is "The Cell" (the name of a movie), stripping the first word would leave only a single generic word, which makes the topic unfindable.
I kind of agree, but I'm still undecided.

Re: [RFC] Human Readable URLs

Post by sooskriszta » Wed Aug 31, 2011 9:11 am

callumacrae wrote:
sooskriszta wrote:
callumacrae wrote:What about different languages? That could get extremely messy.
Messy how?
Do you have the URLs (/topic/) in the board's primary language? English? The user's language? What happens if the language changes?
I still don't understand your point. What is the problem with a URL like

Code: Select all

http://www.domain.ru/forum/Перейти-от-радости
or

Code: Select all

http://www.domain.cn/forum/跳躍的喜悅
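
One practical detail about such URLs: on the wire, the non-ASCII path is sent as percent-encoded UTF-8 bytes, and browsers typically show the decoded form. A quick Python sketch using the standard `urllib.parse` module (the `encode_path` wrapper is just a name for this example):

```python
from urllib.parse import quote, unquote

def encode_path(path):
    """Percent-encode a URL path as UTF-8; "/" is left untouched by default."""
    return quote(path)

def decode_path(encoded):
    """Reverse of encode_path, for display or for matching against titles."""
    return unquote(encoded)
```

The encoded form is pure ASCII and decodes back to the original path, so the readable title survives a round trip.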

Re: [RFC] Human Readable URLs

Post by callumacrae » Wed Aug 31, 2011 11:40 am

sooskriszta wrote:
callumacrae wrote:
sooskriszta wrote:
callumacrae wrote:What about different languages? That could get extremely messy.
Messy how?
Do you have the URLs (/topic/) in the board's primary language? English? The user's language? What happens if the language changes?
I still don't understand your point. What is the problem with a URL like

Code: Select all

http://www.domain.ru/forum/Перейти-от-радости
or

Code: Select all

http://www.domain.cn/forum/跳躍的喜悅
forum is an English word. It would look weird on French forums.

Re: [RFC] Human Readable URLs

Post by sooskriszta » Wed Aug 31, 2011 11:43 am

How does it work at the moment? Are the filenames viewforum.php, viewtopic.php, posting.php, etc. translated?