It wants to be a separate script from feed.php. The two are similar at a high level, but the protocols are different enough that combining them in one script would gain very little and cause a lot of debugging headaches. So here are my thoughts for a complete /sitemap.php, with a relatively untested ACP component. I'm fairly confident that the basic generator works well and should scale to fairly large sites. It splits the output over many individual sitemaps, tied together by a sitemap index, so only one URL needs to be handed to the crawler. For small sites, it's possible to generate just a single topics sitemap.
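For anyone unfamiliar with the index format, here's a minimal sketch of how a sitemaps.org `<sitemapindex>` could be assembled, assuming the caller already knows each child sitemap's URL and last-modified Unix timestamp. `build_sitemap_index()` is a hypothetical helper for illustration, not the actual code in /sitemap.php.

```php
<?php
// Hypothetical helper: builds a sitemaps.org <sitemapindex> document from
// an array of child sitemap URL => last-modified Unix timestamp.
function build_sitemap_index(array $sitemaps): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($sitemaps as $loc => $lastmod)
    {
        $xml .= "\t<sitemap>\n";
        // '&' in query strings must be entity-escaped inside <loc>.
        $xml .= "\t\t<loc>" . htmlspecialchars($loc, ENT_QUOTES, 'UTF-8') . "</loc>\n";
        // W3C Datetime in UTC, e.g. 2010-01-31T12:00:00+00:00
        $xml .= "\t\t<lastmod>" . gmdate('Y-m-d\TH:i:s+00:00', $lastmod) . "</lastmod>\n";
        $xml .= "\t</sitemap>\n";
    }
    $xml .= '</sitemapindex>' . "\n";
    return $xml;
}
```

Usage would be something like `build_sitemap_index(['http://example.com/sitemap.php?mode=forums' => $forums_lastmod, 'http://example.com/sitemap.php?f=2' => $forum_2_lastmod])`, where the lastmod values would come from the newest post in each sitemap.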
Some tuning of the _limit settings (sitemap_limit, sitemap_overall_forums_limit, sitemap_overall_topics_limit) will be needed on large sites to find the level that stays below the sitemap limits of 50,000 URLs and 10MB per file. This isn't really predictable, because the number of URLs per forum/topic depends on the pagination configured on the board: the script generates the full series of &start=n URLs. The aim is to reliably produce a full dump of all the canonical forum and topic URLs on the board, with highly accurate <lastmod> tags, so that bots can quickly pick up new content while still having a comprehensive map of the board.
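To show why the URL count is hard to predict, here's a rough sketch of how one topic expands into its &start=n series, and how a batch of topics could be checked against the 50,000-URL cap. It assumes `$posts_per_page` mirrors the board's posts-per-page setting and that a topic with r replies holds r + 1 posts. The function names are hypothetical, and the 10MB size limit isn't checked here.

```php
<?php
// Hypothetical cap from the sitemaps.org protocol (URLs per sitemap file).
const SITEMAP_MAX_URLS = 50000;

// Expand one topic into its paginated viewtopic.php URLs (unescaped;
// '&' still needs converting to '&amp;' when written into the XML).
function topic_page_urls(string $board_url, int $forum_id, int $topic_id, int $replies, int $posts_per_page): array
{
    $base = "$board_url/viewtopic.php?f=$forum_id&t=$topic_id";
    $pages = (int) ceil(($replies + 1) / $posts_per_page);

    $urls = [$base];                        // page 1 carries no &start=
    for ($page = 1; $page < $pages; $page++)
    {
        $urls[] = $base . '&start=' . ($page * $posts_per_page);
    }
    return $urls;
}

// Count the <url> entries a batch of topics would produce.
function count_urls(array $reply_counts, int $posts_per_page): int
{
    $total = 0;
    foreach ($reply_counts as $replies)
    {
        $total += (int) ceil(($replies + 1) / $posts_per_page);
    }
    return $total;
}

// True if the batch fits in a single sitemap file (URL count only).
function fits_in_one_sitemap(array $reply_counts, int $posts_per_page): bool
{
    return count_urls($reply_counts, $posts_per_page) <= SITEMAP_MAX_URLS;
}
```

With 10 posts per page, a topic with 25 replies yields three URLs (no start, &start=10, &start=20), so a _limit of 10,000 topics can easily exceed 50,000 URLs on a board with long topics.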
Add one constant to includes/constants.php:
define('FORUM_OPTION_SITEMAP_EXCLUDE', 10);
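Assuming that, like the feed options, this constant is a bit position inside the integer forum_options column rather than a mask, testing and toggling the exclusion flag could look like this. The helper names are hypothetical.

```php
<?php
define('FORUM_OPTION_SITEMAP_EXCLUDE', 10);

// Hypothetical helper: is bit number $bit set in $forum_options?
function forum_option_is_set(int $bit, int $forum_options): bool
{
    return ($forum_options & (1 << $bit)) !== 0;
}

// Hypothetical helper: return $forum_options with bit $bit switched on or off.
function forum_option_set(int $bit, bool $enable, int $forum_options): int
{
    return $enable
        ? ($forum_options | (1 << $bit))
        : ($forum_options & ~(1 << $bit));
}
```

The sitemap query would then skip any forum where `forum_option_is_set(FORUM_OPTION_SITEMAP_EXCLUDE, $row['forum_options'])` is true, leaving the other option bits untouched.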
A reasonable set of defaults, in PostgreSQL COPY format (the three columns are tab-separated):
COPY phpbb_config (config_name, config_value, is_dynamic) FROM stdin;
sitemap_cache_time 3600 0
sitemap_enable 1 0
sitemap_feeds 0 0
sitemap_forum 1 0
sitemap_limit 10000 0
sitemap_overall_forums 1 0
sitemap_overall_forums_limit 10000 0
sitemap_overall_topics 0 0
sitemap_overall_topics_limit 10000 0
sitemap_sort_days 0 0
\.
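COPY ... FROM stdin only works on PostgreSQL. For MySQL and other databases, here's a small sketch that renders the same defaults as plain INSERT statements. It assumes the default phpbb_ table prefix, and since the values are fixed literals it does no escaping. From inside phpBB itself (e.g. an install script), set_config() would be the more natural route.

```php
<?php
// Hypothetical helper: one INSERT per config row; is_dynamic is 0 for all of them.
function sitemap_config_inserts(string $table, array $defaults): array
{
    $sql = [];
    foreach ($defaults as $name => $value)
    {
        $sql[] = sprintf(
            "INSERT INTO %s (config_name, config_value, is_dynamic) VALUES ('%s', '%s', 0);",
            $table, $name, $value
        );
    }
    return $sql;
}

// Same values as the COPY block above.
$sitemap_defaults = [
    'sitemap_cache_time'           => '3600',
    'sitemap_enable'               => '1',
    'sitemap_feeds'                => '0',
    'sitemap_forum'                => '1',
    'sitemap_limit'                => '10000',
    'sitemap_overall_forums'       => '1',
    'sitemap_overall_forums_limit' => '10000',
    'sitemap_overall_topics'       => '0',
    'sitemap_overall_topics_limit' => '10000',
    'sitemap_sort_days'            => '0',
];
```

Running `echo implode("\n", sitemap_config_inserts('phpbb_config', $sitemap_defaults));` prints SQL that can be pasted into any client.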
Once it's in place (the ACP module is optional; the minimum is /sitemap.php, the constant, and the config rows), the following URLs are available:
- /sitemap.php - Sitemap index; give this to search engines for fully automatic mode
- /sitemap.php?mode=index - alias for /sitemap.php
- /sitemap.php?mode=forums - List of /index.php and all the /viewforum.php?f=n URLs
- /sitemap.php?mode=topics - List of all the /viewtopic.php?t=n URLs
- /sitemap.php?f=n - List of all the /viewtopic.php?f=n&t=m URLs in forum n
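The URL scheme above boils down to a small dispatch on two request parameters. Here's a sketch of how that mapping could work; `resolve_sitemap()` and the returned names are hypothetical, and in phpBB the two inputs would come from request_var('mode', '') and request_var('f', 0).

```php
<?php
// Hypothetical dispatcher: maps (mode, f) onto the kind of sitemap to emit.
function resolve_sitemap(string $mode, int $forum_id): ?string
{
    if ($forum_id > 0)
    {
        return 'forum_' . $forum_id;    // /sitemap.php?f=n
    }
    switch ($mode)
    {
        case '':
        case 'index':
            return 'index';             // /sitemap.php and its ?mode=index alias
        case 'forums':
            return 'forums';            // /sitemap.php?mode=forums
        case 'topics':
            return 'topics';            // /sitemap.php?mode=topics
        default:
            return null;                // unknown mode: caller sends a 404
    }
}
```

Because the index and its alias resolve to the same name, the returned string also works as a cache key.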
One cache file is generated per URL, so there's a maximum of 3+n cache files, where n is the number of forums.
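A per-sitemap file cache honouring sitemap_cache_time (in seconds) could be as simple as the sketch below. The file naming and function names are hypothetical, and phpBB's own cache driver would be an alternative. The current time is passed in so expiry is easy to test.

```php
<?php
// Hypothetical layout: one XML file per sitemap name in the cache directory.
function sitemap_cache_path(string $cache_dir, string $name): string
{
    return $cache_dir . '/sitemap_' . $name . '.xml';
}

// Return the cached XML, or null if the file is missing or older than $ttl.
function sitemap_cache_get(string $path, int $ttl, int $now): ?string
{
    if (!is_file($path) || $now - filemtime($path) >= $ttl)
    {
        return null;    // caller regenerates and calls sitemap_cache_put()
    }
    $xml = file_get_contents($path);
    return $xml === false ? null : $xml;
}

// Write to a temp file and rename, so a crawler never reads a half-written sitemap.
function sitemap_cache_put(string $path, string $xml): void
{
    $tmp = $path . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $xml);
    rename($tmp, $path);
}
```

Regenerating only when the file is older than sitemap_cache_time keeps the expensive topic queries to at most one run per sitemap per cache period.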
I'm posting this here, rather than the mods development forum, as I think it's the sort of thing which deserves to live alongside /feed.php in the main distribution.

