  • Bobulous
Punctuation in URLs
Hello, all.

Google's webmaster tools suite is reporting a growing number (currently 22) of incorrect links pointing to my site, and most of them originate from the Hydrogenaudio forums. I think the problem is that the forum software tries to turn plain (not marked up) text into links automatically, and accidentally includes the punctuation that follows the URL, such as periods and closing parentheses or brackets. This sends users to my site looking for a page that ends with ".html)" or ".html." instead of just ".html", so they get a 404 "Not found" error page. I'm guessing that other webmasters are seeing the same thing.

I've now tarted up my error page so that my site does its best to guess where the user is trying to get to, and offers them a valid link. So it's not so much a problem for me now, but it probably harms Hydrogenaudio's reputation in search engines, because the web crawlers will likely report that the forums contain dud links.

So I wondered whether there was a way to tweak the HA forum software so that it tests whether its best guess at what constitutes a URL is actually valid.
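
To show roughly what I mean, here is a minimal Python sketch. I don't know how the forum software is written, so everything here is an assumption on my part: the names `url_exists` and `best_valid_url` are made up, and the set of trailing characters is just my guess at the usual culprits. The idea is to try the guessed URL as-is, and if it doesn't resolve, trim one trailing punctuation character at a time until something does.

```python
import urllib.error
import urllib.request

# Characters that commonly follow a URL in running text (an assumed list).
TRAILING = ".,?!)]}>\"'"


def url_exists(url, timeout=5):
    """Return True if a HEAD request for url gets a non-error response."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # 4xx/5xx responses, malformed URLs and network failures all count as "no".
        return False


def best_valid_url(candidate, exists=url_exists):
    """Trim trailing punctuation until `exists` accepts the URL.

    Falls back to the original candidate if no trimmed form is accepted.
    """
    url = candidate
    while True:
        if exists(url):
            return url
        if not url or url[-1] not in TRAILING:
            return candidate  # nothing left to trim, keep the original guess
        url = url[:-1]
```

The `exists` parameter is there so the check can be swapped out or cached. Making a live request for every link at posting time would probably be too slow, so this may only be practical as a background job.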

I did search to see whether this had been mentioned before, but didn't find anything. If this has been covered here at length before now, please point me to the relevant thread and accept my apologies.

  • Yirkha
Well, there is code in the forum software that automatically cleans up URLs ending with punctuation characters, but it only handles a trailing . , ? or !
Also, it doesn't catch something like ".html..." anyway.
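
A stricter cleanup could look something like the Python sketch below. This is not the forum's actual code, just an illustration: the names `URL_RE`, `trim_url` and `find_urls` are hypothetical, and treating ; and : and closing brackets as strippable is an assumption beyond what the current cleanup does. It strips repeated trailing punctuation, and drops a closing bracket only when the URL has no matching opening one, so links like Wikipedia's "Foo_(bar)" pages survive.

```python
import re

# Rough URL matcher: scheme, then everything up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

# Sentence punctuation that is stripped whenever it ends the match.
SENTENCE_PUNCT = ".,?!;:"

# Closing brackets mapped to their openers; stripped only if unbalanced.
BRACKETS = {")": "(", "]": "[", "}": "{"}


def trim_url(url):
    """Remove trailing punctuation that most likely belongs to the sentence."""
    while url:
        last = url[-1]
        if last in SENTENCE_PUNCT:
            url = url[:-1]
        elif last in BRACKETS and url.count(last) > url.count(BRACKETS[last]):
            # More closers than openers, so this one is probably not part of the URL.
            url = url[:-1]
        else:
            break
    return url


def find_urls(text):
    """Return every URL-like match in text with trailing punctuation trimmed."""
    return [trim_url(m.group(0)) for m in URL_RE.finditer(text)]
```

Because the loop keeps going until the last character is acceptable, ".html..." and ".html)." are both reduced to ".html". It would still be wrong for the rare URL that really does end with a period or an unbalanced bracket.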
