One of the first decisions you make when putting up a new website is which content management system (CMS) to use. There are plenty of options: WordPress, Drupal, Joomla, Movable Type, a custom solution of your own, and so on.
The right choice, given your level of comfort with web publishing, lets you update your site as easily as possible and make its look and feel match what you truly want.
But is your CMS of choice actually working the way you expect it to? Is it possible that your CMS can hurt you? Leaving aside the security issues we seem to hear about so often, such as cross-site scripting (XSS), how can this be?
A best practice in SEO is to include keywords in your URL. Let's look at an example.
That URL tells you and the search engines what the story's about. It's an everyday story about a man wearing a chicken suit who rode his bicycle into his local supermarket with calamitous results.
But what if the CMS only really cares about the article ID in that URL? If 3554410 is all it uses to identify the article uniquely, and the text after it is completely superfluous, what could happen? That text could be changed to something like the following, and the URL would still resolve to the same article:
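To make the failure mode concrete, here is a minimal Python sketch of ID-only routing. The /news/ path prefix, the slug, and the in-memory ARTICLES dictionary are hypothetical stand-ins, not how any of the sites mentioned here actually work; the point is only that the text after the ID is matched but never checked.

```python
import re

# Hypothetical article store keyed by numeric ID (illustrative data only).
ARTICLES = {
    3554410: {
        "slug": "man-in-chicken-suit-rides-bike-into-supermarket",
        "title": "Man in chicken suit rides bike into supermarket",
    },
}

# Matches paths like /news/3554410/anything-at-all/ but captures only the ID.
PATH_RE = re.compile(r"^/news/(\d+)(?:/[^/]*)?/?$")


def resolve(path):
    """Return the article for a path, looking only at the numeric ID."""
    match = PATH_RE.match(path)
    if match is None:
        return None
    # The slug segment is never compared with ARTICLES[...]["slug"],
    # so any text placed there still resolves to the article.
    return ARTICLES.get(int(match.group(1)))
```

With this router, resolve("/news/3554410/any-text-you-like") returns the same article as the keyword-rich path, which is exactly the behavior described above.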
Or how about:
When you look at the page source, there's a big difference between the two. Both let any text resolve to an article as long as the unique key is intact in the URL, but the Washington Post articles at least have canonical tags, telling search engines that there is a preferred URL they should use when indexing the page. The Sun doesn't use canonical tags.
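A canonical tag is a single <link rel="canonical"> element in the page's <head>. The minimal Python sketch below builds it from the slug stored for the article, never from the URL that was requested, so every variant of the URL points search engines at the same preferred address. The domain, path layout, and function names are hypothetical.

```python
from html import escape

SITE = "https://www.example.com"  # hypothetical domain


def canonical_url(article_id, slug):
    """Build the preferred URL from stored data, not from the request path."""
    return f"{SITE}/news/{article_id}/{slug}/"


def canonical_link_tag(article_id, slug):
    """Return the <link rel="canonical"> element for the page's <head>."""
    # Escape the href so characters such as & or " cannot break the attribute.
    href = escape(canonical_url(article_id, slug), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

Echoing the requested URL back into the href would defeat the purpose, since a tampered URL would then declare itself canonical.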
Now, this is kind of amusing, but is it really something you need to worry about? Most likely not for the sites above: they pass enough authority to the "correct" URL that search engines will pick it out as the original when they detect another page on the site with duplicate content.
But suppose your site has less authority, and someone maliciously puts together a campaign that generates links to an altered form of the URL. Once that URL is noticed, you could find yourself with a PR problem.
For example, two weeks ago the UK's Independent newspaper ran a fluff piece about a jelly bean bearing the image of Kate Middleton. Someone changed the URL and forwarded their expletive-laden version around the web. It went viral, with some people jumping to the conclusion that the URL had been posted deliberately by a disgruntled member of staff.
With the social buzz and the blog posts written about it, this user-generated URL quickly climbed the rankings, becoming the URL that Google showed for queries about the Kate Middleton jelly bean. In fact, according to the Daily Mail, until the Independent noticed and corrected the problem, it was also the version returned when users searched the Independent's own site.
Some CMSs not only let users alter the text in an article's URL, they also let users place text directly on the site. I'm sure the site below wouldn't want this section to start ranking in searches for their site, yet it could happen.
The page source does include a canonical tag, but unfortunately...
So what should a site do if its CMS allows problems like the ones above?
Canonical tags can help, provided they are implemented correctly, unlike in the last example. You could make the article text part of the unique identifier the CMS looks for, although that may limit your flexibility to update the URL as a story evolves. But probably the best approach is to fix the CMS so that any incoming URL that isn't in the canonical form is automatically 301-redirected to it, as the Independent now does.
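One way to implement that fix, sketched in Python under a hypothetical /news/<id>/<slug>/ layout: look up the article by ID, compare the requested path with the canonical path built from the stored slug, and answer with a 301 whenever they differ. The ARTICLES store and route() helper are illustrative assumptions; the WSGI wrapper relies only on the standard (environ, start_response) calling convention.

```python
import re

# Hypothetical article store: numeric ID -> current canonical slug.
ARTICLES = {3554410: "man-in-chicken-suit-rides-bike-into-supermarket"}

# Captures the ID and (optionally) whatever slug text was supplied.
PATH_RE = re.compile(r"^/news/(\d+)(?:/([^/]*))?/?$")


def canonical_path(article_id):
    """Build the one preferred path for an article from its stored slug."""
    return f"/news/{article_id}/{ARTICLES[article_id]}/"


def route(path):
    """Return (status, location) for an incoming request path.

    404 if the path doesn't match or the ID is unknown,
    200 if the path is already exactly the canonical form,
    301 to the canonical form for any other variant of a known ID.
    """
    match = PATH_RE.match(path)
    if match is None:
        return 404, None
    article_id = int(match.group(1))
    if article_id not in ARTICLES:
        return 404, None
    target = canonical_path(article_id)
    if path == target:
        return 200, None
    return 301, target


def app(environ, start_response):
    """Minimal WSGI app that applies route() to each request."""
    status, location = route(environ.get("PATH_INFO", "/"))
    if status == 301:
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    if status == 200:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [b"<!-- article body would be rendered here -->"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

Because the canonical slug lives in the article record rather than in the lookup key, editors can still change it as a story evolves; every old or tampered variant simply redirects to whatever the current slug is.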