Large-scale Web sites with thousands (or in some cases millions) of Web pages face different SEO problems than smaller Web sites. Given their scale, large Web sites often require a Content Management System (CMS).
Early on, developers who designed and implemented CMS software didn't have much SEO education. The resulting problems for major corporations attempting SEO after CMS implementation were severe. While developers' knowledge of SEO has improved, serious SEO issues resulting from CMS software still need to be addressed.
One of the biggest: CMS software creates duplicate content. Fundamentally, each page of your site should be accessible from only one URL.
Lone Default Document?
My favorite version of this problem: the CMS will often refer to the home page of the site using the default document, for example www.yourdomain.com/index.html. The CMS often ends up using this form of the URL for all the links on the site that point back to the home page. The problem is that both forms of the URL (www.yourdomain.com and www.yourdomain.com/index.html) resolve separately, so the search engines see the same content at two addresses.
We see this in 70 percent to 80 percent of Web sites when we first look at them. Go look at 10 sites that have not had a professional SEO's touch. You'll see it. Discovering it is easy: just click on one of the links to the home page on the site and see what the URL is for the page you end up on.
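If you'd rather not click around by hand, the same check can be roughed out in a few lines of Python. This is only a sketch under some assumptions of mine: the list of default document names is a guess at common ones (your server may use something else), the function name find_default_document_links is made up for illustration, and the sample HTML stands in for a real page you would download yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: a few common default-document names. Adjust for your server.
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "index.php", "default.htm", "default.aspx"}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_default_document_links(html, page_url):
    """Return same-site links that point at the home page's default document."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc.lower()
    flagged = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.netloc.lower() != site_host:
            continue  # ignore links to other sites
        # Only the root-level default document duplicates the home page.
        if parts.path.lstrip("/").lower() in DEFAULT_DOCUMENTS:
            flagged.append(absolute)
    return flagged


# Hypothetical page fragment with three navigation links.
sample_html = (
    '<a href="/index.html">Home</a> '
    '<a href="/">Home</a> '
    '<a href="/products/index.html">Products</a>'
)
flagged_links = find_default_document_links(
    sample_html, "http://www.yourdomain.com/about.html"
)
print(flagged_links)
```

Any URL the script reports is an internal link that votes for the default-document version of the home page rather than the bare domain.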
This problem is everywhere, and it's a big problem. All your internal links vote for the index.html version of the page, while most likely nearly all of your inbound links point to the domain itself without specifying the default document. You end up dividing that link juice between two URLs in a really bad way.
The best way to fix this? Fix it before you start your site (i.e. never implement it that way). Once your site has been out there, it becomes a bit messier because perhaps some of your inbound links point to the default document, and you're going to want to salvage that link juice for the site.
Unfortunately, it's not as easy as 301 redirecting the default document to www.yourdomain.com because most Web server software won't let you redirect the default document to the domain name by itself. When the Web server attempts to retrieve www.yourdomain.com it does so by looking up the default document, which then redirects it back to just the domain name. It then tries to retrieve the default document. You get the idea.
To truly make this problem go away, you need to modify your CMS or server setup to change the default document to another file name. Then you can redirect the old (no longer) default document back to www.yourdomain.com and preserve the benefit of those legacy links to that page. This is more than a little difficult, and it's a task only for an experienced Web developer.
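To make the loop and the fix concrete, here is a toy model in Python. It is a deliberately simplified sketch, not a model of any particular Web server: the functions resolve and follow, the file name home.html, and the redirect table are all hypothetical names I've picked for illustration.

```python
def resolve(path, default_document, redirects):
    """Model one request. Returns (status, redirect target or served file)."""
    # A request for a directory is mapped internally to its default document.
    internal = path + default_document if path.endswith("/") else path
    if internal in redirects:
        return 301, redirects[internal]
    return 200, internal


def follow(path, default_document, redirects, max_hops=10):
    """Follow redirects like a browser would, giving up after max_hops."""
    hops = []
    for _ in range(max_hops):
        status, target = resolve(path, default_document, redirects)
        hops.append((path, status))
        if status == 200:
            return "served " + target, hops
        path = target
    return "redirect loop", hops


# The redirect we want: the old default document goes back to the bare domain.
redirects = {"/index.html": "/"}

# Naive setup: index.html is still the default document, so "/" maps to
# "/index.html", which redirects to "/", which maps to "/index.html" again.
naive_result, naive_hops = follow("/", "index.html", redirects)
print(naive_result)

# Fixed setup: the default document is renamed (home.html is an assumed name),
# so the legacy /index.html URL can redirect to "/" safely.
fixed_result, fixed_hops = follow("/index.html", "home.html", redirects)
print(fixed_result)
```

In the naive setup the request never settles. Once the default document has a different name, the legacy URL takes one 301 hop to the bare domain and the page is served.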
Automatic Home Page Redirects
There is a closely related problem that is a bit rarer. With some CMS software the home page www.yourdomain.com is redirected to a different page, such as www.yourdomain.com/en. You might see this on sites that offer multiple language versions.
First of all, the default redirect used in many CMS software packages is a 302 redirect. I'll avoid the full rant about 302 redirects because I'll miss my writing deadline if I get started on that one. The short form is that this may simply block the passing of any link juice into the site altogether. Ouch.
It's obviously better if you use a 301 redirect. However, we all need to be aware that even a 301 redirect is not a perfect solution. It's fair to say that it works the great majority of the time, but not 100 percent of the time. Frankly, relying on the search engines to help clean up your CMS mess is not my idea of a great business plan.
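If you're not sure which kind of redirect your CMS issues for the home page, you can check the status code it returns. The sketch below uses only Python's standard library. The function names classify_home_redirect and fetch_home_status are my own, and the labels simply mirror the 301 versus 302 distinction discussed above.

```python
import http.client
from http import HTTPStatus


def classify_home_redirect(status_code):
    """Label a home page response by the kind of redirect it represents."""
    if status_code == HTTPStatus.MOVED_PERMANENTLY:  # 301
        return "permanent"
    if status_code == HTTPStatus.FOUND:  # 302
        return "temporary"
    if 300 <= status_code < 400:
        return "other redirect"
    return "no redirect"


def fetch_home_status(host):
    """Request the home page of host and return (status, Location header).

    Makes a live HTTP request, so it is defined here but not called.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Classify the two status codes discussed above.
print(classify_home_redirect(302))
print(classify_home_redirect(301))
```

To try it against a live site, call fetch_home_status("www.yourdomain.com") and pass the returned status to classify_home_redirect. A result of "temporary" on the home page is the situation to fix first.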
For both scenarios, the right answer is to address them during the design phase of a site. It's another example of why it's so important to begin the SEO effort before a site is built.
It's like the old carpenter's saying: "If you don't have time to do it right, you better plan on allocating some time to do it over." And in the world of search engines, doing it over will take far longer than doing it right up front.
That ends up being time you spend fixing your site instead of time you spend growing your site.