Duplicate content comes in many flavors. Often, the companies I've spoken with have no idea that what they're doing is wrong. In fact, that's the case most of the time. Before I get into the "what" of duplicate content, let's first discuss the "why."
Remember how much trouble you would get into at school if you turned in a paper that looked plagiarized? We knew then that copying someone's work was a bad idea. Since search engines try to emulate human behavior (you've heard me say this before), they too dislike copycats. The search engines want to give credit for original work to the people and Web sites that originated the content.
Now for the what: types of duplicate content.
Scraper Web Sites
There are many Web geeks out there who build numerous Web sites, load them with automated content and run what many in our industry refer to as "Made for AdSense" (MFA) Web sites. The goal of an MFA Web site is to rank, by any tactic necessary, for search phrases that, in most cases, are expensive on a cost-per-click basis.
These sites generate revenue through a content advertising partnership with Google, which sells advertising on their pages. Since they are often run by one person, the owner can rarely produce unique content for all of them. So, they scrape (steal) content from other sources and post it on their own Web sites to increase their rankings for specific keywords. Most search engines have put filters in place to look for instances of borrowed/stolen content so that the search engine (ideally) ranks the Web site that originated the content.
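The search engines don't publish how their filters work, so the following is only a rough sketch of the general idea, not anyone's actual algorithm. It assumes a simple technique called shingling: break each text into overlapping three-word sequences and measure how many of those sequences two pages share. The function names and the shingle size are my own choices for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "Search engines want to give credit to the site that originated the content."
scraped = original + " Click our ads today."
unrelated = "Our bakery sells fresh bread every morning."

print(jaccard_similarity(original, scraped))    # high score: mostly copied
print(jaccard_similarity(original, unrelated))  # no shared three-word sequences
```

A score near 1.0 suggests one page was lifted from the other. Deciding which page is the original takes other signals (such as which one was seen first), which this sketch does not attempt.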
Inadvertent URL Issues
This is the most common form of duplicate content. Because of how common this is, the search engines do their best not to penalize for this type of duplicate content. The best strategy: always correct these issues. URL issues commonly occur when a company owns several domains and just posts the same version of their Web site on multiple domains. I can't tell you how many times I've had clients say to me: "That's not duplicate content." Well, in the eyes of a search engine (for the reasons mentioned above), this qualifies as duplicate content.
Search engines are very literal. They don't necessarily know you own all of these domains. I've seen rankings increase after correcting these issues and doing nothing else. Besides, if you point all of your other domains at one domain with a 301 (permanent) redirect, you pass along any value those domains may have had to the one you have decided to focus on.
I've also seen instances where someone has tried to use a URL rewriting tool (ISAPI Rewrite or Apache's mod_rewrite) and unintentionally ended up with two versions of their URLs (the rewritten version AND the old version). Again, work with your SEO firm to determine which URL is the right one to keep long-term, and redirect any others to the keeper.
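To make the redirect idea concrete, here is a minimal sketch of the decision a server has to make. The domain names are placeholders, and `canonical_redirect` is a hypothetical helper of my own; in practice this rule usually lives in your Web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: this is the one domain you have decided to keep.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) for a non-canonical host, else None."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.hostname == canonical_host:
        return None  # already on the keeper domain, or not an absolute URL
    # Keep the path and query string so each old page maps to its twin.
    target = urlunsplit(
        (parts.scheme or "http", canonical_host, parts.path or "/", parts.query, "")
    )
    return 301, target


print(canonical_redirect("http://example.net/services?id=3"))
print(canonical_redirect("http://www.example.com/services?id=3"))
```

The important detail is the status code: a 301 tells the search engine the move is permanent, which is what allows the value of the old domain to follow the visitor to the new one.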
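One way to spot this problem is to compare what each URL on your own site actually returns. The sketch below assumes you already have a mapping of URL to page body (from a crawl, for example); the sample URLs and the `find_duplicate_urls` name are made up for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(pages):
    """Group URLs whose bodies are identical after whitespace normalization.

    `pages` maps each URL to the page body it returned.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        normalized = " ".join(body.split())  # ignore spacing differences
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


pages = {
    "/product.asp?id=12": "<h1>Blue Widget</h1>",
    "/products/blue-widget": "<h1>Blue  Widget</h1>\n",
    "/about": "<h1>About</h1>",
}
print(find_duplicate_urls(pages))
```

Each group it reports is a set of URLs serving the same page. Pick the keeper from each group and redirect the rest to it. Note that this only catches exact matches; pages that differ by even one word will not be grouped.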
Same content applies to several categories within a Web site
We have a client with several services that appear under several categories within their Web site. The content really does apply to all of those categories. However, we have asked them to modify the content so it is unique on each page. It doesn't have to take a lot of effort. At a minimum, rewrite the first paragraph and make sure the title tags are unique, and you should be in good shape.
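Checking that title tags are unique is easy to automate. This is a small sketch using Python's standard HTML parser; it assumes you have the HTML of each page in a dictionary keyed by URL, and the sample pages and the `duplicate_titles` name are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def duplicate_titles(pages):
    """Map each repeated title (lowercased) to the URLs that share it."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        parser.close()
        title = " ".join(parser.title.split()).lower()
        if title:  # pages with no title at all are ignored here
            by_title[title].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


pages = {
    "/plumbing/repair": "<html><head><title>Leak Repair</title></head></html>",
    "/emergency/repair": "<html><head><title>Leak  repair</title></head></html>",
    "/about": "<html><head><title>About Us</title></head></html>",
}
print(duplicate_titles(pages))
```

Any title that shows up in the result is shared by more than one page and is a candidate for a rewrite.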
Press releases sent out through a distribution service are another form of duplicate content, but this one isn't really "bad." It is what it is. The search engines will give credit to the version of the press release they deem the original source of the content. So, in most cases, the "originator" is PRWeb, Business Wire, or whichever distribution/syndication tool you might be using.
Proper search engine optimization is much more than tossing title tags, META descriptions and META tags into a Web site. Issues such as duplicate content (intentional or otherwise) can have a profound impact on your search success. They can make or break a campaign, yet most of them are simple to address. As I tell my team each month, "success is in the details!"