There are two kinds of duplicate content: content that is duplicated on multiple websites and content that is duplicated on multiple pages of a single site. I believe the search engines treat each differently and, of course, different standards may apply within each of these two main categories, depending on the cause and instance.

Please note that I've not done any in-depth testing of this issue, so everything I'm presenting here is my own theory. But I think as far as untested theories go, these are pretty solid.

Multiple Site Duplicate Content

Let's first tackle the issue of content duplicated on different sites across the web. Within this segment of duplicate content there are two obvious types: duplicated articles (or other types of lengthier content) and duplicated product descriptions.

Product Descriptions:

Many ecommerce sites use nothing more than the manufacturer's default descriptions to populate their content pages. They might throw in a custom title and description, but many times the description info is left intact. When this is the case, how do the search engines determine the relevance of one site's product over another?

In these instances, I believe the weight of the site itself, and the overall number and quality of backlinks, tend to be the primary factors. Given the same content as another site, a site that is better known, has a larger audience, and has a better backlink structure is likely to trump any other website.

On the other hand, a site that provides unique product descriptions suddenly has an advantage. Links and popularity will still come into play here, so the big site with dupe content may still achieve better rankings. But the site with unique content will undoubtedly outperform sites that are on the same "stature" level, and perhaps sites that are one or two rungs higher in stature, assuming, of course, that the higher stature sites are using duplicate product descriptions.

The search engines should favor the sites that take the time to develop unique content, over those that don't, barring any other factors that might come into play.

Article Content:

The other type of duplicate content between different websites is longer content or articles. This comes into play with article distribution sites, scraper sites or other blogs that are republishing content. Some of this is against the content originator's will, but not always. I'll hold no distinction between the two here.

I don't have a firm theory on what actually happens in these cases, but I agree with many others who have spoken on this as to what the search engines should be trying to do. It would seem that it would not be too difficult to find the original, or canonical, version of any piece of content. They can do this a couple of ways which would probably identify the originator of 90% of all duplicate content.

One way is to simply look at the cache date. If they cached the content on site X first, then when it appears on site Y, they know it's duplicated. This way, of course, assumes that the originator always gets cached first, which is not always the case.
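As a rough illustration of the cache-date idea (a sketch only, not a description of how any search engine actually works), the rule amounts to picking the copy with the earliest first-seen date. The URLs, the dates, and the function name `earliest_seen` below are all hypothetical.

```python
from datetime import date


def earliest_seen(first_seen):
    """Return the URL whose copy of the content was cached first.

    first_seen maps each URL to the date its copy was first cached
    (hypothetical data; real crawl dates are not public).
    """
    return min(first_seen, key=lambda url: first_seen[url])


first_seen = {
    "https://site-y.example/reprint": date(2008, 5, 3),
    "https://site-x.example/article": date(2008, 4, 28),
}

original = earliest_seen(first_seen)
print(original)
```

Like the method itself, the sketch is only as good as its dates: if the reprint on site Y happens to be cached first, it is the reprint that gets picked as the original.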

A second way is to look for an author's name, or a link that points back to the author's site. If I republish this piece on another website (as this piece is, already having been on my business blog), I'll have a link back to my site in my author's bio (as this article does). The search engines can simply look for this link and if it goes back to a site that does in fact contain this "duplicated" information, then the engines can know which is the original version.
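The link-back check can be sketched in the same spirit. This is a simplified illustration under stated assumptions: `pages` is a hypothetical in-memory map of crawled URLs to their text and outbound links, and "contains the duplicated information" is reduced to an exact text match, which a real system would surely relax.

```python
from urllib.parse import urlparse


def find_linked_original(copy_url, pages):
    """Return a page with the same text on a site the copy links to, else None."""
    copy = pages[copy_url]
    # Hosts that the copy links out to (for example, via an author bio link).
    linked_hosts = {urlparse(link).netloc for link in copy["links"]}
    for url, page in pages.items():
        if url == copy_url:
            continue
        same_site = urlparse(url).netloc in linked_hosts
        if same_site and page["text"] == copy["text"]:
            return url
    return None


# Hypothetical crawl data.
pages = {
    "https://author-blog.example/duplicate-content": {
        "text": "Full article text",
        "links": [],
    },
    "https://big-site.example/reprint": {
        "text": "Full article text",
        "links": ["https://author-blog.example/"],  # author bio link
    },
}

print(find_linked_original("https://big-site.example/reprint", pages))
```

For the original page itself the function returns `None`, since it carries no link to another site holding the same text.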

I suspect that they employ both of these methods, as well as others that I have not mentioned here, to determine which version is canonical. While the second method does not address stolen content, the first method can most likely be used to determine the content originator.

A couple of years ago I asked a question on this topic, in regard to passing link value to originating sources, to a group of search engine engineers. I never received a satisfactory answer.

My question was this: if there are two pieces of identical content and the search engines clearly know which one came first, do links pointing to the duplicated version count as links to the original version? Part of the answer is obvious: if the duplicated version contains a link back to the first, then the first will get some second-hand link value. But I wanted to know about the passing of first-hand link value. Much to my disappointment, the search engineers refused to answer my question.

What I would like to see is that, when the search engines are confident of the canonical version of a piece of text, all links to the duplicates are (at least in part) attributed back to the original. The originating source should get the lion's share of the link value, regardless of where that content is duplicated. This would allow the original content to gain more traction against duplicates on sites that have significantly more power and weight.

In-Site Duplicate Content

Again, in-site duplicate content is content that is duplicated on two or more pages within a single site. I think this type of duplication is the one most prone to receiving some kind of penalty from the search engines. But penalty might not be the best word to use in most cases. I think what happens here is that search engines simply treat your site differently than they would if it didn't have any significant duplicate content problems.

Product Pages

The type of in-site duplicate content that most often appears on ecommerce sites occurs when the same product is given multiple URLs, depending on the navigation path. I've seen sites that create up to three URLs for every single product page. This type of duplication poses a real problem for the engines. A 5,000-product site suddenly becomes a 15,000-page site. But as the search engines spider and analyze, they realize that they have 10,000 too many pages in their index due to the duplication of the product pages to different URLs.

When this type of duplication is found, the search engines will often slow down or even stop spidering your site. The duplication has created an undue burden on the engines and since they are not getting much in the way of new content (in relation to pages being spidered), they have no compulsion to continue. This leaves many pages of your website out of reach of the search results.

Such duplication also leaves you open to splitting link value between multiple URLs. If someone links to a particular product, they may link to any of the multiple versions, instead of a single primary version/URL. This can cause the search engines to give weight to the "wrong" URLs.

Many people will "fix" this problem by preventing the search engines from indexing all but a single version of the content. Keep in mind though, that this only keeps the duplicate pages out of the search index, but does nothing about the link splitting issue. As long as those duplicate URLs exist, link value splitting will be an issue.

The best solution here is to find a way to resolve the duplicate content issue altogether. Don't let the navigation path determine the URL for any given product. Let there be only one URL for each product, regardless of how the visitor navigated, or how many categories the product fits in.
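The difference between the two URL schemes can be shown with a minimal sketch. The category names, the `/products/` prefix, the product slug, and both function names are hypothetical; the point is only that one scheme depends on the navigation path and the other does not.

```python
def path_based_url(category_path, slug):
    """The problematic pattern: the URL changes with the navigation path."""
    return "/" + "/".join(category_path + [slug])


def single_product_url(slug):
    """One URL per product, no matter how the visitor got there."""
    return f"/products/{slug}"


# Three hypothetical navigation paths that all lead to the same product.
paths = [["burton", "snowboards"], ["snowboards"], ["sale"]]
slug = "custom-x"

path_urls = {path_based_url(p, slug) for p in paths}
single_urls = {single_product_url(slug) for _ in paths}

print(len(path_urls), len(single_urls))  # prints: 3 1
```

With the second scheme, category pages can still list the product wherever it belongs; they simply all link to the same address.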

Product Summaries:

Another duplicate content issue is when short product description summaries are being displayed throughout multiple category type pages. Let's say you are looking for a Burton Snowboard. You click on the Burton products link and then click on snowboards. This leads to a page with various Burton snowboards, each displaying a short product description/summary. But when you then navigate to the main snowboards page, which carries products from Burton and other companies, you find the same Burton product descriptions along with duplicate product descriptions for all the other products as well.

I'm not entirely certain how the search engines react to this kind of duplication, but I don't imagine that they would give the page a whole lot of weight. A solution is to make sure that each of these pages has at least a single paragraph of unique content. This way the search engines can freely choose to ignore the obvious duplicate product descriptions, but still have something of value on the page worthy of being indexed and followed.

There is a time for (cautious) duplication

When analyzing the value or necessity of duplicate content, you have to evaluate your goals. I frequently allow articles from my business blog to be reprinted/reposted/duplicated on other sites. I know I'm creating duplicate content, but my purpose for doing so is exposure. Some of these other sites give me far more exposure than I get on my own blog.

You can probably argue that if I never duplicated content, then my blog would get a lot more exposure than it does now, and I'll concede that possibility. But I also know that it would take significantly more exposure to match what I get from duplicating my content on the other sites. So in this case I'm willing to live with the duplicate content issues that may arise. But I also take measures to make sure the engines know where the content originated.

The thing to keep in mind with all of this is that search engines want unique content. It does them no good to serve up ten pages with exactly the same content. So the more you can do to make each of your pages unique, the better off you'll be.

This article is part of a series on duplicate content. Follow the links below to read more:

  1. Theories in Duplicate Content Penalties
  2. How Poor Product Categorization Creates Duplicate Content and Frustrates Your Shoppers
  3. Redirecting Alternate Domains to Prevent Duplicate Content
  4. Preventing Secure & Non-Secure Site Duplication
  5. Why Session ID's And Search Engines Don't Get Along (Hint: It's a Duplicate Content Thing)
  6. What Does a Title Tag, Title Tag and Title Tag Have In Common?
  7. How to Create Printer Friendly Pages Without Creating Duplicate Content
  8. How to Use Your WWW. to Prevent Duplicate Content

May 1, 2008





Stoney deGeyter is the President of Pole Position Marketing, a leading search engine optimization and marketing firm helping businesses grow since 1998. Stoney is a frequent speaker at website marketing conferences and has published hundreds of helpful SEO, SEM and small business articles.

If you'd like Stoney deGeyter to speak at your conference, seminar, workshop or provide in-house training to your team, contact him via his site or by phone at 866-685-3374.

Stoney pioneered the concept of Destination Search Engine Marketing which is the driving philosophy of how Pole Position Marketing helps clients expand their online presence and grow their businesses. Stoney is Associate Editor at Search Engine Guide and has written several SEO and SEM e-books including E-Marketing Performance; The Best Damn Web Marketing Checklist, Period!; Keyword Research and Selection, Destination Search Engine Marketing, and more.

Stoney has five wonderful children and spends his free time reviewing restaurants and other things to do in Canton, Ohio.





Comments(17)

Stoney, another smash hit article! Thanks for sharing your thoughts on duplicate content. I couldn't agree more with the ideas presented here.

Very good article! I definitely agree with what you are doing for exposure. This will definitely help those people looking to improve their sites/blogs by pushing them to the point where they take the time to write a unique article instead of posting free ones up.

Thanks, Stoney, good article. My question is this: do I get penalized for duplicate content when I am testing versions of the same landing page with minor variations for PPC purposes? I probably have 5-6 different versions of my home page right now that I am testing for price, keywords etc. Any insight would be appreciated. Thanks.

Hi Jill, Great question, and fortunately there is an easy solution.

If the search engines can spider all your PPC landing pages, then you are opening yourself up to duplicate content. It's doubtful you'll be penalized in any way other than having some of your dupe content dropped from the index. This means that the "wrong" pages may stay in the index while the "right" ones drop out.

The solution is to exclude all of your PPC landing pages from being indexed. You can do this a couple of ways.

1) Use the robots meta tag to tell the engines to noindex the page:

<meta name="robots" content="noindex">

Place that in the <head> section of each of the pages.

2) Use the robots.txt file to exclude those pages:

User-agent: *
Disallow: /ppclandingpage1.htm
Disallow: /ppclandingpage2.htm
Disallow: /ppclandingpage3.htm
etc.

Even easier, you can also place all your landing pages in a single folder and exclude that folder:

User-agent: *
Disallow: /ads/
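
Before relying on rules like these, it can help to check that they block what you expect. Here is a minimal sketch using Python's standard `urllib.robotparser` module; the `www.example.com` URLs and page names are hypothetical, and the check only reflects how this parser reads the rules, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The folder-level rule from above, as it would appear in robots.txt.
rules = """\
User-agent: *
Disallow: /ads/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs: one landing page inside /ads/, one regular page.
landing_allowed = parser.can_fetch("*", "http://www.example.com/ads/landing1.htm")
home_allowed = parser.can_fetch("*", "http://www.example.com/index.htm")

print(landing_allowed)  # False: the landing page is excluded
print(home_allowed)     # True: the rest of the site is still crawlable
```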

Hope this helps.

I would like your opinion on multiple websites with identical content. I have 6 sites that all have the same content, except for some minor changes on the home pages.

I have had extreme problems in the last year getting any rankings above page 5 on Google for 4 of the 6 sites.

Do you think the identical content on the 6 sites is causing me problems with search engine rankings?

Hi Jeff,

This kind of duplication will absolutely cause you problems. If the search engines haven't figured it out already, when they do they'll start dropping your sites out of the index completely. You're lucky that you can even get to page five with as many duplicates as you have.

Look at it from the search engine's perspective: why would they want to put six sites on the first page that all say the same thing, with a few minor differences? How does that give value to the searchers? It doesn't. Your best bet would be to 301 redirect all the duplicate sites to one and promote the heck out of that one. Use it to earn your spot on the first page. Whatever you do, don't keep trying to "fix" the results in your favor.
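
For illustration, here is a minimal sketch of what "301 redirect everything to one site" means, written as a small Python WSGI application. The domain in `PRIMARY_SITE` is hypothetical, and in practice this is usually done in the web server's own configuration rather than in application code.

```python
PRIMARY_SITE = "https://www.primary-site.example"  # hypothetical surviving site


def redirect_app(environ, start_response):
    """Answer every request with a 301 to the same path on the primary site."""
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    location = PRIMARY_SITE + path + ("?" + query if query else "")
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]


# Simulate a request to a duplicate site without starting a server.
seen = {}


def record(status, headers):
    seen["status"] = status
    seen["location"] = dict(headers)["Location"]


redirect_app({"PATH_INFO": "/widgets.html", "QUERY_STRING": "id=3"}, record)
print(seen["status"], seen["location"])
```

Keeping the path and query string in the redirect means links pointing at inner pages of the duplicate sites land on the matching page of the primary site instead of its home page.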

Hope this helps.

Hi,
Thank you for this really interesting article! Can you give some more advice about how to manage duplicate content on an ecommerce site (the case where there are multiple paths to each product description)?

Do you think 301 redirects between the paths are a good solution (is that even possible)?

What is the best solution?

Thank you again!

Good post! I was under the impression that Google even penalizes you for having duplicate content. But that is not the case?

Hi Stoney,

This article made me think about how well we laid out all the products on our website. This is very interesting.

We have laid out our products in categories, and each product has an html page (solely for that product alone). However, we found out that one product can belong to two categories as well. That led us to put one (1) product on three html pages: two for the different categories and one solely for itself. They all have the same content, however.

Example:

www.name.com/cooking.html
www.name.com/kitchen.html
www.name.com/kitchen-rice-cooker.html

The last html page is solely for that product, i.e. the rice cooker.
The first and second html pages are category pages, each with a series of products related to that category. I included the rice cooker in both categories with the same content.

When a visitor clicks the rice cooker in cooking.html, they are brought to its own html page, and the same happens from kitchen.html. Do you think my website will be penalized because of such duplication?

Hi Stoney,
I think this article will be really helpful to me.
I found out that there are 9 different websites which have the same content as my home page. Is this a really big problem for my website?

Yes, if there are other sites with the same content that will be a significant problem that should be resolved.

I guess it all depends on what you consider a penalty. It's known that the search engines filter duplicate pages from the index. That's not a penalty; it's a filter. But they might also slow the crawl of a site or reduce the number of pages they'll index. That's not really a penalty either, but it can be a big problem.

But duplication on a mass scale can happen very easily with database driven sites. Will the site get penalized? Probably not, but the filters are enough to cause significant damage to long-term business growth.

Thanks for a great article, Stoney! Exactly the info I was looking for.

It seems inevitable that there will be a certain amount of duplication on sites. On mizpah.tv we try to keep it to a minimum, but only by seeing the algorithm could we know for sure.

What a great article. But what I don't like is how Google penalizes for this. For example, we have over a thousand products and they come and go. If I were to spend the time writing unique content, by the time I was finished, half of those products would be discontinued. I'm thinking then about rewriting all of our content and in the meantime using the robots.txt file to disallow Google from our directory where the products are! Thanks again!

I do occasionally publish my articles on isnare and ezine articles. Should I stop doing this?

@ biz - There are benefits to this, so you have to weigh the pros and cons. I spent a few years publishing articles on my site and then, after they had aged several weeks, I'd go ahead and publish them on sites like ezine and others. I got the benefit of the broad distribution, while the original on my blog was the first known version out there.


