There is no better way to create an infinite amount of duplicate content on your site than to force session IDs onto each visitor (and search engine). Typically, session IDs are used for tracking a single visitor's navigation path through the site, including adding products to or removing them from the shopping cart. They are great for tracking purposes, but really, really bad for search engines and inbound linking.

Session IDs

Ok, first of all, that's a totally crappy URL shown above, but aside from that, tacked on at the end there is the session ID. Both URLs are the same except for the session ID. I was able to open the exact same page with a new unique ID simply by starting a new browser session. The problem is that each session ID constitutes a completely different URL. It's not an issue for the visitor, but it is for the search engines.

Since a new session ID is attached with each new visit, each time the search engines come around they are essentially fed all new URLs. If you have only a ten-page site, the second time the search engines visit they add the "new" 10 pages to the index, for a total of 20 pages. When they come around a third time they now have 30 pages in their index. Once they start analyzing these pages they find page after page after page of duplication.
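The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page paths, the `sid` parameter name and the helper functions are hypothetical, and a real crawler's index is far more complicated than a set of strings.

```python
import uuid

# Hypothetical ten-page site; the paths are made up for illustration.
PAGES = [f"/page-{n}.html" for n in range(1, 11)]


def crawl(pages):
    """Return the URLs seen on one visit, each tagged with a fresh session ID."""
    session_id = uuid.uuid4().hex  # stands in for whatever ID the site issues
    return {f"{page}?sid={session_id}" for page in pages}


def index_size_after(visits, pages=PAGES):
    """Count the distinct URLs collected after a number of visits."""
    index = set()
    for _ in range(visits):
        index |= crawl(pages)  # every URL is "new" because the ID changed
    return len(index)


if __name__ == "__main__":
    for visits in (1, 2, 3):
        print(visits, "visit(s):", index_size_after(visits), "URLs")
```

Every visit adds another ten URLs even though the site still has only ten pages of actual content.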

An additional problem arises as site visitors start bookmarking and linking to your site. Every link they add contains their very own session ID. The search engines follow that link to your site and now you've got another 10 pages of duplication. If they follow another link to your site, that's 10 more. Are you starting to see where this is going? Essentially you can turn a 10-page site into endless duplication.

Session ID Duplicates

Even with a small site you can see why the search engines would stop coming around. But if you have a site with hundreds, or even thousands, of products, you'll find two things happen: 1) The search engines will stop spidering new pages because there is just too much duplication. 2) The engines will start dropping pages out of the index altogether.

There are content management systems that will allow you to withhold the session IDs from search engines. While this is a good option, it still has the potential to create problems with inbound links. Each link will still pass value to the URL with the session ID. It'll be up to the search engine to determine whether the URL with the session ID and the URL without are the same page.
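To make that determination concrete, here is a minimal Python sketch of what treating the two URLs as the same page involves: dropping the session parameter and comparing what is left. The parameter names in `SESSION_PARAMS` and the example.com URLs are assumptions for illustration, not taken from any particular content management system or search engine.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed query-string names for session IDs; adjust for your own platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    first = "http://www.example.com/product.php?id=42&sid=abc123"
    second = "http://www.example.com/product.php?id=42&sid=zzz999"
    # Both inbound links resolve to the same session-free URL.
    print(strip_session_id(first))
    print(strip_session_id(first) == strip_session_id(second))
```

If the site does not do this itself, every inbound link carrying a session ID leaves that guesswork to the search engine.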

The only guaranteed protection is not to do it at all. There are alternate means of tracking users for whatever reason. Avoiding session IDs completely ensures that you don't open yourself up to inadvertent site duplication.
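The article does not name those alternate means, but cookies, which a commenter below asks about, are one common option: the session ID travels in an HTTP header instead of the URL, so every visitor and every search engine sees the same address. The following is a rough sketch using Python's standard `http.cookies` module; the cookie name `session_id` and both helper functions are hypothetical.

```python
import uuid
from http.cookies import SimpleCookie


def session_cookie_header(session_id=None):
    """Build a Set-Cookie header line that carries the session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id or uuid.uuid4().hex
    cookie["session_id"]["path"] = "/"        # valid for the whole site
    cookie["session_id"]["httponly"] = True   # hidden from page scripts
    return cookie.output(header="Set-Cookie:")


def read_session_id(cookie_header):
    """Extract the session ID from a request's Cookie header value, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel else None


if __name__ == "__main__":
    print(session_cookie_header("abc123"))
    print(read_session_id("session_id=abc123; theme=dark"))
```

Visitors who block cookies will not be tracked this way, which is the trade-off for keeping the URLs clean.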

This article is part of a series on duplicate content. Follow the links below to read more:

  1. Theories in Duplicate Content Penalties
  2. How Poor Product Categorization Creates Duplicate Content and Frustrates Your Shoppers
  3. Redirecting Alternate Domains to Prevent Duplicate Content
  4. Preventing Secure & Non-Secure Site Duplication
  5. Why Session ID's And Search Engines Don't Get Along (Hint: It's a Duplicate Content Thing)
  6. What Does a Title Tag, Title Tag and Title Tag Have In Common?
  7. How to Create Printer Friendly Pages Without Creating Duplicate Content
  8. How to Use Your WWW. to Prevent Duplicate Content



Comments (5)

How do you know if Google has indexed content more than twice? I use session IDs and did a site:domain.com search in Google but it appears that Google has picked up all pages as separate content.

If you perform a site:domain.com check and you don't see any of the same pages with different session IDs, then at least for now you're in the clear. But I still wouldn't use that as a reason not to make changes if you can. You're still putting the search engines in the position of having to figure this stuff out for themselves, and while they may be doing it right at the moment, they may not always.

What are the alternate means of tracking users? Cookies? How does that work exactly? Can you point me in the direction of some resources?

Thanks... Great Article!

Now this is a question of how you deal with session management for the application: either compromise by using cookies and keep the search engines happy, or use session IDs and expect the search engine to take on the headache of understanding the issue and dealing with the duplication. Ideally, search engines should be expected to be intelligent, and business efficiency should not be compromised for a third-party search engine, should it?

Session ID shows up in the URL only if the method of the submitted form is GET, i.e., <form method="get"...>. If you can arrange for the form method to be POST, this particular problem does not arise. Data-transmission paths to the host differ between GET and POST. The latter, as well as being somewhat more secure, completely sidesteps the issue of fake URLs and SE confusion.

Good article, I'd never thought of this problem before.


About the Author

Stoney deGeyter founded Pole Position Marketing in 1998, working from a home office, and has since turned it into a leading search engine marketing business with a small team of seasoned Reno SEO and marketing experts. Stoney pioneered the concept of Destination Search Engine Marketing, which is the driving philosophy behind how Pole Position Marketing helps its clients expand their online presence and improve online conversion rates.

Stoney is a moderator at the Small Business Ideas Forum, a regular contributor to the Search Engine Guide blog and has a monthly column on Search Engine Land. He posts his SEO and business insights at the E-Marketing Performance blog where you can also find his e-books: E-Marketing Performance: Effective Strategies for Building, Optimizing and Marketing your Website Online and Keyword Research and Selection: The Definitive Guide to Gathering, Sorting and Organizing your Keywords into a High-Performance SEO Campaign.

Stoney is married with five wonderful children and, if away from the computer long enough, enjoys riding his dirt bike, watching DVDs, reading books and spending quality and quantity time with the family.
