There is no better way to create an infinite amount of duplicate content on your site than to force session IDs onto each visitor (and search engine). Typically, session IDs are used for tracking a single visitor's navigation path through the site, including adding or removing products from the shopping cart. They are great for tracking purposes, but really, really bad for search engines and inbound linking.

Session IDs

OK, first of all, that's a totally crappy URL shown above, but aside from that, tacked on at the end there is the session ID. Both URLs are identical except for the session ID. I was able to open the exact same page with a different unique ID simply by starting a new browser session. The problem is that each session ID makes for a completely different URL. It's not an issue for the visitor, but it is for the search engines.
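To make the "same page, different URL" point concrete, here is a minimal Python sketch. The example.com URLs and the parameter names in SESSION_PARAMS are assumptions for illustration only; your platform may call its session parameter something else entirely.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; real sites use whatever their platform emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def strip_session_id(url: str) -> str:
    """Return the URL with any session-ID query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Two visits to the same page, each handed a different (made-up) session ID.
first_visit = "http://www.example.com/product.php?id=42&sid=a1b2c3"
second_visit = "http://www.example.com/product.php?id=42&sid=z9y8x7"

# As raw strings the two URLs differ, which is all a crawler has to go on.
print(first_visit == second_visit)  # False
# Once the session ID is removed they turn out to be the same page.
print(strip_session_id(first_visit) == strip_session_id(second_visit))  # True
```

A search engine has to do something like this guesswork on its own, without knowing which parameter is the session ID.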

Since a new session ID is attached with each new visit, each time a search engine comes around it is essentially fed all new URLs. If you have only a ten-page site, the second time the search engine visits it adds the "new" 10 pages to the index, for a total of 20 pages. When it comes around a third time it now has 30 pages in its index. Once it starts analyzing these pages it finds page after page after page of duplication.
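The arithmetic above can be simulated in a few lines. This is only a sketch under simple assumptions: a hypothetical ten-page site, a made-up `sid` parameter, and a crawler that indexes every URL it is handed without trying to detect duplicates.

```python
import uuid

# Hypothetical ten-page site.
PAGES = [f"/page{n}.html" for n in range(1, 11)]

def crawl(pages):
    """One crawler visit: the site issues a fresh session ID for the visit."""
    session_id = uuid.uuid4().hex
    return {f"{path}?sid={session_id}" for path in pages}

index = set()
for visit in range(1, 4):
    index |= crawl(PAGES)
    print(f"visit {visit}: {len(index)} URLs in the index")

# Count how many distinct pages are really behind those URLs.
unique_pages = {url.split("?")[0] for url in index}
print(f"actual unique pages: {len(unique_pages)}")
```

After three visits the index holds 30 URLs, but only 10 distinct pages sit behind them.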

An additional problem arises as site visitors start bookmarking and linking to your site. Every link they add contains their very own session ID. The search engines follow that link to your site and now you've got another 10 pages of duplication. If they follow another link to your site, that's 10 more. Are you starting to see where this is going? Essentially you can turn a 10-page site into endless duplications.

Session ID Duplicates

Even with a small site you can see why the search engines would stop coming around. But if you have a site with hundreds, or even thousands of products, you find two things happen. 1) The search engines will stop spidering new pages because there is just too much duplication. 2) The engines will start dropping pages out of the index altogether.

There are content management systems that will allow you to withhold the session IDs from search engines. While this is a good option, it still has the potential to create problems with inbound links. Each link will still pass value to the URL with the session ID. It'll be up to the search engine to determine whether the URL with the session ID and the URL without are the same page.
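Roughly speaking, withholding session IDs from search engines comes down to a check like the one sketched below. This is not how any particular content management system does it; the function names, the `sid` parameter and the short list of crawler tokens are all assumptions for illustration.

```python
# Illustrative substrings found in some well-known crawler user agents.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")

def is_crawler(user_agent: str) -> bool:
    """Guess whether the request comes from a search engine crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def build_link(path: str, session_id: str, user_agent: str) -> str:
    """Append the session ID to a link for regular visitors only."""
    if is_crawler(user_agent):
        return path  # crawlers get the clean URL
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"

print(build_link("/widgets.php?id=42", "a1b2c3",
                 "Mozilla/5.0 (compatible; Googlebot/2.1)"))
# /widgets.php?id=42
print(build_link("/widgets.php?id=42", "a1b2c3",
                 "Mozilla/5.0 (Windows NT 10.0)"))
# /widgets.php?id=42&sid=a1b2c3
```

Notice that the regular visitor still gets the session ID in the URL, so any link that visitor posts elsewhere carries it along. That is the inbound link problem this approach leaves unsolved.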

The only guaranteed protection is not to put session IDs in your URLs at all. There are alternate means of tracking users, such as cookies, whatever your reason for tracking. Avoiding session IDs completely ensures that you don't open yourself up to inadvertent site duplication.
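One common alternative is to keep the session ID in a cookie, so it travels in the HTTP headers and never touches the URL. Here is a minimal sketch using Python's standard library; the cookie name `session_id` and both function names are assumptions, and a real site would normally lean on its framework's session handling rather than roll its own.

```python
import secrets
from http.cookies import SimpleCookie

def new_session_cookie_header() -> str:
    """Build a Set-Cookie header that carries a fresh session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = secrets.token_hex(16)  # random, hard-to-guess ID
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True  # keep it away from page scripts
    return cookie.output(header="Set-Cookie:")

def read_session_id(cookie_header: str):
    """Pull the session ID back out of a request's Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None

print(new_session_cookie_header())
print(read_session_id("session_id=abc123; theme=dark"))  # abc123
```

Every visitor, and every search engine, sees the same clean URL for a page. Anyone who bookmarks or links to it passes along that one URL and nothing else.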

This article is part of a series on duplicate content. Follow the links below to read more:

  1. Theories in Duplicate Content Penalties
  2. How Poor Product Categorization Creates Duplicate Content and Frustrates Your Shoppers
  3. Redirecting Alternate Domains to Prevent Duplicate Content
  4. Preventing Secure & Non-Secure Site Duplication
  5. Why Session ID's And Search Engines Don't Get Along (Hint: It's a Duplicate Content Thing)
  6. What Does a Title Tag, Title Tag and Title Tag Have In Common?
  7. How to Create Printer Friendly Pages Without Creating Duplicate Content
  8. How to Use Your WWW. to Prevent Duplicate Content

May 8, 2008





Stoney deGeyter is the President of Pole Position Marketing, a leading search engine optimization and marketing firm helping businesses grow since 1998. Stoney is a frequent speaker at website marketing conferences and has published hundreds of helpful SEO, SEM and small business articles.


Stoney pioneered the concept of Destination Search Engine Marketing which is the driving philosophy of how Pole Position Marketing helps clients expand their online presence and grow their businesses. Stoney is Associate Editor at Search Engine Guide and has written several SEO and SEM e-books including E-Marketing Performance; The Best Damn Web Marketing Checklist, Period!; Keyword Research and Selection, Destination Search Engine Marketing, and more.

Stoney has five wonderful children and spends his free time reviewing restaurants and other things to do in Canton, Ohio.





Comments(6)

How do you know if Google has indexed content more than twice? I use session IDs and did a site:domain.com search in Google but it appears that Google has picked up all pages as separate content.

If you perform a site:domain.com check and you don't see any of the same pages with different session IDs then at least for now you're in the clear. But I still wouldn't use that as a reason not to make changes if you can. You're still putting the search engine in a place to have to figure this stuff out for themselves, and while they may be doing it right at the moment, they may not always.

What are the alternate means of tracking users? Cookies? How does that work exactly? Can you point me in the direction of some resources?

Thanks... Great Article!

Now this is a question of how you deal with session management for the application: either compromise with cookies and make the search engines happy, or use session IDs and expect the search engine to have the headache of understanding the issue and dealing with the duplication. Ideally, search engines should be expected to be intelligent, and business efficiency should not be compromised for a third-party search engine?

Session ID shows up in the URL only if the method of the submitted form is GET, i.e., <form method="get"...>. If you can arrange for the form method to be POST, this particular problem does not arise. Data-transmission paths to the host differ between GET and POST. The latter, as well as being somewhat more secure, completely sidesteps the issue of fake URLs and SE confusion.

Good article, I'd never thought of this problem before.

Correct, but you still have a problem if the reason you're using a session ID is for security and user experience, which means you cannot split the GET and POST.

The POST cookie ID is usually more secure if the ID is being used for long-term access privileges, but then you're forcing the use of cookies unnecessarily.

Irony in detail :)

You can use parameter settings (search engine specific) and canonical tags (search engine variances in how handled) and redirects to limit the effects.
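For anyone wondering what the canonical tag and redirect options might look like, here is a rough Python sketch. The `sid` parameter name, the function names and the example.com URLs are all hypothetical, and how each search engine treats the tag or the redirect is up to that engine.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SESSION_PARAM = "sid"  # hypothetical name of the session parameter

def canonical_url(url: str) -> str:
    """Return the URL without its session parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

def redirect_for(url: str):
    """Return (301, clean URL) if the URL carries a session ID, else None."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if any(key == SESSION_PARAM for key, _ in query):
        return 301, canonical_url(url)
    return None

print(canonical_link_tag("http://www.example.com/item?id=7&sid=abc"))
print(redirect_for("http://www.example.com/item?id=7&sid=abc"))
print(redirect_for("http://www.example.com/item?id=7"))  # None
```

The tag only hints to the search engine which URL is the real one. The redirect goes further and sends visitors and crawlers alike to the clean URL.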


