Yeah, yeah, the search engines are getting smarter about duplicate content... blah, blah, blah. It's no longer the problem it once was... yada, yada, yada. Google will get it all sorted out for you.

Whatevs.

I don't care how smart the search engines are, it's no excuse for laziness. Sure, a maid may clean up your living room for you, but that's no excuse to ask them to wipe your..., er, mouth, too.

The intelligence of the search engines is your fallback. Your backup girl. The friend you call when all your other friends are out of town or on dates.

In marriage terms, you may have a brilliantly smart spouse, but that doesn't give you an excuse to be a dumb-ass. In fact, brushing up on your IQ points might actually score you some points where it counts. Search engines are no different.

If the collective intelligence of the search engines failed overnight, where would you be? Because that's really the point, isn't it? You can't get by on someone else's ability to make you look smart. Sooner or later someone smarter than you will come along and, well, there will be no saving grace for you. You'll be shamed and put up for display for what you really are. It'll be like showing up to the first day of school, naked. (Or am I the only one that has that recurring dream?)

Ooooookay, let's move forward, shall we?

So, despite all the intelligence Google can muster, it's still a good idea to fix your duplicate content problems. Duplicate content plays a role in which pages the search engines spider and index, as well as how many they index. Both are critical to getting visitors to your site and presenting them with the best pages for their query.

Here are six easy ways to eliminate your pesky duplicate content problems:

Remove old files from your server

This isn't something that most people think about as a duplicate content solution, but it's a pretty big one. Over the years a typical site goes through designs, re-designs, development, and re-development. Developers will often work with their new files on their own server, then upload them to the client's server. Old files either get overwritten or, if the new developer changed file names or moved files around, the new files just get added to the pile while the old site files stay in place.

Now, if the new site is perfect, with no links pointing to the old file names, eventually the search engines will figure out which pages are the new, relevant pages for the site. But as long as those old pages are in place, there is potential for duplicate content problems.

If those old pages are already in the search engine index, the engines continue to spider those pages and possibly keep them in the index. If a new version of that page was created with a different file name, you now have duplicate content.

Why would the engines do this? Perhaps there is a stray link on the site that points to one or more of these old pages. Maybe there are external links pointing to these pages that keep the engines returning.

By deleting all old files, you'll be able to find lingering internal links and implement redirects to get visitors to the proper pages.
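
Before deleting anything, it helps to know which files are copies of each other. Here is a minimal sketch of one way to look, assuming you have a local copy of the site's files. The function name `find_duplicate_files` is made up for illustration, and it only catches files that are byte-for-byte identical, so an old page that was lightly edited will slip past it.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root: str) -> list[list[str]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Hash the file contents so identical pages share one key.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path.relative_to(root).as_posix())
    # Only groups with more than one file are candidate duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each group it returns is a set of file names serving the same content. Deciding which one is the keeper is still a human job.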

Fix broken links

Broken link checks are great for finding broken links, but unless your duplicate pages have been removed, such a check won't pick up on the duplicate content problem. Only after you remove those old files will you be able to identify and fix site links that go to these dead pages.

I recently worked on a client site that had hundreds of broken links after its re-development. With dozens of old files on the server, we first had to figure out which files were the correct ones and then remove the incorrect dupes. Each round of broken link checks gave us more links to fix and more files to remove.

I literally spent 5 hours fixing broken links on a site because of old files mixed with new files, and links pointing to old files that were still on the server. I have no doubt they'll see a lift in traffic and conversions from these fixes.
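
The check itself is simple enough to sketch. The example below is an illustration only, not the tool I used: it assumes you already have a dictionary mapping every live URL to its HTML, and the names `LinkCollector` and `broken_internal_links` are hypothetical. It reports each internal link whose target is not among the live pages, which is exactly what starts showing up once the old files are gone.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def broken_internal_links(pages: dict[str, str], site: str) -> list[tuple[str, str]]:
    """Return (source page, target URL) pairs whose target is not a live page.

    `pages` maps each live URL to its HTML; `site` is the home page URL.
    """
    site_host = urlparse(site).netloc
    broken = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links against the page they appear on.
            target = urljoin(url, href)
            if urlparse(target).netloc == site_host and target not in pages:
                broken.append((url, target))
    return broken
```

Run it, fix what it finds, remove more old files, and run it again. That loop is where the hours go.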

Link to the www. version of your site

When working on sites, I often see a mixture of links pointing to www.site.com/page and site.com/page. While this gets the visitor to the same page with the same content, it creates a unique URL that the search engines can index.

Eventually the engines get this snafu figured out, but why wait? Get it fixed now so it will never be an issue. Go through all your site links and direct them to the www. version of your site.

While I strongly suggest this fix, I also recommend preventing the non-www. URLs from being displayed at all. You can do this with a redirect in your .htaccess file, if your server supports it. You can also use Google Webmaster Tools to set your preference to always use the www. version.

If I were me -- and I am -- I'd do all three of these options, just to stay on the safe side.
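
The redirect itself normally lives in the server configuration, but the logic behind it is easy to show. This is a rough sketch, assuming the site sits on a bare domain such as site.com with no other subdomains in play. The function name `www_redirect_target` is invented for the example.

```python
from typing import Optional
from urllib.parse import urlsplit


def www_redirect_target(url: str) -> Optional[str]:
    """Return the www. URL a request should be 301-redirected to.

    Returns None when the URL already uses the www. host.
    Assumes a bare domain (site.com), not some other subdomain.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        return None
    # Keep the scheme, path, and query; only the host changes.
    return parts._replace(netloc="www." + parts.netloc).geturl()
```

For example, `www_redirect_target("http://site.com/page")` returns `"http://www.site.com/page"`, while the www. version of the same URL returns `None` because no redirect is needed.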

Use absolute links

Absolute links are links that contain the full URL of the page being linked to. Relative links only work for internal site links and use the least amount of information needed to get the visitor to the destination.

Absolute link:
http://www.searchengineguide.com/stoney-degeyter/index.php

Relative link:
stoney-degeyter/index.php

Relative links work just fine for getting visitors to the correct pages. But if you haven't implemented the www. fixes above, relative links can sometimes pose duplicate content problems.

If someone comes to your site using the non-www. version (site.com) all relative links will, by nature, not include the www. This opens the door for the search engines to spider all your duplicate non-www. URLs.

Using absolute links that include the www. in the URL prevents this from happening. It doesn't matter what URL the visitor used to get to your site; all remaining URLs will point to the correct www. version.
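
You can see the difference by resolving links the way a browser or spider does. The snippet below is a small demonstration using Python's standard `urljoin`. The non-www. address is an assumption standing in for a visitor who arrived without the www.

```python
from urllib.parse import urljoin

# The page the visitor (or spider) is on: the non-www. version of the site.
CURRENT_PAGE = "http://searchengineguide.com/"

RELATIVE_LINK = "stoney-degeyter/index.php"
ABSOLUTE_LINK = "http://www.searchengineguide.com/stoney-degeyter/index.php"

# A relative link inherits whatever host the current page was loaded from.
relative_result = urljoin(CURRENT_PAGE, RELATIVE_LINK)
# -> "http://searchengineguide.com/stoney-degeyter/index.php"

# An absolute link ignores the current page and always lands on the www. URL.
absolute_result = urljoin(CURRENT_PAGE, ABSOLUTE_LINK)
# -> "http://www.searchengineguide.com/stoney-degeyter/index.php"
```

Same page, two URLs. The relative link followed the visitor off into non-www. territory, and the absolute link didn't.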

Use canonical tags

E-commerce sites have a special set of problems. You want to make your products available to your visitors through multiple navigation paths, but that often creates duplicate product pages based on the trail used to reach them. In this situation, my first solution is to create a master URL for each product and ensure that, regardless of the navigation path, that URL is the one displayed when the visitor reaches the page.

But, short of that, you have the option of using the canonical tag:

<link rel="canonical" href="http://www.example.com/proper-page.html" />

That little bad boy tells the search engines which page should be considered the "correct" version. So if you have multiple, duplicate product pages, you can add this canonical tag to each of them, pointing to the proper page, and the search engines will, in theory, not index the duplicates.
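
If you want to confirm that your duplicate pages really carry the tag, a quick check is easy to script. This is a sketch only, built on Python's standard HTML parser. The names `CanonicalFinder` and `canonical_url` are hypothetical, and it simply reports the first canonical tag it finds in a page.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = (attr.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Feed it the HTML of each duplicate product page. Every one of them should return the same master URL, and a `None` means the tag is missing from that page.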

Never link to /index.html

This is especially true of your home page, but can apply throughout your site. On your home page you have two options:

www.site.com/
www.site.com/index.html (or .asp, .php, etc.)

Make sure all links going to your home page link to www.site.com and not the other.

Same with subfolders.

www.site.com/subfolder/
www.site.com/subfolder/index.html

Both of these URLs will take you to the same page. Pick the one you want to use, and stick with it in all your internal site links.
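
If you settle on the folder version, cleaning up a list of internal links can be scripted. Here is a minimal sketch, assuming your index files are named index.html, index.htm, index.php, or index.asp. The pattern and the function name `strip_index` are illustrative, so adjust them to whatever your server actually uses.

```python
import re
from urllib.parse import urlsplit

# Assumed index file names; extend the list to match your own server.
INDEX_FILE = re.compile(r"/index\.(html?|php|asp)$")


def strip_index(url: str) -> str:
    """Rewrite a URL ending in an index file to its folder form."""
    parts = urlsplit(url)
    # An empty path (bare domain) is treated as the root folder.
    path = INDEX_FILE.sub("/", parts.path) or "/"
    return parts._replace(path=path).geturl()
```

With this, `strip_index("http://www.site.com/subfolder/index.html")` gives `"http://www.site.com/subfolder/"`, and a URL such as `"http://www.site.com/page.html"` is left alone.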

Implementing ALL of these fixes may seem like duplicate content fix overkill, but most of them are so easy there is no reason not to. It takes a bit of time, but the certainty of eliminating all duplicate content problems is well worth it.

Google's pretty smart, but you're smarter. You know it makes mistakes, but let those be mistakes in analyzing someone else's site, not yours.


December 9, 2010





Stoney deGeyter is the President of Pole Position Marketing, a leading search engine optimization and marketing firm helping businesses grow since 1998. Stoney is a frequent speaker at website marketing conferences and has published hundreds of helpful SEO, SEM and small business articles.

If you'd like Stoney deGeyter to speak at your conference, seminar, workshop or provide in-house training to your team, contact him via his site or by phone at 866-685-3374.

Stoney pioneered the concept of Destination Search Engine Marketing which is the driving philosophy of how Pole Position Marketing helps clients expand their online presence and grow their businesses. Stoney is Associate Editor at Search Engine Guide and has written several SEO and SEM e-books including E-Marketing Performance; The Best Damn Web Marketing Checklist, Period!; Keyword Research and Selection, Destination Search Engine Marketing, and more.

Stoney has five wonderful children and spends his free time reviewing restaurants and other things to do in Canton, Ohio.





Comments(15)

Great suggestions & post. I have trouble with the recommendation of using absolute links - I completely agree that it is effective to prevent duplicate content, especially when it comes to https vs http and www. versus non www. versions of the URLs...but developers HATE absolute URLs. Every time I make this recommendation the developers literally want to destroy me, and will come up with a million reasons why they won't do it - mostly revolving around it being inefficient, bloating the code, slowing down load times, etc.

Do you have any good rebuttals to developers that don't want to use absolute URLs?

Great post on duplicate content! Regarding the last point, I was wondering whether www.site.com/ and www.site.com (without the trailing slash) make a difference for search engines.

Since the browser includes the / after the domain, users are more likely to copy and paste and link using www.site.com/.

Good stuff. In other words, listen to mum and don't be messy :) I think you also indirectly make a pretty strong argument for using a CMS.

Now help me out here a bit (because I've honestly never done what I'm about to propose, and as a matter of fact, I just thought of it). (1) Let's presume for a minute most if not all the older pages still have their (Google) Analytics snippet installed. (2) Let's also presume that the URL structure of the new site can be differentiated (via regular expression) from the older site(s).

If both of these presumptions are true you can then use filter(s) to strip out all the new traffic, and thus pinpoint which of the old files are still drawing traffic. (Note: In theory filters might not be necessary. I just think it might be easier to get rid of the new hay and leave the old needles, eh?) Since traffic could be from Digg, another blog, etc. and not necessarily (organic) search, this approach would actually be a more thorough solution, I think? You could then either try to contact the referring sites, or just set up redirects to handle the older traffic.

This is doable, right?

btw, I don't want to subscribe to the thread. Too much email already, ya know? But if you can email me direct and let me know what you think, that would be great. Thx.

But what I heard yesterday from a blog is that Google first has to crawl and study the code to understand that there is a canonical tag, whereas a 301 redirect works immediately. I tried the canonical tag, but even after crawling my page Google didn't index my site with www, so I used a 301 instead, which works fine.

Nick, I know what you mean. I hate to say it but developers are often the biggest hindrance to good online marketing. Of course, they can also be the best friend to online marketers as well, but they have to be on board with what it takes for a site to succeed.

So my response to these developers is to laugh at them and tell'em to suck it up. Code bloat? really? That's the best they can do? hahahaha. Classic!

Seriously, sometimes a little extra code is needed. There are so many problems that can be fixed by using absolute URLs (at least in the navigation) that it trumps everything else.

Contrad - Actually, yeah, the search engines can see those as two different URLs. However, since the trailing slash generally gets added automatically, it's not a problem.

Mark - CMS are often the culprit for causing duplicate content issues. I have no problem with them but it's all in the programming. As for the rest, that'll take more time to think through than what I have now.

Hyder - 301 is always my preferred method. The other solutions are backups, really.

What do you propose for international websites with the same content, such as product and service data? For example, I have one site at site.co.uk, another at site.com, and a third at site.ch, all written in US English.

This is such a relevant post with the current trend for websites to move away from html to more database-driven technologies. And I totally subscribe to the comments about developers! Their job is to find ways of making the site work quickly and for users to find it easily and to navigate it - period!

M

Laurent, I'd suggest writing for your audience. Even though the language is the same, there will be, at the very least, minor changes in wording and presentation that targets the English reading audience in each country. If not, then I'm not entirely sure what the point of having a website for each country is.

Stoney: Isn't it better to link to the www.site.com than to www.site.com/

Or, I guess it really doesn't matter so long as you are consistent in linking the same one. I just think most people will link without the / on the end.

Curious why you recommend pointing to the address with www "www.domain.com" instead of just leaving it off "domain.com"? As users have gotten used to Internet terminology we realized we did not need the "http://", and an email address is recognized on its own just by the presence of an @ symbol. I think we can really drop the leading "e" before things like e-mail, e-commerce, etc. It's just mail and commerce now. Seems like we are past the requirement of the www designation, or do you disagree? Thanks.

Brad, the www. is certainly not needed and can be dropped. In fact, it doesn't really matter if you default to use it or not, it's just a matter of being consistent.

Anthony, Consistency is key. But since the browser automatically redirects to using the "/" at the end then I'd go with that.

I'm having a problem with absolute links.

Let's say the absolute link to one of my pages is www.abclink. com / linkingproblems.html. (I don't want this message to get caught in the filter so I put extraneous spaces in the url that are not included in my real links).

When I click on the absolute link, my web address appears twice and the link doesn't work. For example:

www.abclink. com/www.abc. com/ linkingproblems.html (again I used extraneous spaces).

What's the fix for this?

The only time I've seen this happen is when I forget to place the "http://" before the "www."


