What is Duplicate Content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it’s unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and — worse yet — linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.
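Google does not publish how it decides that two pages are "appreciably similar". One common textbook way to put a number on it is to split each text into overlapping runs of words ("shingles") and compare the two sets with Jaccard similarity. The Python sketch below shows that generic technique only; it is not Google's algorithm, and the shingle size of 3 words is an arbitrary assumption.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity (0.0 to 1.0) of the k-word shingle sets of a and b."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


post = "13 places to watch TV online for free"
excerpt = "13 places to watch TV online"
print(jaccard(post, post))               # 1.0
print(round(jaccard(post, excerpt), 2))  # 0.67
```

An exact copy scores 1.0; an excerpt shares most of its shingles with the full post and lands in between, which is roughly the situation WordPress archive pages create.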
What is not considered Duplicate Content?
Google does not consider translations to be duplicate content.
What Does Google Suggest?
Understand your CMS: Make sure you’re familiar with how content is displayed on your Web site, particularly if it includes a blog, a forum, or related system that often shows the same content in multiple formats.
If you are ever tempted to use article distribution as a technique for building inbound links to your website, forget about it. Many people who tried this strategy have run into problems with Google's duplicate content filter.
WordPress has a big problem with duplicated content.
Normally, the number of pages Google indexes should equal the number of posts plus the number of static pages. But WordPress also generates paginated listings, search results, trackbacks, author pages, category pages and date archives, all of which show an excerpt or even the full content of each post.
This can hurt your rankings in Google search.
Remove this duplicate content and watch your organic traffic increase.
1. With www or without www.
Choose your preferred canonical URL (with www or without www).
Otherwise, Google indexes both www.cucirca.com and cucirca.com and gives them two different PageRanks.
The best way to fix this is to edit your .htaccess file so the server issues a 301 redirect from the non-www version to the www version.
To do this add the following lines to your .htaccess file:
# Send every request for cucirca.com to www.cucirca.com with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^cucirca\.com [NC]
RewriteRule (.*) http://www.cucirca.com/$1 [L,R=301]
Replace cucirca.com with your own domain.
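To make the rule's behaviour concrete, here is a minimal Python sketch of the same decision. It assumes the Host header carries no port number and ignores query strings; Apache does the real work, this only illustrates it. A request whose host is cucirca.com (compared case-insensitively, like [NC]) gets a 301 to the www host with the path unchanged, and anything else passes through.

```python
from __future__ import annotations


def www_redirect(host: str, path: str, bare: str = "cucirca.com") -> tuple[int, str] | None:
    """Return (301, target URL) if the request used the bare domain, else None.

    path is the requested path including its leading "/", e.g. "/about/".
    """
    # Case-insensitive host comparison, like the [NC] flag.
    if host.lower() == bare.lower():
        # Keep the requested path, like $1 in the RewriteRule.
        return 301, f"http://www.{bare}{path}"
    return None  # already on www (or another host): no redirect


print(www_redirect("cucirca.com", "/2007/02/21/13-places-to-watch-tv-online-for-free/"))
print(www_redirect("www.cucirca.com", "/"))  # None
```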
2. With / or without /
I took a close look at Google Analytics. In the Top Content report, my two best URLs are almost the same:
http://www.cucirca.com/2007/02/21/13-places-to-watch-tv-online-for-free/
http://www.cucirca.com/2007/02/21/13-places-to-watch-tv-online-for-free
They differ only by the trailing slash, so this is duplicate content again.
What to do?
Use your .htaccess file to make all URLs end with a “/”.
Add the following code to your .htaccess file, just under the www rule:
# Add a trailing slash to any URL that is not a real file and does not already end with /
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.cucirca.com/$1/ [L,R=301]
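Here is the same logic as a small Python sketch. A hypothetical set of file paths stands in for the !-f test (Apache checks the real disk), and query strings are ignored. Existing files and URLs that already end in / are left alone; everything else gets a 301 to the slash-terminated URL, so the two Google Analytics URLs listed earlier collapse into one.

```python
from __future__ import annotations


def slash_redirect(path: str, existing_files: set[str],
                   host: str = "www.cucirca.com") -> tuple[int, str] | None:
    """Return (301, target URL) when a trailing slash should be added, else None."""
    if path in existing_files:
        return None  # like RewriteCond %{REQUEST_FILENAME} !-f failing
    if path.endswith("/"):
        return None  # like RewriteCond %{REQUEST_URI} !(.*)/$ failing
    return 301, f"http://{host}{path}/"


# Hypothetical files that really exist on disk.
files = {"/robots.txt", "/favicon.ico"}
print(slash_redirect("/2007/02/21/13-places-to-watch-tv-online-for-free", files))
print(slash_redirect("/robots.txt", files))  # None
```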
3. Use robots.txt to remove duplicate content
For the details, see my post How to Get More Natural Search Traffic With robots.txt.
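As a rough illustration, the Python sketch below feeds a hypothetical robots.txt to the standard library's urllib.robotparser and checks which URLs a crawler may fetch. The /author/ and /page/ paths assume WordPress's pretty permalinks and are only examples, not a recommended rule set. This parser matches plain path prefixes, so it cannot express wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block author archives and paginated home-page listings,
# leave posts and category pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "http://www.cucirca.com/2007/02/21/13-places-to-watch-tv-online-for-free/",
    "http://www.cucirca.com/category/seo/",
    "http://www.cucirca.com/author/admin/",
    "http://www.cucirca.com/page/2/",
]:
    print(parser.can_fetch("Googlebot", url), url)
```

Blocking date archives by prefix would misfire with this permalink structure: /2007/02/ is also the start of every post URL from that month, which is probably why date archives are better handled with noindex,follow.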
4. Noindex Follow
Add the following PHP code to your theme's header.php, just before the </head> tag.
<?php
// Keep paged listings and author, trackback, search and date archives
// out of the index, while still letting spiders follow their links.
// One combined check, so a page matching several conditions gets one tag.
if ( is_paged() || is_author() || is_trackback() || is_search() || is_date() ) {
    echo '<meta name="robots" content="noindex,follow" />' . "\n";
}
?>
It is wise to let spiders index your category pages: they show only excerpts of the posts, so they are not considered duplicates.
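The rules in this step boil down to a small decision table: paged listings (page 2 and beyond) and author, trackback, search and date archives get noindex,follow, while posts, static pages and the first page of each category stay indexable. The Python sketch below restates that table so it is easy to check. The page-type labels are hypothetical stand-ins for WordPress's conditional tags such as is_author() and is_date(); this is not a WordPress API.

```python
from __future__ import annotations

NOINDEX_TAG = '<meta name="robots" content="noindex,follow" />'

# Hypothetical labels for the archive types the header.php snippet targets.
NOINDEX_TYPES = {"author", "trackback", "search", "date"}


def robots_meta(page_type: str, page_number: int = 1) -> str | None:
    """Return the robots meta tag for a page, or None to leave it indexable.

    page_type is a label such as "post", "page", "category", "author",
    "trackback", "search" or "date"; page_number > 1 marks a paged listing.
    """
    if page_number > 1 or page_type in NOINDEX_TYPES:
        return NOINDEX_TAG
    return None


print(robots_meta("post"))         # None: posts stay indexed
print(robots_meta("category"))     # None: category excerpts stay indexed
print(robots_meta("category", 2))  # page 2 of a category is noindexed
```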
This is my side of the story. I’ve tested all the above tips and the results are starting to show.
If you have any suggestions I’ll be glad to discuss them.
Resources for this post: Google Webmaster Central
A little test for me to see how duplicate content works (I will share the results in a future post):
Published March 28th, 2007 in SEO.

March 31st, 2007 at 11:22 am
Thanks for the informative post.
Gili
April 13th, 2007 at 6:58 am
[…] the changes I’ve made to the robots.txt and removing some of the duplicate content from I’m starting to have more and more traffic from the […]
December 20th, 2007 at 10:21 am
I would like to see a continuation of the topic
January 30th, 2008 at 12:48 am
[…] Here are 4 Steps to remove Wordpress duplication. Get On The First Page of Google With Ultimate Squidoo Related Posts: Adsense Manager Plug-in for Wordpress […]
February 8th, 2008 at 11:31 pm
[…] This post shares four important tips for removing the duplicate links in wordpress blogs. SEO Friendly attempt Links, Tech tips […]
April 1st, 2008 at 12:14 pm
Hi : )
Thanks, Mr Cucirca!
Nice post, I enjoyed it.
For categories there is also a plugin called ‘category base killer’ which removes /category/ from the directory URL. I think it works better (though I haven’t used it yet), and excerpts are no longer required.
Thanks again for this post