Redundancy Issues: Duplicate Content
According to a 2008 post by Google, duplicate content is any piece of content that’s substantially similar to something posted elsewhere on the Internet. Posting duplicate content drives your search engine ranking down, both because of algorithms designed to penalize content scraping and because search engines may struggle to decide which version of the content is the most relevant one. It’s therefore important to do what you can to keep your content both unique and original.
What Does Duplicate Content Look Like?
In order to properly deal with duplicate content, one first needs to know how to recognize it. As a webmaster, you’ll encounter two primary ‘breeds’ of this type of content: offsite and onsite. Offsite duplicate content includes:
- Scraped Content: Content lifted wholesale from your website. Content scrapers steal from all across the web, publishing content on their own site and either claiming or misattributing ownership. Generally, these pages are incredibly ad-heavy, and offer very little of value to the reader.
- Plagiarized Content: Not all stolen content is scraped content. Some content creators, whether deliberately or by accident, may occasionally take something written or designed by someone else and pass it off as their own. This could be anything from a simple quote to an entire article.
- Over-Distributed Articles Or Pages: There’s a certain strategic element to posting a single article on multiple websites, but you need to be extremely careful when doing so. Distribute your articles to too many different sites, and search engines may have trouble identifying which copy is the original, and your ranking may suffer for it.
- Generic Content: One of the most important rules of content creation is to be unique. Write cookie-cutter blog posts or generic product descriptions, and there’s a good chance that what you write will get lost in the noise of thousands of other content creators writing the same thing.
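Spotting “substantially similar” content is something you can automate. The sketch below is a rough illustration of one common approach (word-shingle overlap), not a description of how any search engine actually scores duplicates: it breaks two passages into overlapping word triples (“shingles”) and measures how many they share.

```python
def shingles(text, n=3):
    """Split text into overlapping word n-grams ('shingles')."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b, n=3):
    """Jaccard similarity of two texts' shingle sets: 1.0 means identical wording."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "Duplicate content is any piece of content that is substantially similar to something posted elsewhere."
scraped = "Duplicate content is any piece of content that is substantially similar to text published elsewhere."

print(round(jaccard_similarity(original, scraped), 2))  # → 0.62
```

A score near 1.0 suggests a scraped or lightly reworded copy; a score near 0.0 suggests genuinely distinct writing. Dedicated plagiarism-checking services use far more sophisticated variants of this idea.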
Onsite duplicate content tends to be slightly more difficult to spot. It includes:
- Duplicate Session IDs: Designed to track traffic on a website, a session ID appends a unique code to the URL for each visitor, and that code sticks with them as they move from page to page. Particularly on larger websites, this creates an immense volume of duplicate URLs for identical content, confusing search engines and likely dragging your search engine ranking down.
- Duplicate URLs: There are a number of different URL parameters that can be applied to a website, all of which run the risk of producing duplicate content. These include click tracking, analytics code, and secure/non-secure URLs.
- Outdated Pages and Files: Whenever you update or rework your site, there’s the risk that old links and files may still be indexed by search engines.
- Duplicate Summaries: Product descriptions and summaries are among the most common forms of duplicate content on digital storefronts. Though time-consuming, it’s best to write unique sales copy for each product you offer; again, avoid being generic.
- Duplicate Categories: Digital storefronts and blogs that assign categories to each of their posts risk running into this issue. Each time you assign a category to a page, it can create an additional URL at which that page appears. On websites with a large number of pages, this can lead to a positively nightmarish volume of duplicate content.
- Printer-Friendly Pages: If your website is configured to allow the user to view printer-friendly versions of its pages, those versions could end up being crawled, serving as duplicate content.
- Duplicate Meta Descriptions/Title Tags: Self-explanatory. It’s incredibly important that you do what you can to keep your meta descriptions and title tags unique and distinct from one another.
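Several of the issues above (session IDs, click tracking, analytics codes) come down to the same problem: many URLs pointing at one page. A minimal sketch of URL normalization, the kind of cleanup a canonicalization step performs; the parameter names here are hypothetical and should be adjusted to whatever your own site actually appends:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters that commonly create duplicate URLs.
# (Hypothetical list; tailor it to your site's actual parameters.)
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize_url(url):
    """Drop tracking/session parameters and lowercase the host so that
    every variant of a page maps to one canonical URL."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc.lower(), path, urlencode(kept), ""))

print(normalize_url("https://Example.com/shoes?sessionid=abc123&color=red"))
# → https://example.com/shoes?color=red
```

Running a crawl of your own site through a function like this, and comparing the before and after URL lists, is a quick way to estimate how many “different” pages are really the same page in disguise.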
How To Deal With Duplicate Content
When tackling duplicate content, how you approach the problem will depend entirely on which of the two breeds you’re working with. Thankfully, it’s not terribly difficult in either case.
Offsite Duplicate Content
The best way to deal with duplicate content that doesn’t appear on your website is to be proactive. If you’re sharing a guest post with a range of different sites, limit how many you post it to, and choose those sites carefully. If you’re sharing content, make sure the people who republish it link back to the original work.
With content scrapers, you can usually carry on as you ordinarily would. In most cases, content scraping sites have little to no relevance to search engines, meaning they’re no threat to your site. Even if you don’t take action, there’s a good chance they’ll be identified for what they are and taken down eventually, anyway. The exception, of course, is when a scraped version starts getting close to your site in rank, or outranks it.
In that case, you’ll want to use Google’s Scraper Report Tool. The rest should sort itself out.
Plagiarists are a bit trickier, since their sites usually aren’t full-on scraping hubs. That means there’s a much better chance that they’ll outrank you, in addition to being seen as authorities where content is concerned. Contact the webmaster or author if you think it’ll help - there’s a chance it was just an honest mistake.
Otherwise, all you can do is file a DMCA complaint and wait.
Onsite Duplicate Content
You have a great deal more power when dealing with onsite duplicate content, which is fortunate, since it’s so much more diverse and complex than its offsite twin. Here are a few ways you can optimize your site to be rid of duplicates:
- Use 301 redirects. This will ensure that, even if a particular page has a wide array of different URLs, visitors are ultimately all directed to the primary one. Unfortunately, they can be a bit time-consuming to implement, particularly on larger sites.
- Use the rel=canonical tag, which tells search engines which URL in a set is the ‘canonical’ one: the one they should index. Just be sure you’re using it correctly.
- Add the noindex, follow meta robots tag to any pages you don’t want search engines to index. This tells crawlers not to index the page while still following the links on it, so the URL stops competing with the primary version. You could also consider using webmaster tools to remove the content.
- Keep your internal links consistent.
- Write unique content for each and every product summary. Yes, it’s time consuming, but it’s also necessary.
- Be proactive in curating old content - delete outdated files and get rid of broken links.
- Google Webmaster Tools has a number of functions designed to help you track down and eliminate duplicate content. Make full use of them.
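To make the 301-redirect advice above concrete, here’s a minimal sketch for an Apache server with mod_rewrite enabled. The domain and paths are placeholders, and the equivalent setup on nginx or inside your CMS will look different:

```apache
# .htaccess — collapse URL variants onto one canonical URL.
# Assumes Apache with mod_rewrite enabled; domains/paths are placeholders.

RewriteEngine On

# Redirect the www hostname onto the bare domain (one canonical host).
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]

# Point an outdated page at its replacement.
Redirect 301 /old-product.html /products/new-product.html
```

Because a 301 is a permanent redirect, search engines consolidate ranking signals from the old URLs onto the target, which is exactly the behavior you want for duplicates.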
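The rel=canonical and noindex, follow fixes from the list above are both tags placed in a page’s head element. A minimal sketch, using a hypothetical example.com product URL; note that the two tags solve different problems and generally belong on different pages, not together:

```html
<!-- On a duplicate variant (e.g. a URL with tracking parameters):
     point search engines at the version you want indexed. -->
<head>
  <link rel="canonical" href="https://example.com/products/red-shoes" />
</head>

<!-- On a page that shouldn't be indexed at all (e.g. a printer-friendly
     version): noindex keeps it out of the index, follow lets crawlers
     still follow its links. -->
<head>
  <meta name="robots" content="noindex, follow" />
</head>
```

Of the two, rel=canonical is usually the better choice for true duplicates, since it consolidates signals onto the primary URL rather than simply hiding the copy.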