Avoiding Duplicate Content with Canonical Links

Duplicate content is a problem for search engines, and that makes it a problem for SEOs. Google strongly dislikes showing the same content more than once in search results, and if content exists in more than one place on a site, search engine algorithms have trouble determining which version to include and which to exclude. They also have difficulty knowing where to assign link juice and authority. This confusion can cost a site traffic and lower its SERP rankings. The “rel=canonical” link element is intended to help search engines out by telling them which page is “the page” for a particular piece of content.

From the perspective of search engine crawlers, duplicate content can arise in two ways. A site can have the same content on two different pages, or it can have URLs that differ yet point to the same content. Both cases look the same to the crawlers.

First Minimize Duplicate Content

The obvious way to clarify things for search engines is to remove all the duplicate content, or not to publish duplicates in the first place. In fact, site owners should approach the design of a site with the intention of reducing duplicate content to an absolute minimum. There are a few steps developers can take to reduce the likelihood of duplicate content.

Use a CMS like WordPress

WordPress is fairly good at creating a link structure that avoids having multiple slightly different links heading to the same content. For a large site it can be very difficult to manually track URLs to ensure that content is only addressed by one URL. A CMS will handle this for you.

Keep Internal Links Consistent

Link to the same content in the same way as much as possible.

Use 301 Redirects

If duplicate content is unavoidable (we’ll discuss why this might be below), the best way to deal with it is a permanent 301 redirect from the duplicate to the original.
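As a rough illustration of what a permanent redirect looks like on the wire, here is a minimal sketch using only Python's standard library. The paths in REDIRECTS, the helper redirect_target and the run function are hypothetical names made up for this example. In practice most sites would configure redirects in their web server or CMS rather than in application code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from duplicate paths to the original path.
REDIRECTS = {
    "/roses": "/rose",
    "/rose/index.html": "/rose",
}


def redirect_target(path):
    """Return the original path for a known duplicate, or None."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Note: self.path includes any query string; this sketch matches
        # on the exact request path only.
        target = redirect_target(self.path)
        if target is not None:
            # 301 tells browsers and crawlers the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"original content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start a local test server; call run() manually to try it out."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

With the server running, a request for /roses should come back with status 301 and a Location header of /rose, while /rose itself is served normally.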

So Why Do We Need ‘rel=canonical’?

Sometimes, duplicate content is unavoidable and 301 redirects are not possible. Duplicate content may arise unavoidably in cases such as:

  • URLs with session IDs, analytics tracking codes and other URL parameters, which result in divergent links pointing to the same content.
  • Landing pages for specific incoming links.
  • Pages which are designed to present the same information organized in different ways: product listings sorted alphabetically and by price, for example.
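For the first case in the list above, one way to arrive at a single address is to drop the parameters that do not change the content. The sketch below is only an illustration: the function name strip_ignored_params and the parameter names in IGNORED_PARAMS are assumptions, and each site would need to decide for itself which parameters are safe to ignore.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that do not change the page content.
IGNORED_PARAMS = {
    "sessionid", "sid", "sort",
    "utm_source", "utm_medium", "utm_campaign",
}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return url without the ignored query parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    # Rebuild the URL with the remaining parameters in their original order.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

The result could then be used as the address in a canonical link, so that every tracked or session-specific variant of a URL points back to the same page.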

In these cases, the “rel=canonical” element comes into play. For example, suppose you run a florist’s website. Florists sell different bouquets of roses: a single rose, a dozen roses, two dozen roses, etc. Apart from the number of roses and the price, the pages are going to be largely identical. Let’s say that the classic single rose will be the canonical page at

www.thornyproblem.com/rose

And the others will be at

www.thornyproblem.com/rose?quantity=12
www.thornyproblem.com/rose?quantity=24

In the <head> section of each of the non-canonical pages, we add a link to the canonical page:

<link rel="canonical" href="https://www.thornyproblem.com/rose"/>

And now Google knows which of the pages to consider authoritative. Google is not obligated to honor canonical links, but it does treat them as a very strong signal when deciding which page to give priority to. Just in case you are tempted to be naughty with “rel=canonical”, keep in mind that it is a hint, not a directive: Google does accept canonical links that point to another domain, but it can ignore any canonical link whose target is not substantially the same content.
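It can be worth checking that the non-canonical pages really carry the link you intended. The following is a minimal sketch using Python's built-in HTML parser. The names CanonicalFinder and find_canonical are made up for this example, and it only inspects markup you pass in as a string, so fetching the page is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical
```

Running find_canonical over the markup of each quantity page should return the address of the single rose page. A result of None suggests the link is missing or malformed.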