You may have been hearing more and more lately about canonical pages, 301 redirects, duplicate content and… Google. Well, I’m going to write a few lines about what I know so far about canonical pages, how to use them, and more.
As you know, Google has always been concerned about duplicate pages. You should be concerned too, since a poor evaluation of your website may sooner or later lead to unwanted results, including a Google penalty of some sort.
What is a canonical page? Why specify a canonical page?
A canonical page is the preferred version of a set of pages with highly similar content. To follow our example, you should choose ONE canonical page from those five, and let’s say we prefer “www.example.com”. To show Google which page is the canonical one (meaning which page to index and consider showing in its search results), we have to add a canonical link tag to the <head> section of our pages:
<link rel="canonical" href="http://www.example.com/" />
NOTE: To avoid any mistakes, place the canonical link tag just before the closing </head> tag (just as I did in this example).
Of course, the canonical link tag solves a lot of issues, but these are the most important in my opinion:
- Bad links pointing to a specific page, which can end up showing the same content as the original page under another URL.
- Trailing slash issues.
- Case sensitivity issues in URLs.
What is Google saying about canonical pages?
If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user’s query. Now, however, users can specify a canonical page to search engines by adding a <link> element with the attribute rel="canonical" to the <head> section of the non-canonical version of the page. Adding this link and attribute lets site owners identify sets of identical content and suggest to Google: “Of all these pages with identical content, this page is the most useful. Please prioritize it in search results.”
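To see which canonical a page actually declares, you can pull the tag out of its HTML. Here is a minimal sketch using only Python's standard html.parser module. The names CanonicalFinder and find_canonical are my own, and the sample markup is just an illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical href, or None when there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


sample = (
    "<html><head><title>Home</title>"
    '<link rel="canonical" href="http://www.example.com/" />'
    "</head><body></body></html>"
)
print(find_canonical(sample))
```

The sketch does not check that the tag sits inside <head>, so treat it as a quick inspection helper rather than a validator.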
Before applying the canonical link tag, I strongly recommend taking a look at your .htaccess file. Open a browser and check the following:
Case 1: Does typing example.com show the same page as www.example.com? If the answer is yes, then open your /public_html/.htaccess file and add these lines:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
NOTE: This will permanently (301) redirect all non-www pages to their www version. This is one of the basic rules for avoiding duplicate content.
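If you want to reason about what the Case 1 rule does before touching the server, the logic can be mirrored in a few lines of Python. This is only a sketch of the intent, not of Apache itself. The function name www_redirect is made up, the host is compared case-insensitively as an assumption, and ports and query strings are ignored.

```python
from typing import Optional

CANONICAL_HOST = "www.example.com"


def www_redirect(host: str, path: str) -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect is needed."""
    # Requests that already use the canonical host are left alone.
    if host.lower() == CANONICAL_HOST:
        return None
    # Everything else goes to the same path on the www host.
    return f"http://{CANONICAL_HOST}/{path.lstrip('/')}"


print(www_redirect("example.com", "/about.html"))
print(www_redirect("www.example.com", "/about.html"))
```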
Case 2: Does www.example.com/index.php (or index.html) show the same content as www.example.com/? If yes, then add these two extra lines under the ones we added earlier:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ http://www.example.com/$1 [R=301,L]
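The Case 2 rule can be tried out the same way. The sketch below applies the RewriteRule pattern with Python's re module. The helper name index_redirect is hypothetical. It also leaves out the RewriteCond on THE_REQUEST, which on the real server limits the rule to URLs the visitor actually typed.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule, with the dot escaped.
INDEX_RULE = re.compile(r"^(.*)index\.php$")


def index_redirect(path: str) -> Optional[str]:
    """Return the 301 target for paths ending in index.php, else None."""
    # In .htaccess the pattern is matched without the leading slash.
    match = INDEX_RULE.match(path.lstrip("/"))
    if match is None:
        return None
    # $1 is whatever came before index.php (possibly nothing).
    return "http://www.example.com/" + match.group(1)


print(index_redirect("/index.php"))
print(index_redirect("/blog/index.php"))
print(index_redirect("/about.php"))
```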
Other recommended things you can do before applying the canonical link tag:
- Check the consistency of your internal links. Make sure that if you refer to a page multiple times, you use the same URL.
- Do tests. Imagine what a user could do.
- Try not to link from a page to itself, e.g. from the homepage to the homepage. It is fine to link to the homepage from other pages, but not from the homepage itself.
- Build a test sitemap and check the URLs: they should match the canonical versions of the same pages.
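The sitemap check from the last bullet is easy to script. A minimal sketch, assuming you already have the sitemap URLs as a list and a dictionary mapping each URL to the canonical it declares. Both inputs and the function name non_canonical_urls are hypothetical.

```python
from typing import Dict, List


def non_canonical_urls(sitemap_urls: List[str], canonical_of: Dict[str, str]) -> List[str]:
    """Return sitemap entries whose declared canonical is a different URL."""
    # A URL with no declared canonical is treated as its own canonical.
    return [url for url in sitemap_urls if canonical_of.get(url, url) != url]


sitemap = [
    "http://www.example.com/",
    "http://www.example.com/index.php",
    "http://www.example.com/about.html",
]
declared = {
    "http://www.example.com/": "http://www.example.com/",
    "http://www.example.com/index.php": "http://www.example.com/",
}
print(non_canonical_urls(sitemap, declared))
```

Anything the function returns is a URL that should probably be replaced in the sitemap by its canonical version.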
After setting the canonical link tag, do more tests. Is the C sign grey? That means that the current location matches the specified canonical page.
If it’s blue, it means that you are on a different version of the page than the one defined as canonical, and clicking on the blue bubble will redirect you to the defined canonical page.
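The grey/blue check boils down to comparing the current address with the declared canonical. Here is a rough sketch of that comparison. I do not know how the indicator decides internally, so the rules here (resolve a relative href, ignore the #fragment, then compare exactly) are assumptions, and canonical_indicator is a made-up name.

```python
from urllib.parse import urldefrag, urljoin


def canonical_indicator(current_url: str, canonical_href: str) -> str:
    """Return "grey" when the current URL is the canonical one, else "blue"."""
    # Fragments never reach the server, so drop them before comparing.
    current = urldefrag(current_url).url
    # The href may be relative; resolve it against the current page.
    target = urldefrag(urljoin(current_url, canonical_href)).url
    return "grey" if current == target else "blue"


print(canonical_indicator("http://www.example.com/", "http://www.example.com/"))
print(canonical_indicator("http://www.example.com/index.php", "http://www.example.com/"))
```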
Here Matt Cutts speaks about canonical pages. The video was recorded on February 12, 2010.
Extra info about canonical link tag
- You cannot use it for cross-domain linking.
- It is fine to use it for subdomains.
- All search engines recommend the use of absolute URLs instead of relative ones.
The three major search engines answered a few general questions:
Quotes from Google:
Is rel="canonical" a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.
Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.
Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.
What if the rel="canonical" returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.
What if the rel="canonical" hasn’t yet been indexed?
Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel="canonical" hint.
Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.
What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
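Google's answer on relative paths above can be illustrated with urljoin from Python's standard urllib.parse module, which resolves relative references the way browsers generally do. This is only a sketch. The page URL and the <base> value are invented examples, and the name resolve_canonical_href is mine, not something Google publishes.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_canonical_href(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Turn the href of a canonical link tag into an absolute URL."""
    # A <base> element, when present, replaces the page URL as the reference
    # point. It may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


page = "http://www.example.com/shop/list.php?sort=asc"
print(resolve_canonical_href(page, "product.php?item=swedish-fish"))
print(resolve_canonical_href(page, "product.php?item=swedish-fish",
                             base_href="http://www.example.com/catalog/"))
```

The two calls give different absolute URLs for the same href, which is exactly why absolute URLs are the safer choice in the tag.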
Then Yahoo:
- The URL paths in the <link> tag can be absolute or relative, though we recommend using absolute paths to avoid any chance of errors.
- A <link> tag can only point to a canonical URL form within the same domain and not across domains. For example, a tag on http://test.example.com can point to a URL on http://www.example.com but not on http://yahoo.com or any other domain.
- The <link> tag will be treated similarly to a 301 redirect, in terms of transferring link references and other effects to the canonical form of the page.
- We will use the tag information as provided, but we’ll also use algorithmic mechanisms to avoid situations where we think the tag was not used as intended. For example, if the canonical form is non-existent, returns an error or a 404, or if the content on the source and target was substantially distinct and unique, the canonical link may be considered erroneous and deferred.
- The tag is transitive. That is, if URL A marks B as canonical, and B marks C as canonical, we’ll treat C as canonical for both A and B, though we will break infinite chains and other issues.
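The transitive behaviour in the last point can be sketched as following the chain until it ends, while remembering visited URLs so that a loop cannot run forever. This is an illustration of the idea, not Yahoo's actual algorithm. The dictionary of declared canonicals and the name resolve_canonical are assumptions.

```python
from typing import Dict


def resolve_canonical(url: str, canonical_of: Dict[str, str]) -> str:
    """Follow canonical declarations from url until the chain ends or loops."""
    seen = {url}
    current = url
    while current in canonical_of:
        nxt = canonical_of[current]
        if nxt in seen:
            # Loop (or a page naming itself): stop where we are.
            break
        seen.add(nxt)
        current = nxt
    return current


chain = {"A": "B", "B": "C"}
print(resolve_canonical("A", chain))
print(resolve_canonical("B", chain))
```

With the chain A to B to C, both A and B resolve to C. For a loop, the sketch simply returns the last URL reached before it would revisit one.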
Finally Bing:
- This tag will be interpreted as a hint by Live Search, not as a command. We’ll evaluate this in the context of all the other information we know about the website and try and make the best determination of the canonical URL. This will help us handle any potential implementation errors or abuse of this tag.
- You can use relative or absolute URLs in the “href” attribute of the link tag.
- The page and the URL in the “href” attribute must be on the same domain. For example, if the page is found on “http://mysite.com/default.aspx”, and the “href” attribute in the link tag points to “http://mysite2.com”, the tag will be invalid and ignored.
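The same-domain rule that Yahoo and Bing describe can be approximated in code. The sketch below is deliberately naive: it treats the last two labels of the host name as "the domain", which works for example.com but is wrong for suffixes like co.uk. A real check would need a public suffix list. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def registrable_domain(url: str) -> str:
    """Naive guess at the domain: the last two labels of the host name."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def canonical_allowed(page_url: str, canonical_url: str) -> bool:
    """True when page and canonical share the same (naively computed) domain."""
    return registrable_domain(page_url) == registrable_domain(canonical_url)


print(canonical_allowed("http://test.example.com/", "http://www.example.com/"))
print(canonical_allowed("http://mysite.com/default.aspx", "http://mysite2.com"))
```

This follows Yahoo's example, where a subdomain may point to the www host. Whether every engine is that permissive is not something the quotes above settle.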
Finally, affiliate links, print previews, and session ID request pages will all be handled better than before with the new canonical URL tag.
There is also a Canonical URL WordPress plugin for WordPress addicts. :)
VBulletin (as well as many other CMS developers) should take note of this link tag, as they can use it in updates of their forum software, which (VBulletin specifically) creates several copies of each forum post under different URLs.
A good example of this is VBulletin’s “showthread”, “printthread” and the forum archive, which all produce very similar pages with the same content. I believe a visitor (and search engine bots alike) would much prefer being directed to the active showthread version.
Questions and Answers
Q: Does the canonical URL tag work across different domain names?
A: No, you can use it only within one domain name; otherwise search engines will treat it as “invalid”.
Q: Could subdomain.example.com suggest www.example.com as a canonical URL?
A: Yes.
Q: Can I use this to suggest http://example.com be the canonical url instead of https://example.com?
A: Yes, sure.
Q: This or a 301 permanent redirect?
A: Both are valid, but the canonical link tag is usable in many more situations in practice.
Q: Do the pages have to be bit-for-bit identical?
A: Nope, but link juice redirects (attempts to redirect link credit from, let’s say, 10 pages to the main page) won’t be tolerated. Take care here. :)
Q: Absolute or relative URL?
A: Google recommends absolute URLs. Also take care, Apache will not be that happy.
Q: Can Google follow a chain of canonicals? (transitive)
A: Theoretically yes, but the recommendation is to point the tag directly at the final URL.