Canonical URL Issues: Avoid Confusing Google

Here’s a cautionary tale that shows just how lost you can get if you start dabbling with clever SEO features and URL tweaks but take a wrong turn somewhere. It was originally spotted by Malcolm Coles, and we thought it interesting enough to share with you here in more detail.

In short (and from his blog):

You use the rel=canonical command to tell Google that a given URL is actually a version of another URL – and that the search engine should treat the second version as if it was that main URL.

The Express site’s CMS is creating a duplicate version of every single page via the rel=canonical tag. And then a 3rd version, and then a 4th … and it’s never stopping until it gets to infinity.

Give it a read and then let’s take a closer look at the context.

It makes it all the worse that the canonical “infinite” loop issue was something Matt Cutts predicted would happen years ago. 301 and 302 HTTP redirects carry the same risk, since a redirect chain can loop in just the same way, but we must admit it has been a while since we saw such an extreme example of this! It just goes to show that badly developed and implemented canonical elements can cause more problems than they solve.
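To make the failure mode concrete, here is a minimal sketch of how one might check a chain of canonical references for loops or runaway chains. The pages, the example.com URLs and the helper names `extract_canonical` and `follow_canonicals` are all hypothetical and invented for illustration; a real check would fetch each page over HTTP rather than read it from a dictionary.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def extract_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def follow_canonicals(start_url, pages, max_hops=10):
    """Follow canonical references starting at start_url.

    pages maps URL -> HTML (a stand-in for fetching the page).
    Returns (status, chain), where status is "ok", "loop" or "too_long".
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        html = pages.get(current)
        target = extract_canonical(html) if html is not None else None
        # No tag, an unknown page, or a self-reference ends the chain.
        if target is None or target == current:
            return "ok", chain
        chain.append(target)
        if target in seen:
            return "loop", chain
        seen.add(target)
        current = target
    # The chain kept producing new URLs: the "never stopping" case.
    return "too_long", chain


# Hypothetical pages: /a and /b point at each other, /ok points at itself.
PAGES = {
    "https://example.com/a": '<link rel="canonical" href="https://example.com/b">',
    "https://example.com/b": '<link rel="canonical" href="https://example.com/a">',
    "https://example.com/ok": '<link rel="canonical" href="https://example.com/ok">',
}

status, chain = follow_canonicals("https://example.com/a", PAGES)
print(status, chain)
```

The hop limit matters as much as the loop check: a CMS that invents a brand-new URL at every step never revisits a page, so only a cap on chain length will catch it.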

The key to avoiding these infinite loop problems (again, another point mentioned by Google) is to make sure that the CMS is built properly in the first place. If URL masking is set up correctly in conjunction with 301s then the use of canonical tags is unnecessary.
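As a rough illustration of what “set up correctly” could mean, the sketch below maps every variant of a URL to one preferred form and reports where a 301 should point. The rules used here (forcing https, dropping a leading “www.”, trimming trailing slashes) and the function name `redirect_target` are assumptions made for the example, not something the original post or any particular CMS prescribes.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url):
    """Return the URL a 301 should point to, or None if url is already preferred.

    Hypothetical normalisation rules, for illustration only:
    https scheme, lower-case host without "www.", no trailing slash.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        # Trim trailing slashes, but never reduce the path to nothing.
        path = path.rstrip("/") or "/"
    preferred = urlunsplit(("https", host, path, parts.query, ""))
    return None if preferred == url else preferred


print(redirect_target("http://WWW.Example.com/news/"))
print(redirect_target("https://example.com/news"))
```

The property worth checking is that the mapping settles after one step: the target of a redirect must itself need no redirect, otherwise the 301s can chain or loop in the same way the canonical tags did.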

A better option

So, to give this post a bit more of a constructive element, here are a few pointers to bear in mind for anyone toying with these functions.

Firstly, it’s worth remembering that using “noindex” and “nofollow” on duplicate pages should make sure they aren’t indexed in the first place, avoiding the need to use canonicalisation altogether.

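Duplicates can also be kept away from spiders globally via a robots.txt file, using simple rules to automatically block any crawling of pages outside of the original page. Bear in mind that robots.txt controls crawling rather than indexing; “noindex” itself is set per page, through a robots meta tag or an X-Robots-Tag HTTP header.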

For example

(using as the primary page)

Adding a string referencing* to the robots.txt file could let you block any other variations of this page (assuming the variations occurred after the last “/”)

In this case, would NOT be indexed by Google, and so you could eliminate any problems with the canonical tag of this particular page relating it back to
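The robots.txt idea can be sketched in a few lines. The paths and rules below are hypothetical stand-ins, since the URLs from the original example are not available here, and the matcher is a simplified approximation of wildcard matching as Google documents it (“*” matches any run of characters, a trailing “$” anchors the end of the URL, the longest matching rule wins, and Allow wins a tie). It is not Google’s actual parser.

```python
import re

# Hypothetical rules: keep the primary page /widgets/ crawlable and
# block any variation that adds something after the final "/".
RULES = [
    ("allow", "/widgets/$"),
    ("disallow", "/widgets/*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each "*" match any characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if path may be crawled under the given rules."""
    best = None  # (pattern length, is_allow); tuple order breaks ties for allow
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path matched by no rule is crawlable by default.
    return True if best is None else best[1]


for path in ["/widgets/", "/widgets/?sort=price", "/widgets/print", "/about"]:
    print(path, is_allowed(path))
```

Python’s built-in `urllib.robotparser` only does plain prefix matching, which is why this sketch carries its own small matcher for the wildcard rules.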

Google strongly recommends that the use of canonical tags be the last resort – only use them if you have absolutely no other option. Also, be aware that Google only takes these canonical tags as a “guide” and does not guarantee that it will honour them as you expect.

Again, this comes down to building the site properly in the first place and making sure you can always avoid canonical tags (which are really seen as a “repair” tag) by using the standard HTTP 301 redirect instead.

In short, if it ain’t broke, don’t use canonical tags.
