Which Page is Canonical?
Posted: 27 Jun 2012 08:03 PM PDT
Posted by Dr. Pete

It sounds like an easy question, doesn’t it? While we hear a lot about duplicate content since the Panda update(s), I’m amazed at how many people are still confused by a much more fundamental question – which URL for any given page is the canonical URL? While the idea of a canonical URL is simple enough, finding it for a large, data-driven site isn’t always so easy. This post will guide you through the process with some common cases that I see every week.

Let’s Play Count the Pages

Before we dive in, let’s cover the biggest misunderstanding that people have about “pages” on their websites. When we think of a page, we often think of a physical file containing code (whether it’s static HTML or script, like a PHP file). To a crawler, a page is any unique URL that it finds. One file could theoretically generate thousands of unique URLs, and every one of those is potentially a “page” in Google’s eyes.

It’s easy to smile and nod and all agree that we understand, but let’s put it to the test. In each of the following scenarios, how many pages does Google see?

(A) “Static” Site
(B) PHP-based Site
(C) Single-template Site
The answer is (A) 4, (B) 4, and (C) 4. In Google’s eyes, it doesn’t matter whether the pages have extensions (“.php”), the home-page is at the root (“/”) or at index.php, or even if every page is being driven off of one physical template. There are four unique URLs, and that means there are four pages. If Google can crawl them all, they’ll all be indexed (usually).

Let’s dive right into a few examples. Please note: these are just examples. I’m not recommending any of the URL structures in this post as ideal – I’m just trying to help you determine the correct canonical URL for any given situation.

Case 1: Tracking URLs

I’ll start with an easy one. Many sites still use URL parameters to track visitor sessions or links from affiliates. No matter what the parameter is called or which purpose it’s used for, it creates a duplicate for every individual visitor or affiliate. Here are a few examples:
In the first two examples, the session and affiliate ID create a copy, in essence, of the main store page. In both of these cases, the proper canonical URL is simply:
The last example is a bit trickier. There, we also have a “product=” parameter that drives the product being displayed. This parameter is essential – it determines the actual content of the page. So, only the “affiliate=” parameter should be stripped out, and the canonical URL is:
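To make the rule concrete, here is a minimal Python sketch of it. The URLs and the “session” parameter name are hypothetical stand-ins (only “product=” and “affiliate=” are named above), and which parameters count as tracking-only is something you have to decide from your own site's architecture, not something a script can guess:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking-only parameters. Anything not listed here
# (such as "product") is assumed to drive the content of the page.
TRACKING_PARAMS = {"session", "affiliate"}


def canonicalize(url: str) -> str:
    """Strip tracking parameters but keep the ones that select content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # urlunsplit omits the "?" when no query parameters remain.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    # Hypothetical example URLs, for illustration only.
    for example in (
        "http://www.example.com/store.php?session=1234",
        "http://www.example.com/store.php?affiliate=5678",
        "http://www.example.com/store.php?product=1234&affiliate=5678",
    ):
        print(example, "->", canonicalize(example))
```

With these assumed inputs, the first two URLs collapse to the bare store page, while the third keeps its “product=” parameter and loses only “affiliate=”.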
This is just one of many cases where the canonical URL is NOT the root template or the URL with no parameters. Canonical URLs aren’t always short or pretty – many canonical URLs will have parameters. Again, I’m not arguing that this structure is ideal. I’m just saying that the canonical URL in this case would have to include the “product=” parameter.

Case 2: “Dynamic” URLs

Unfortunately, the word “dynamic” gets thrown around a little too freely – for the purposes of this blog post, I mean any URLs that pass variables to generate unique content. Those variables could look like traditional URL parameters or be embedded as “folders”. Blog post URLs are a good example of the kind of URLs I’m talking about. Take these four:
Again, it doesn’t matter whether the URLs have parameters or hide those parameters as virtual folders. All of these URLs use a unique value (either an ID or date) to generate a specific blog post. So what’s the canonical URL here? Obviously, if you canonicalize to “/blog”, you’re going to reduce your entire blog to one page. It’s a bit of a trick question, because the canonical URL could actually be something like this:
This is why we have such a hard time detecting the proper canonical URLs with automated tools – it really takes a deep knowledge of a site’s architecture and the builder’s intent. Don’t make assumptions based on the URL structure. You have to understand your architecture and crawl paths. If you just start stripping off URL parameters, you could cause an SEO disaster.

Case 3: The Home-page

It might seem strange to put the home page third, but the truth is that the first two cases were probably easier. Part of the problem is that home pages naturally spin out a lot of variations:
Add in complications like secure pages (https:), and you can end up multiplying all of these variants. While this is technically true of any page, the problem tends to be more common for the home page, since it’s usually the most linked-to page (both internally and from external sites) by a large margin. In most cases, the technically correct home-page URL is:
…but there are exceptions (such as if you secure your entire site). I don’t see the trailing slash (“/”) causing a ton of problems on home pages these days, since most browsers and crawlers add it automatically, but I think it’s still a best practice to use it.

Another common exception is if your site automatically redirects to another version of the home-page – ASP is notorious for this, and often lands visitors and bots at “index.aspx” or a similar page. While that situation isn’t ideal, you don’t want to cross signals. If the redirect is necessary, then the target of that redirect (i.e. the “index.aspx” URL) should be your canonical URL.

Finally, be very careful about situation #5 – in that case, as I discussed in the first section of this post, the “index.php” code template is actually driving other pages with unique content. Canonicalizing that to the root or to “index.php” could collapse your site to one page in the Google index. That particular scenario is rare these days, but some CMS platforms still use it.

Case 4: Product Pages

In some ways, product pages are a lot like the blog-post pages in Case #2, only more so. You can naturally end up with a lot of variations on an e-commerce site, including:
If you have a URL like #3, then that’s going to be your canonical URL for the product in most cases (especially #1-#3). If you don’t, then work up the list. In other words, if you have #3, use it; if not, use #2; if not #2, use #1. You have to work with the structure you have.

URLs #4-#6 are a bit trickier. Something like the currency selector in #4 can be very complicated and depends on how those selections are implemented (user selection vs. IP-based geo-location, for example). For Google’s purposes, you would typically want them to use the dominant price for the site’s audience and canonicalize to the main product URL (#1-#3, depending on the site architecture). Indexing every price variant, unless you have multiple domains, is just going to make your content look thinner.

With #5 and #6, the URL indicates a product variant, let’s say a T-shirt that comes in different colors and sizes. This situation depends a lot on the structure and scope of the content. Technically, your T-shirt in red/large is unique, and yet that page could look “thin” in Google’s eyes. If you have a variant or two for a handful of products, it’s no big deal. If every product has 50 possible combinations, then I think you need to seriously consider canonicalization.

Case 5: Search Pages

Now, the ugliest case of them all – internal search pages. This is a double-edged sword, since Google isn’t a fan of search-within-search (their results landing on your results) in general and these pages tend to spin out of control. Here are some examples:
The list, unfortunately, could go on and on. While it’s natural to think that the canonical version should be #1-#3 (depending on your URL structure, just like in Case #4), the trouble is pagination. Pages 2 and beyond of your topic search may appear thin, in some cases, but they return unique results and aren’t technically duplicates.

Google’s solutions have changed over time, and their advice can be frustrating, but they currently say to use the rel=prev/next tags. Put simply, these tags tell Google that the pages are part of a series. In cases like #5-#6, Google recommends you use rel=prev/next for the pagination but then a canonical tag for the “&page=2” version (to collapse the sorts and filters).

Implementing this properly is very complicated and well beyond the scope of this post, but the main point is that you should not canonicalize all of your search pages to page 1. Adam Audette has an excellent post on pagination that demonstrates just how tricky this topic is.

Know Your Crawl Paths

Finally, an important reminder – the most important canonical signal is usually your internal links. If you use the canonical tag to point to one version of a URL, but then every internal link uses a different version, you’re sending a mixed signal and using the tag as a band-aid. The canonical URL should actually be canonical in practice – use it consistently. If you’re an outside SEO coming into a new site, make sure you understand the crawl paths first, before you go and add a bunch of tags. Don’t create a mess on top of a mess.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
How to Write Email to Get a Better Response Rate

Posted: 27 Jun 2012 03:25 AM PDT
Posted by moosahemani

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

A successful SEO campaign is the perfect combination of all strategies. Whether you're working with on-page optimization, content development, social media, or link building, all of these factors carry equal value. When it comes to picking the most difficult SEO strategy, I will always give my vote to link building, as it is one of the most difficult, boring, and time-consuming strategies you can implement.

A few days back, I shared a picture of the perfect bedroom for a link builder on Facebook:

Almost perfect, despite the coffee pot missing ;)

Yes, link building can be boring, tough, and time consuming. However, one person on the team can dive into link building and get their hands dirty to get the job done in order to produce effective and action-driven results for the business.

People use many different tactics when it comes to link building. One of the best-known and most effective techniques, which almost every ethical SEO uses, is to manually reach out to other webmasters and ask for a link. Although the rate of response can be low, implementing a few smart email writing tactics can actually increase the response rate.

In this post, I will discuss a few tactics that I have used in recent campaigns where I had to write a good number of emails by hand to a variety of influencers and bloggers to ask them for a link. I tried out a few different ideas and finally created a format that allowed me to make every email as personalized as possible, while saving a lot of my time. Here I go!

Use catchy subject titles

The first section of an email everyone reads is the title.
It is important to have a catchy title or else your email will soon be sent directly to the trash bin. Do not try to manipulate the reader by creating a false title, but instead create a title that is interesting and captivating to act as a perfect lead-in for the valuable content of the email. Some good examples of titles are:
Length of the email

This is an extremely important factor. Do not write a one-line email that clarifies nothing. You want to make sure your email's content delivers the intended actions and requests in a concise, yet inclusive, manner. Similarly, do not stuff the email with tons of unnecessary information. In either case, the recipient is likely to delete your message right away without even reading it (yeah, I can see you having a déjà-vu here). A perfect email should have, more or less, two paragraphs that describe the solid reason for writing that email.

Use names

Not rocket science, but always a good reminder! Use the intended recipient's name while asking them for a favor, or do not expect them to reply back. The people you are writing to are busy just like you, and their to-do lists are already filled with tasks to accomplish. You had better make your request sound important, and that starts with using their name. How many times have you ignored emails addressed to ‘Hello Webmaster,’ or similar? Plenty. Take a little step forward, do your research, search for their names, and use them! After all, it is all in the name! Example:
The first paragraph

If you are writing an email of 100 words or more, it is important that your first paragraph be appealing, smart, and engaging enough to encourage your reader to happily continue their journey through to the end of the message. I've tried different formats and ideas for emails, but what stuck best with my campaign was to dedicate the entire first paragraph to the receiver. This may sound like a lot of work, but checking the social profiles and doing some background research on your recipient can tell you an enormous amount about a person. Ultimately, this will let you talk to him or her more comfortably. Example:
The second paragraph

Don’t drag, just say it! Now that you've hit the second paragraph, you've made it to the ground floor of your email. If you drag your point out any longer, you will probably lose the interest of the recipient. Try to be direct in the second paragraph and let the reader know what you want from him or her. Try to explain your objective in a few lines and move towards the end of your note. Example:
The ending note

Now that you have done your job in describing your objective behind the email, it is time to sum it up nicely in a courteous way.
Why I prefer this format

I've been working on improving my emails for quite some time now, and this pitch and format has worked for me almost every time. Here are a few reasons why I think this email format is sure to get you a better rate of response:
Obviously the rate of response is not likely to be 100 percent, but I have found that using this format increases the rate of response for different niches. If you have any other formatting ideas or suggestions, I would love to hear them! Please share your views in the comment section.

About the Author: Moosa Hemani is an SEO strategist and writes about SEO and related stuff on different blogs. He recently started an SEO Blog where he shares his opinions about SEO, search engines, social, and inbound.