SEOmoz Daily SEO Blog
Restricting Robot Access for Improved SEO

Posted: 15 Mar 2011 02:03 PM PDT, by Lindsay

Left to their own devices, search engine spiders will often perceive important pages as junk, index content that shouldn't serve as a user entry point, generate duplicate content, and cause a slew of other issues. Are you doing everything you can to guide bots through your website and make the most of each visit from search engine spiders?

It is a little like child-proofing a home. We use child safety gates to block access to certain rooms, add inserts to electrical outlets to ensure nobody gets electrocuted, and place dangerous items out of reach. At the same time, we put educational, entertaining, and safe items within easy reach. You wouldn't open the front door of your unprepared home to a toddler, then pop out for a coffee and hope for the best.

Think of Googlebot as a toddler (if you need a more believable visual, try a really rich and very well-connected toddler). Left to roam the hazards unguided, it will likely leave you with a mess and some missed potential. Remove the option to wander into the troublesome areas of your website, and bots are more likely to focus on the good-quality pages instead.

Restricting access to junk and hazards while making quality choices easily accessible is an important and often overlooked component of SEO. Luckily, there are a number of tools that let us make the most of bot activity and keep spiders out of trouble on our websites. Let's look at the four main robot restriction methods: the Meta Robots Tag, Robots.txt files, the X-Robots-Tag, and the Canonical Tag. We'll summarize quickly how each method is implemented, cover the pros and cons of each, and provide examples of how each one can be best used.

CANONICAL TAG

The canonical tag is a page-level meta tag placed in the HTML header of a web page. It tells the search engines which URL is the canonical version of the page being displayed.
Its purpose is to keep duplicate content out of the search engine index while consolidating your pages' strength into one 'canonical' page. The code looks like this:
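The canonical tag is a standard `link` element placed inside the page's head section; the URL below is a hypothetical example of the page you want to consolidate signals to:

```html
<link rel="canonical" href="http://www.example.com/page.html" />
```

Put it on every duplicate version of the page, each pointing at the one URL you want indexed.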
There is a good example of this tag in action over at MyWedding. They used it to take care of the tracking parameters that matter to their marketing team. Try this URL - http://www.mywedding.com/?utm_source=whatever-they-want-to-track. Right-click on the page, then view the source. You'll see the rel="canonical" entry on the page.

Pros
Cons
Example Uses
ROBOTS.TXT

Robots.txt allows some control over search engine robot access to a site; however, it does not guarantee a page won't be indexed. It should be employed only when necessary. I generally recommend using the Meta Robots tag "noindex" for keeping pages out of the index instead.

Pros
Cons
Example Uses
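To illustrate how robots.txt directives are interpreted, here is a short sketch using Python's standard-library parser; the domain and paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block all bots from /admin/, allow everything else.
robots_lines = [
    "User-agent: *",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(robots_lines)

# The /admin/ area is blocked from crawling...
print(rp.can_fetch("*", "http://www.example.com/admin/settings"))
# ...but the rest of the site remains crawlable.
print(rp.can_fetch("*", "http://www.example.com/blog/post"))
```

Note that "blocked from crawling" is not the same as "kept out of the index": a disallowed URL can still show up in results if other sites link to it, which is why the noindex meta tag is the safer tool for de-indexing.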
META ROBOTS TAG

The Meta Robots tag creates page-level instructions for search engine bots and belongs in the head section of the HTML document. It is my very favorite option. By using the 'noindex' directive, you keep content out of the index, but the search engine spiders will still follow the links and pass the link love.

Pros
Cons
Example Uses
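A typical form of the tag, as a sketch; the exact directives depend on your goal:

```html
<meta name="robots" content="noindex, follow" />
```

Here 'noindex' keeps the page out of the index, while 'follow' lets spiders continue through the page's links and pass their equity along, which is the combination recommended above.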
X-ROBOTS-TAG

Since 2007, Google and other search engines have supported the X-Robots-Tag as a way to give bots crawling and indexing instructions in the HTTP headers used to serve a file. The X-Robots-Tag is very useful for controlling indexation of non-HTML media types such as PDF documents.

Pros
Cons
Example Uses
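Because the directive travels in the HTTP response headers rather than in the page markup, it can be attached to any file type. A hedged sketch for Apache, assuming mod_headers is enabled; the file pattern is just an example:

```
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
```

With this in place, every PDF is served with an `X-Robots-Tag: noindex, noarchive` header, keeping those documents out of the index without needing an HTML meta tag.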
Let's Turn This Ship Back Around

What was all the baby talk you started out with, Lindsay? Oh, that's right. Thanks. In your quest to bot-proof your website, you have a number of tools at your disposal. These differ greatly from those used for baby-proofing, but the end result is the same: everybody (babies and bots) stays safe, on track, out of trouble, and focused on the most important stuff that is going to make a difference. Instead of baby gates and electric socket protectors, you've got the Meta Robots Tag, Robots.txt files, the X-Robots-Tag, and the Canonical Tag. In my personal order of preference, I'd go with...
Your Turn!

I would love, love, love to hear how you use each of the above robot control protocols for effective SEO. Please share your uses and experience in the comments and let the conversation flow. Happy Optimizing!

Stock Photography by Photoxpress