vineri, 16 decembrie 2011

Advanced On-Page Optimization - Whiteboard Friday

Posted: 15 Dec 2011 12:51 PM PST

In this week's Whiteboard Friday, Rand goes into depth on how you can optimize your on-page content. Presented here are five advanced tactics that get you thinking beyond the basics of traditional page optimization and set you up to start creating content that's both relevant and unique.

How do the phrases on your page relate to one another? Does your page content make your visitors happy? By moving beyond simply making sure those title tags are optimized and scattering a few keywords around, we can produce intelligent content that's not only engaging but gets better rankings too.

As promised, here is a Quora thread that describes Google's use of Machine learning.

Video Transcription

Howdy, SEOmoz fans. Welcome to another edition of Whiteboard Friday. This week, we're talking about advanced on-page optimization. Specifically, I have five tactics for you that go beyond the traditional "I'm going to put my keyword in the title tag. I'm going to put my keyword in the URL", those kinds of things.

First one, starting out is this idea of semantic connectivity. We talked about this in the past. We did some research a couple of years ago, maybe 18 months ago on LDA, which is latent Dirichlet allocation, which, of course, is a form of topic modeling that we suspected Google might be using.

It's a way to imagine the connections between words in a particular language. I'll give you an example. Here is the word "cat", and the word "cat" is probably closely related to the word "feline". If you were a search engine and you saw a document with the word "cat" and the word "feline," you would think that document is more relevant to a query for the word "cat" than a document that has the word "cat" and the word "whiteboard," which maybe that would be very far away.

Here's cat and here's canine. Those are much more distant, but cat is relatively closer to tiger, but it's even a little closer to meow. So, you get this sense of, ah, the search engines have a graph of all the words in a language set, how they're connected to each other, what's relevant to what, phrases not just individual words but the two or three or four word phrases.

This kind of thing can be very helpful if you're looking at a document and you're saying to yourself, "Boy, I talked about cats, but I forgot to mention anything about what they eat or what family they're in or what they're related to. I didn't even use the word 'pets.' Maybe, I should be optimizing for those types of things." Employing those closely connected terms can help to boost the relevancy and help boost your rankings.

Second thing on the list, block level optimization. There is a great YOUmoz post about this that we promoted to the main blog recently talking about precisely this type of thing where search engines will essentially analyze individual portions of a page. They'll look at, oh, here's a sidebar and we've decided that's not really relevant because that's navigational links or here's the top nav. We're not going to analyze that for relevancy as much potentially. We're going to look at the header of the document, where the headline is, those first few sentences. We're going to look at the middle of the document, maybe in paragraph forms, the footer of the document, the end. Are all of those things talking about the topic? Are they all on the subject, or is this something that starts out talking about tigers, but it eventually gets into a discussion on genetically modified foods? If that's the case, maybe it's less relevant to tigers. It's just that the initial headline looked like it was relevant to tigers, and so therefore, we don't want to rank this document for the word, tigers. We might even want to be ranking it for something like genetically modified foods. It just happens to use that catchy title.

So, make sure that your document . . . do this kind of check for all of these sections, making sure that they're pertinent, that they're relevant to the content of the query, that they're serving the visitor's interests and needs. If you have that kind of off topic diatribe, and I'm not saying you can't go off topic in your writing a little bit and explore some storyline themes, particularly if you have a long expository piece or you're writing a narrative blog post. That's great. I'm just saying, for stuff that is hyper targeting a particular keyword, especially for a commercial intent or a navigational intent, this might not be ideal. You might want to make those more focused.

Number three, internal and external links. I'm not talking about the links pointing to the page. I'm talking about the links that actually exist on the page. You remember some folks from Google have actually in the past said that, yeah, we might have some things, some first order or second order effect things in our algorithm that rewards people who link out, meaning link to other websites.

Marshall Simmons from The New York Times was on a Whiteboard Friday a couple of years ago, and Marshall talked about how when The New York Times changed their policy to put more external links on the page off to other websites, they actually saw increases and boosts in rankings from the articles that did that, strongly confirming what Google had said about there being some sort of effect in the algorithm, maybe not directly but indirectly looking at, hey, is this person linking out or are they linking out to good places? If they are, we might want to reward them.

Another optimization tactic that's on the more advanced side is putting good external links referencing relevant, potentially useful content on your pages. Linking out to other people is a wonderful thing too, because it puts you into the ecosystem. What I mean by that is if you link to someone else, other people go and visit that page. They might be talking about it. They might thank you for the reference. Someone might see that on Twitter. They might look in their analytics and see that you've sent visitors over and come check out your page and then link to something you've done. That reciprocation is very, very powerful in the organic web, and it can be useful, not only for this direct relevancy boosting signal, but also from a links perspective, from a traffic perspective.

Number four on the list, the happiness of visitors to a page. I know what you're thinking. It's sort of like, wait a minute, that's not on-page optimization. That's more like conversion rate optimization. Yes, but it matters for rankings because Google is looking so much at usage and user data.

I'm going to ask Kenny, who's filming this video, going to wave, Kenny? That's a great wave. Did you all see that? He looked great. It's amazing. I'll ask Kenny to put in a link to a Quora thread where a Google engineer, somebody who worked at Google, actually talked about how they use machine learning on user and usage data signals in the potential ranking algorithm to help better stuff come up when the rankings may be ordered normally just by their classic on-page link stuff and these types of things.

That means that if I can make visitors happier, if I can boost the value of what they're getting out of the pages, I can potentially rank higher too, not just convert more of them but even improve in rankings.

We were talking about things like: Are these visitors completing actions? Are they spending more time on this site or page on average with a good experience than they are with others? What I mean by this is it's not just, "Oh, my time on site is low. I need to find ways to keep visitors on there a longer time." Maybe, you have something that's answering a very, very short query in a short amount of time, and that's making visitors happy. And, maybe, you have something that's answering that query but after a long period of time, visitors are actually unhappy and they're going back to Google and clicking, you know what, block all results from this site, I don't want to see it any more. Or they see you in the rankings in the future, and they're like, "Oh, I remember that domain. I do not want to go through that again. They had those annoying ads and the overlays, and they blocked me from going there."

Every time I see Forbes, I was like, "Man, does this article look interesting enough to me to have to go through that initial screen of the ad, because I know I'm going to get it every time, and it's going to take extra time to load?" On my phone when I'm browsing the Web, I'm always like, "I'm not going to click on that Forbes link. Maybe I'll check it later on my laptop or my desktop."

Those types of things are signals that the engines can look at. Are people coming back? Are they returning again and again? When they see this stuff, true they've got 25% market share with Chrome. They've got the Google tool bar. They have Google free Wi-Fi. They have relationships with ISPs. So, they can get this data right about where everyone goes, not just from search but all over the Web. They know what you're bookmarking. They know what you're returning to. They know your visit patterns. This kind of stuff is definitely going to make its way into the algorithm, I think, even more so than it does today.

Fifth and finally, some content uniqueness and formatting. So, you're all aware of duplicate content issues, thin content issues, and the Panda stuff that happened earlier this year that affected a lot of websites. What you may not know is that there are a bunch of tactics that you can apply in an advanced on-page optimization scenario that can help, so things like completely unique. When I say "completely unique," what I mean is not that you can't quote someone in here, but just that what you can't have is a mad lib style SEO where you've got XY blank Z blank ABC blank, and it's fill in the city name, fill in the proper name, fill in the name of the business, and that's the same across every page on your site, or that's taken from a manufacturer's description and that's put in there.

You need to have that uniqueness throughout, and Google is very good at shingling, which is sort of a method for pattern detection inside topics or inside content. Don't play with them. Just make sure that this is a highly unique piece. If you want to quote something, that's fine. If you want to use media or graphics from somewhere else, that's fine and reference those. I'm not talking about that, but I am talking about that sort of playing mad libs SEO is a dangerous game.

We've noticed that longer content, more content is literally quite well correlated with better rankings, particularly post Panda. What you saw is that sites. I'll give you an example. I look at a lot of rankings for restaurant sites, because I'm constantly doing searches for restaurants and types of food because I travel a ton. What I see is that Yelp and Urban Spoon do very, very well. City Search often does well, and then you'll see those independent, individual blogs. When they tend to rank well, when they're on page one is when they've written that long diatribe exploring all sorts of things on the menu with lots of pictures of the food, an experiential post versus a short snippet of a post. You'll find those on page three, page four, page five. They don't do as well. That longer in- depth content, more of the uniqueness, more value in the content, more than I can get out of it as a reader seems to be something that Google is picking up on. I don't know if that's pure length. I don't know if that's something necessarily they're looking at in the user and usage data, but it could be helpful if you're not ranking very well and you're thinking, boy, I have a lot of pages that are just short snippets. Maybe I'm going to try expanding some of them.

Using images in media, we've, of course, seen the correlation with alt attributes matching the keyword and images. That's not what I'm talking about necessarily, but using images on the page can create more of that in- depth experience and can create a better relationship between you and the visitor. Those things could be picked up and used in other places, and then they'll link back to you. There are all sorts of benefits.

User generated content, so getting comments and interaction down here at the bottom, that type of stuff often is an indication in search engines that, hey, people really care about this. It's also an addition to the amount of content, and it tends to be very unique and valuable and useful. It uses those words that people on the Web would be using about the topic, and that can again be helpful for your content optimization.

Then, finally, Google is clearly looking at things like reading level and correctness of grammar and spelling. There's now a filter inside Google. If you click on the advanced search in the little gear box on the top right- hand corner of your screen when you're logged into Google, you can see advanced search. When you click that, there's a reading level filter to say, "Only show me content that's 12th grade and above." Clearly, Google has that ability.

What I'm saying here is that your content formatting, the way you're putting things together, the length of the document, the in-depthness, and the correctness, these can all have an impact. Don't just be thinking about keyword stuffing and using a few keywords here and there and putting it in the title at the front. Be thinking a little bit more broadly about your on- page optimization. You might get more benefits than even doing some link building, sometimes.

All right, everyone. I hope you've enjoyed this edition of Whiteboard Friday, and we will see you again next week. Take care.

Video transcription by Speechpad.com

Do you like this post? Yes No

How to Fix Crawl Errors in Google Webmaster Tools

Posted: 15 Dec 2011 03:53 AM PST

Posted by Joe Robison

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

Looking at 12,000 crawl errors staring back at you in Webmaster Tools can make your hopes of eradicating those errors seem like an insurmountable task that will never be accomplished. The key is to know which errors are the most crippling to your site, and which ones are simply informational and can be brushed aside so you can deal with the real meaty problems. The reason it’s important to religiously keep an eye on your errors is the impact they have on your users and Google’s crawler.

Having thousands of 404 errors, especially ones for URLs that are being indexed or linked to by other pages pose a potentially poor user experience for your users. If they are landing on multiple 404 pages in one session, their trust for your site decreases and of course leads to frustration and bounces.

Webmaster Tools Crawl Errors

You also don’t want to miss out on the link juice from other sites that are pointing to a dead URL on your site, if you can fix that crawl error and redirect it to a good URL you can capture that link to help your rankings.

Additionally, Google does have a set crawl budget allotted to your site, and if a lot of the robot’s time is spent crawling your error pages, it doesn’t have the time to get to your deeper more valuable pages that are actually working.

Crawl Errors with Arrows

Without further ado, here are the main categories that show up in the crawl errors report of Google Webmaster Tools:

HTTP

This section usually returns pages that have shown errors such as 403 pages, not the biggest problems in Webmaster Tools. For more documentation with a list of all the HTTP status codes, check out Google’s own help pages. Also check out SEO Gadget’s amazing Server Headers 101 infographic on SixRevisions.

In Sitemaps

Errors in sitemaps are often caused by old sitemaps that have since 404’d, or pages listed in the current sitemap that return a 404 error. Make sure that all the links in your sitemap are quality working links that you want Google to crawl.

One frustrating thing that Google does is it will continually crawl old sitemaps that you have since deleted to check that the sitemap and URLs are in fact dead. If you have an old sitemap that you have removed from Webmaster Tools, and you don’t want being crawled, make sure you let that sitemap 404 and that you are not redirecting the sitemap to your current sitemap.

Image of Old Sitemaps and URLs 404'ing

From Google employee Susan Moskwa:

“The best way to stop Googlebot from crawling URLs that it has discovered in the past is to make those URLs (such as your old Sitemaps) 404. After seeing that a URL repeatedly 404s, we stop crawling it. And after we stop crawling a Sitemap, it should drop out of your "All Sitemaps" tab.”

Not Followed

Most of these errors are often caused by redirect errors. Make sure you minimize redirect chains, the redirect timer is set for a short period, and don’t use meta refreshes in the head of your pages.

Matt Cutts has a good Youtube video on redirect chains, start 2:45 in if you want to skip ahead.

Redirect Chains Kills Crawlers

Google crawler exhausted after a redirect chain.

What to watch for after implementing redirects:

When you redirect pages permanently, make sure they return the proper HTTP status code, 301 Moved Permanently.
Make sure you do not have any redirect loops, where the redirects point back to themselves.
Make sure the redirects point to valid pages and not 404 pages, or other error pages such as 503 (server error) or 403 (forbidden).
Make sure your redirects actually point to a page and are not empty.

Tools to use:

Check your redirects with a response header checker tool like URI Valet or the Check Server Headers Tool.
Screaming Frog is an excellent tool to check which pages on your site are showing a 301 redirect, and which ones are showing 404 errors or 500 errors. The free version caps out at 500 pages on the site, beyond this you would need to buy the full version.
The SiteOpSys Search Engine Indexing Checker is an excellent tool where you can put in a list of your URLs that you submitted as redirects. This tool will allow you to check your URLs in bulk to see which ones are indexing and which ones are not. If your original URLs that you had redirected are no longer indexing that means Google removed the old URL from its index after it saw the 301 redirect and you can remove that redirect line from your .htaccess file now.

Examine your site in the text only version by viewing the cached version of the site from the Google SERP listing of the site, then select the text-only version. Make sure you can see all your links and they are not being hidden by Javascript, Flash, cookies, session IDs, DHTML, or frames.

Always use absolute and not relative links, if content scrapers scrape your images or links, they can reference your relative links on their site and if improperly parsed you may see not followed errors show up in your Webmaster Tools, this has happened with one of our sites before and it’s almost impossible to find out where the source link that caused the error is coming from.

Not Found

Not found errors are by and large 404 errors on your site. 404 errors can occur a few ways:

You delete a page on your site and do not 301 redirect it
You change the name of a page on your site and don’t 301 redirect it
You have a typo in an internal link on you site, which links to a page that doesn’t exist
Someone else from another site links to you but has a typo in their link
You migrate a site to a new domain and the subfolders do not match up exactly

Best practice: if you are getting good links to a 404’d page, you should 301 redirect it to the page the link was supposed to go to, or if that page has been removed then to a similar or parent page. You do not have to 301 redirect all 404 pages. This can in fact slow down your site if you have way too many redirects. If you have an old page or a large set of pages that you want completely erased, it is ok to let these 404. It is actually the Google recommended way to let the Googlebot know which pages you do not want anymore.

Redirect 404 Errors

There is an excellent Webmaster Central Blog post on how Google views 404 pages and handles them in webmaster tools. Everyone should read it as it dispels the common “all 404s are bad and should be redirected” myth.

Rand also has a great post on whether 404’s are always bad for SEO also.

Restricted by robots.txt

These errors are more informational, since it shows that some of your URLs are being blocked by your robots.txt file so the first step is to check out your robots.txt file and ensure that you really do want to block those URLs being listed.

Sometimes there will be URLs listed in here that are not explicitly blocked by the robots.txt file. These should be looked at on an individual basis as some of them may have strange reasons for being in there. A good method to investigate is to run the questionable URLs through URI valet and see the response code for this. Also check your .htacess file to see if there is a rule that is redirecting the URL.

Soft 404s

If you have pages that have very thin content, or look like a landing page these may be categorized as a soft 404. This classification is not ideal, if you want a page to 404 you should make sure it returns a hard 404, and if your page is listed as a soft 404 and it is one of your main content pages, you need to fix that page to make sure it doesn’t get this error.

How Soft 404's come to be

If you are returning a 404 page and it is listed as a Soft 404, it means that the header HTTP response code does not return the 404 Page Not Found response code. Google recommends “that you always return a 404 (Not found) or a 410 (Gone) response code in response to a request for a non-existing page.“

We saw a bunch of these errors with one of our clients when we redirected a ton of broken URLs to a temporary landing page which only had an image and a few lines of text. Google saw this as a custom 404 page, even though it was just a landing page, and categorized all the redirecting URLs as Soft 404s.

Timed Out

If a page takes too long to load, the Googlebot will stop trying to call it after a while. Check your server logs for any issues and check the page load speed of your pages that are timing out.

Types of timed out errors:

DNS lookup timeout – the Googlebot request could not get to your domain’s server, check DNS settings. Sometimes this is on Google’s end if everything looks correct on your side. Pingdom has an EXCELLENT tool to check out the DNS health of your domain and it will show you any issues that pop up.
URL timeout – an error from one of your specific pages, not the whole domain.
Robots.txt timeout – If your robots.txt file exists but the server timed out when Google tried to crawl it, Google will postpone the crawl of your site until it can reach the robots.txt file to make sure it doesn’t crawl any URLs that were blocked by the robots.txt file. Note that if you do not have a robots.txt and Google gets a 404 from trying to access your robots.txt, it will continue on to crawl the site as it assumes that the file doesn’t exist.

Unreachable

Unreachable errors can occur from internal server errors or DNS issues. A page can also be labeled as Unreachable if the robots.txt file is blocking the crawler from visiting a page. Possible errors that fall under the unreachable heading are “No response”, “500 error”, and “DNS issue” errors.

Unreachable Errors Make Man Sad

There is a long list of possible reasons for unreachable errors, so rather than list it here, I’ll point you to Google’s own reference guide here. Rand also touched on the impact of server issues back in 2008.

Conclusion

Google Webmaster Tools is far from perfect. While we all appreciate Google’s transparency with showing us what they are seeing, there are still some things that need to be fixed. To start with, Google is the best search engine in the universe, yet you cannot search through your error reports to find that one URL from a month ago that was keeping you up at night. At least they could have supplemented this with good pagination, but nope you have to physically click through 20 pages of data to get to page 21. One workaround for this is to edit the page number by editing the end of the URL string that shows what part of the errors list you are looking at. You can download all of the data into an Excel document, which is the best solution, but Google should still upgrade Webmaster Tools to allow searching from within the application.

Also, the owner of the site should have the ability to delete ALL sitemaps on the domain they own, even if someone else uploaded it a year ago. Currently you can only delete the sitemap that you yourself uploaded through your Webmaster Tools account. If Jimmy from Agency X uploaded an image sitemap a year ago before you let them go, this will still show up in the All Sitemaps tab. The solution to get rid of it is to let the sitemap 404 and it will drop off eventually but it can be a thorn in your side to have to see it every day until it leaves.

Perhaps, as Bing starts to upgrade its own Webmaster Tools, we will begin to see some more competition between the two search engines in their product offerings. Then one day, just maybe, we will get complete transparency and complete control of our sites in the search engines.

Give me some feedback!

What success/obstacles have you ran into when troubleshooting Webmaster Tools errors?

Any recommendations for new users to this powerful but perplexing tool?

Do you like this post? Yes No

vineri, 16 decembrie 2011

Advanced On-Page Optimization - Whiteboard Friday

Advanced On-Page Optimization - Whiteboard Friday

Video Transcription

HTTP

In Sitemaps

Not Followed

Not Found

Restricted by robots.txt

Soft 404s

Timed Out

Unreachable

Conclusion

Niciun comentariu:

Trimiteți un comentariu

Pagini

Persoane interesate

Arhivă blog

vineri, 16 decembrie 2011

Advanced On-Page Optimization - Whiteboard Friday

Advanced On-Page Optimization - Whiteboard Friday

Video Transcription

HTTP

In Sitemaps

Not Followed

Not Found

Restricted by robots.txt

Soft 404s

Timed Out

Unreachable

Conclusion

Niciun comentariu:

Trimiteți un comentariu

Pagini

Persoane interesate

Arhivă blog

Subscribe to