Sunday, August 29, 2010

Crawl Errors now reports soft 404s

Webmaster Level: All

Today we’re releasing a feature to help you discover if your site serves undesirable "soft" or "crypto" 404s. A "soft 404" occurs when a web server responds with a 200 OK HTTP response code for a page that doesn't exist, rather than the appropriate 404 Not Found. Soft 404s can limit a site's crawl coverage by search engines because these duplicate URLs may be crawled instead of pages with unique content.
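One quick way to see whether your own server behaves this way is to request a URL that certainly doesn't exist and inspect the status code. Here's a minimal Python sketch (our illustration, not a Google tool; the URL is a made-up example):

import urllib.request
import urllib.error

def check_for_soft_404(url):
    try:
        response = urllib.request.urlopen(url)
        # A 200 OK for a page that shouldn't exist suggests a soft 404.
        print(f"{url} returned {response.status} - possible soft 404")
    except urllib.error.HTTPError as e:
        # A 404 or 410 here is the desired behavior for missing pages.
        print(f"{url} returned {e.code} - correct hard error")

check_for_soft_404("http://www.example.com/some-page-that-should-not-exist")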

The web is infinite, but the time search engines spend crawling your site is limited. Properly reporting non-existent pages with a 404 or 410 response code can improve the crawl coverage of your site’s best content. Additionally, soft 404s can potentially be confusing for your site's visitors as described in our past blog post, Farewell to Soft 404s.    

You can find the new soft 404s reporting feature under the Crawl errors section in Webmaster Tools.



Here’s a list of steps to correct soft 404s to help both Google and your users:
  1. Check whether you have soft 404s listed in Webmaster Tools
  2. For the soft 404s, determine whether the URL:
    1. Contains the correct content and properly returns a 200 response (not actually a soft 404)
    2. Should 301 redirect to a more accurate URL
    3. Doesn’t exist and should return a 404 or 410 response
  3. Confirm that you’ve configured the proper HTTP Response by using Fetch as Googlebot in Webmaster Tools
  4. If you now return 404s, you may want to customize your 404 page to aid your users. Our custom 404 widget can help.
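If you serve your own pages, the key detail in step 4 is keeping the real 404 status while showing a friendlier page. A minimal sketch using Flask (our choice of framework, with a hypothetical 404.html template; the post doesn't prescribe any particular stack):

from flask import Flask, render_template

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(e):
    # Serve the helpful custom page, but keep the 404 status code.
    # Returning 200 here would recreate the soft 404 problem.
    return render_template("404.html"), 404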

We hope that you’re now better equipped to find and correct soft 404s on your site. If you have feedback or questions about the new "soft 404s" reporting feature or any other Webmaster Tools feature, please share your thoughts with us in the Webmaster Help Forum.

Written by Jonathan Simon, Webmaster Trends Analyst


"

Our new search index: Caffeine

(Cross-posted on the Official Google Blog)

Today, we're announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was ever possible before.

Some background for those of you who don't build search engines for a living like us: when you search Google, you're not searching the live web. Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here's a good explanation of how it all works.)
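As a toy model of that back-of-the-book analogy (our sketch, nothing like Google's actual systems), an index maps each word to the pages containing it, so a query consults the map instead of scanning the live web:

index = {}

def add_page(url, text):
    # Record which pages contain each word.
    for word in set(text.lower().split()):
        index.setdefault(word, set()).add(url)

def search(word):
    # Look the word up in the index rather than fetching pages.
    return index.get(word.lower(), set())

add_page("http://example.com/a", "fresh coffee news")
add_page("http://example.com/b", "coffee brewing tips")
print(search("coffee"))  # both pages, found without touching the live web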

So why did we build a new search indexing system? Content on the web is blossoming. It's growing not just in size and number; with the advent of video, images, news and real-time updates, the average webpage is also richer and more complex. In addition, people's expectations for search are higher than they used to be. Searchers want to find the latest relevant content, and publishers expect to be found the instant they publish.

To keep up with the evolution of the web and to meet rising user expectations, we've built Caffeine. The image below illustrates how our old indexing system worked compared to Caffeine:

Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before — no matter when or where it was published.
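Continuing the toy model from above (illustrative only, not Caffeine's actual code), the difference is between periodically rebuilding everything and folding each new page in as it arrives:

def rebuild_index(all_pages):
    # Old layered approach: reanalyze every page, so new content
    # stays invisible until the next full rebuild completes.
    index = {}
    for url, text in all_pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

def add_page_incrementally(index, url, text):
    # Caffeine-style approach: a newly crawled page becomes
    # searchable as soon as it is processed.
    for word in set(text.lower().split()):
        index.setdefault(word, set()).add(url)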

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
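For scale, the iPod figures are internally consistent, assuming the largest iPod of the day was the 160 GB iPod classic, roughly 4.1 inches long (our assumption, not stated in the post):

\[
625{,}000 \times 160~\text{GB} = 100{,}000{,}000~\text{GB},
\qquad
625{,}000 \times 4.1~\text{in} \approx 2{,}560{,}000~\text{in} \approx 40~\text{miles}.
\]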

We've built Caffeine with the future in mind. Not only is it fresher, it's a robust foundation that makes it possible for us to build an even faster and more comprehensive search engine that scales with the growth of information online and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.

Posted by Carrie Grimes, Software Engineer


"

Quality links to your site

A popular question on our Webmaster Help Forum concerns best practices for organic link building. There seems to be some confusion, especially among less experienced webmasters, on how to approach the topic. Different perspectives have been shared, and we would also like to explain our viewpoint on earning quality links.

If your site is rather new and still unknown, a good marketing technique is to get involved in the community around your topic. Interact and contribute on forums and blogs. Just keep in mind to contribute in a positive way, rather than spamming or soliciting for your site. Simply building a reputation can drive people to your site, and they will keep visiting it and linking to it. If you offer long-lasting, unique and compelling content -- something that lets your expertise shine -- people will want to recommend it to others. Great content can serve this purpose as much as providing useful tools.

A promising way to create value for your target group and earn great links is to think of issues or problems your users might encounter. Visitors are likely to appreciate your site and link to it if you publish a short tutorial or a video providing a solution, or a practical tool. Survey or original research results can serve the same purpose, if they turn out to be useful for the target audience. Both methods grow your credibility in the community and increase visibility. This can help you gain lasting, merit-based links and loyal followers who generate direct traffic and 'spread the word.' Offering a number of solutions for different problems could evolve into a blog which can continuously affect the site's reputation in a positive way.

Humor can be another way to both gain great links and get people talking about your site. With Google Buzz and other social media services constantly growing, entertaining content is being shared now more than ever. We've seen all kinds of amusing content, from ASCII art embedded in a site's source code to funny downtime messages used as a viral marketing technique to increase the visibility of a site. However, we do not recommend counting only on short-lived link-bait tactics. Their appeal wears off quickly, and as powerful as marketing stunts can be, you shouldn't rely on them as a long-term strategy or as your only marketing effort.

It's important to clarify that any legitimate link building strategy is a long-term effort. There are those who advocate short-lived, often spammy methods, but these are not advisable if you care about your site's reputation. Buying PageRank-passing links or randomly exchanging links are among the worst ways to attempt to gather links, and they're likely to have no positive impact on your site's performance over time. If your site's visibility in the Google index is important to you, it's best to avoid them.

Directory entries are often mentioned as another way to promote young sites in the Google index. There are great, topical directories that add value to the Internet, but they are far outnumbered by those of lower quality. If you decide to submit your site to a directory, make sure it's on topic, moderated, and well structured. Mass submissions, which are sometimes offered as a quick SEO workaround, are mostly useless and not likely to serve your purposes.

It can be a good idea to take a look at similar sites in other markets and identify the elements of those sites that might work well for yours, too. However, it's important not to just copy success stories but to adapt them, so that they provide unique value for your visitors.


Social bookmarks on YouTube enable users to share content easily


Finally, consider making linking to your site easier for less tech-savvy users. Similar to the way we do it on YouTube, offering bookmarking services for social sites like Twitter or Facebook can help spread the word about the great content on your site and draw users' attention.
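Both of those services accept a simple share URL with your page's address as a parameter, so such links are easy to generate. A hedged Python sketch (URL formats as commonly documented at the time; the example.com values are placeholders):

from urllib.parse import quote

def share_links(page_url, title):
    # Build "share this" links for Twitter and Facebook.
    return {
        "twitter": "http://twitter.com/share?url=%s&text=%s" % (quote(page_url), quote(title)),
        "facebook": "http://www.facebook.com/sharer.php?u=%s" % quote(page_url),
    }

print(share_links("http://www.example.com/article", "A great article"))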

As usual, we'd like to hear your opinion. You're welcome to comment here in the blog, or join our Webmaster Help Forum community.

Written by Kaspar Szymanski, Search Quality Strategist, Dublin


"

Sitemaps: One file, many content types

Webmaster Level: All

Have you ever wanted to submit your various content types (video, images, etc.) in one Sitemap? Now you can! If your site contains videos, images, mobile URLs, code or geo information, you can now create—and submit—a Sitemap with all the information.

Site owners have been leveraging Sitemaps to let Google know about their sites’ content since Sitemaps were first introduced in 2005. Since that time additional specialized Sitemap formats have been introduced to better accommodate video, images, mobile, code or geographic content. With the increasing number of specialized formats, we’d like to make it easier for you by supporting Sitemaps that can include multiple content types in the same file.

The structure of a Sitemap with multiple content types is similar to a standard Sitemap, with the additional ability to contain URLs referencing different content types. Here's an example of a Sitemap that contains a reference to a standard web page for Web search, image content for Image search and a video reference to be included in Video search:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.sitemaps.org/schemas/sitemap-image/1.1"
        xmlns:video="http://www.sitemaps.org/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/foo.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <video:video>
      <video:content_loc>http://www.example.com/videoABC.flv</video:content_loc>
      <video:title>Grilling tofu for summer</video:title>
    </video:video>
  </url>
</urlset>
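If you generate Sitemaps programmatically, the same file can be produced with Python's standard library (a sketch of one possible approach, not part of the announcement):

import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.sitemaps.org/schemas/sitemap-image/1.1"
VID = "http://www.sitemaps.org/schemas/sitemap-video/1.1"
ET.register_namespace("", SM)
ET.register_namespace("image", IMG)
ET.register_namespace("video", VID)

# One <url> entry carrying web, image and video content together.
urlset = ET.Element("{%s}urlset" % SM)
url = ET.SubElement(urlset, "{%s}url" % SM)
ET.SubElement(url, "{%s}loc" % SM).text = "http://www.example.com/foo.html"

image = ET.SubElement(url, "{%s}image" % IMG)
ET.SubElement(image, "{%s}loc" % IMG).text = "http://example.com/image.jpg"

video = ET.SubElement(url, "{%s}video" % VID)
ET.SubElement(video, "{%s}content_loc" % VID).text = "http://www.example.com/videoABC.flv"
ET.SubElement(video, "{%s}title" % VID).text = "Grilling tofu for summer"

ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)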

Here's an example of what you'll see in Webmaster Tools when a Sitemap containing multiple content types is submitted:



We hope the capability to include multiple content types in one Sitemap simplifies your Sitemap submission. The rest of the Sitemap rules, like the 50,000-URL maximum per file and the 10MB uncompressed file size limit, still apply. If you have questions or other feedback, please visit the Webmaster Help Forum.

Written by Jonathan Simon, Webmaster Trends Analyst


"

New Message Center notifications for detecting an increase in Crawl Errors

Webmaster Level: All

When Googlebot crawls your site, it’s expected that most URLs will return a 200 response code, some a 404 response, some will be disallowed by robots.txt, etc. Whenever we’re unable to reach your content, we show this information in the Crawl errors section of Webmaster Tools (even though it might be intentional and not actually an error). Continuing with our effort to provide useful and actionable information to webmasters, we're now sending SiteNotice messages when we detect a significant increase in the number of crawl errors impacting a specific site. These notifications are meant to alert you of potential crawl-related issues and provide a sample set of URLs for diagnosing and fixing them.
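Google doesn't publish the exact thresholds behind these notifications, but the idea of a "significant increase" can be pictured with a toy spike check that compares today's error count against a recent baseline (all numbers hypothetical):

def error_spike(daily_counts, window=7, factor=3.0):
    # Flag a day whose error count far exceeds the average
    # of the preceding `window` days.
    baseline = sum(daily_counts[-window - 1:-1]) / window
    return daily_counts[-1] > factor * max(baseline, 1.0)

print(error_spike([12, 9, 14, 10, 11, 13, 12, 95]))  # True: 95 vs ~12/day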

A SiteNotice for a spike in the number of unreachable URLs, for example, will look like this:


We hope you find SiteNotices helpful for discovering and dealing with issues that, if left unattended, could negatively affect your crawl coverage. You’ll only receive these notifications if you’ve verified your site in Webmaster Tools and we detect significant changes to the number of crawl errors we encounter on your site. And if you don't want to miss out on any of these important messages, you can use the email forwarding feature to receive these alerts in your inbox.

If you have any questions, please post them in our Webmaster Help Forum or leave your comments below.

Posted by Pooja Shah and Jonathan Simon


"

To err is human, Video Sitemap feedback is divine!

Webmaster Level: All

You can now check your Video Sitemap for even more errors right in Webmaster Tools! It’s a new Labs feature to signal issues in your Video Sitemap such as:
  • URLs disallowed by robots.txt
  • Thumbnail size errors (160x120 px is ideal; anything smaller than 90x50 px will be rejected)
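If you'd like to catch the thumbnail problem before submitting, the dimensions are easy to check locally. A sketch using the Pillow imaging library (our choice of tool, with a hypothetical file name; the size rules come from the list above):

from PIL import Image

def check_thumbnail(path):
    width, height = Image.open(path).size
    if width < 90 or height < 50:
        return "%s: %dx%d - too small, will be rejected" % (path, width, height)
    if (width, height) != (160, 120):
        return "%s: %dx%d - accepted, though 160x120 is ideal" % (path, width, height)
    return "%s: 160x120 - ideal" % path

print(check_thumbnail("thumb.jpg"))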



Video Sitemaps help us to better crawl and extract information about your videos, so we can appropriately feature them in search results.

Totally new to Video Sitemaps? Check out the Video Sitemaps center for more information. Otherwise, take a look at this new Labs feature in Webmaster Tools.

Written by Jackie Lai, Video Search Team


"

Verification time savers — Analytics included!

Webmaster Level: All

Nobody likes to duplicate effort. Unfortunately, sometimes it's a fact of life. If you want to use Google Analytics, you need to add a JavaScript tracking code to your pages. When you're ready to verify ownership of your site in other Google products (such as Webmaster Tools), you have to add a meta tag, HTML file or DNS record to your site. They're very similar tasks, but also completely independent. Until today.
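If you use the meta tag method, it's easy to confirm the tag is actually being served. A minimal sketch (ours, not a Google tool; the URL is a placeholder):

import urllib.request

def has_verification_tag(url):
    # Fetch the homepage and look for the verification meta tag.
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return 'name="google-site-verification"' in html

print(has_verification_tag("http://www.example.com/"))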

You can now use a Google Analytics JavaScript snippet to verify ownership of your website. If you already have Google Analytics set up, verifying ownership is as simple as clicking a button.


This only works with the newer asynchronous Analytics JavaScript, so if you haven't migrated yet, now is a great time. If you haven't set up Google Analytics or verified yet, go ahead and set up Google Analytics first, then come verify ownership of your site. It'll save you a little time — who doesn't like that? Just as with all of our other verification methods, the Google Analytics JavaScript needs to stay in place on your site, or your verification will expire. You also need to remain an administrator on the Google Analytics account associated with the JavaScript snippet.

Don't forget that once you've verified ownership, you can add other verified owners quickly and easily through the Verification Details page. There's no need for each owner to manually verify ownership. More effort and time saved!


We've also introduced an improved interface for verification. The new verification page gives you more information about each verification method. In some cases, we can now provide detailed instructions about how to complete verification with your specific domain registrar or provider. If your provider is included, there's no need to dig through their documentation to figure out how to add a verification DNS record — we'll walk you through it.


The time you save using these new verification features might not be enough to let you take up a new hobby, but we hope it makes the verification process a little bit more pleasant. As always, please visit the Webmaster Help Forum if you have any questions.

Written by Sean Harding, Software Engineer


"