Central Perk: Web Directory Submission Danger: Analysis of 2,678 Directories Shows 20% Penalized/Banned by Google

Thursday, May 31, 2012

Posted: 30 May 2012 02:14 PM PDT

Posted by Kurtis

Hi, my name's Kurtis and I'm relatively new here at Moz. My official title is "Captain of Special Projects," which means I spend a lot of time browsing strange parts of the web, assembling metrics and inputting data in Google Docs/Excel. If you walk past my desk in the Mozplex, be warned: investigating webspam is on my task list, so you may come away slightly traumatized by what you see. I ward off the demons by taking care of two cats and fondly remembering my days as a semi-professional scoundrel in Minnesota.

Let's move on to my first public project, which came about after Google deindexed several directories a few weeks ago. This event left us wondering if there was a rhyme to their reason. So we decided to do some intensive data collection of our own and try to figure out what was really going on.

Research All the Directories

After gathering a total of 2,678 directories from lists like Directory Maximizer, Val Web Design, SEOTIPSY.com, and SEOmoz's own directory list (just the web directories were used), we began the search for clues. Out of the 2,678 directories, only 200 were banned - not too shabby. However, there were 340 additional directories that had avoided being banned, but had been penalized.
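As a quick sanity check on the headline number, here is a minimal sketch that recomputes the shares from the counts in this post. The variable and function names are purely illustrative.

```python
# Counts reported in this post.
total_directories = 2678
banned = 200
penalized = 340


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


banned_pct = share(banned, total_directories)
penalized_pct = share(penalized, total_directories)
affected_pct = share(banned + penalized, total_directories)

print(banned_pct, penalized_pct, affected_pct)  # prints: 7.5 12.7 20.2
```

Banned and penalized directories together come to roughly 20% of the sample, which is where the figure in the title comes from.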

We defined banned as having no results in Google when a site:domain.com search is performed:

We defined penalized as meaning that highly obvious queries, including the directory's title tag / brand name, either did not return the directory at all or produced it only deep in the results (and that this could be repeated for any internal pages on the site as well):

As you can see above, the directory itself is nowhere to be found despite the exact title query, yet it's clearly still indexed (as you can see below by performing a domain name match query):

Pegasus Domain Search
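The two definitions above can be written down as a small decision rule. This is only a sketch under stated assumptions: the inputs are values recorded by hand from the two searches (the code does not query Google), the function name is hypothetical, and the cutoff for "deep in the results" is an assumed parameter because the post does not give a number.

```python
def classify_directory(site_query_results, title_query_rank, deep_rank_threshold=10):
    """Classify a directory using the two manual checks described in the post.

    site_query_results: number of results seen for a site:domain.com search.
    title_query_rank: position of the directory for a query on its own
        title tag / brand name, or None if it did not show up at all.
    deep_rank_threshold: assumed cutoff for "deep in the results";
        the post does not specify one.
    """
    if site_query_results == 0:
        # Nothing indexed at all: the post's definition of "banned".
        return "banned"
    if title_query_rank is None or title_query_rank > deep_rank_threshold:
        # Still indexed, but missing or buried for its own name.
        return "penalized"
    return "ok"


print(classify_directory(0, None))   # prints: banned
print(classify_directory(120, 45))   # prints: penalized
print(classify_directory(120, 1))    # prints: ok
```

Checking the site: result first matters, because a banned directory would also fail the title query and would otherwise be counted as merely penalized.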

At first, the data for the banned directories had one common trait – none of them had a visible toolbar Pagerank. For the most part, this initial observation was fairly accurate. As we pressed on, the results became more sporadic. This leads me to believe that it may have been a manual update, rather than an algorithmic one, or at least, that no particular public metrics/patterns are clear from the directories that suffered a penalization/ban.

That is not to say the ones left unharmed are safe from a future algorithmic update. In fact, I suspect this update was intended to serve as a warning; Google will be cracking down on directories. Why? In my own humble opinion, most of the classic, "built-for-SEO-and-links" directories do not provide any benefit to users, falling under the category of non-content spam.

Some directories and link resource lists are likely going to be valuable and useful long term (e.g. CSS Beauty's collection of great designs, the Craft Site Directory or Public Legal's list of legal resources). These are obviously not in the same world as those "SEO directories" and thus probably don't deserve the same classification despite the nomenclature overlap.

Updated Directory List!

In the midst of the panic, a concerned individual brought to my attention that “half of our directories were deindexed” and wanted to know when we would be updating our list. If by half he meant 5 of the 228 we listed were banned and an additional 5 just penalized, then I’d have to agree. ;-) In any case, our list is now updated. Thanks for being patient!

Let's look at the data

We've set up two spreadsheets that show which directories were banned and/or penalized, plus a bit of data about each one. Please feel free to check them out for yourself.

SEOmoz Directory List

Directory Maximizer, Val Web Design, & SEOTIPSY Directory List

Additional Data Analysis

Given the size and scope of the data available, we're hoping that lots of you can jump in and perform your own analysis on these directories, and possibly find some other interesting correlations. As the process of checking for banning/penalization is very tedious and cumbersome, we likely won't be doing an analysis on this scale again in the very near future. But we may revisit it in 6-12 months to see if things have changed and whether Google is cracking down more, letting some of the penalties/bans be lifted or making any other notable moves.

I look forward to your feedback and suggestions in the comments!

p.s. The Mozscape metrics (PA, DA, mozRank, etc) are from index 51, which rolled out at the start of May. Our new index, which was just released earlier today, will have more updated and possibly more useful/interesting data. If I have the chance, I'll try to update the public spreadsheets using those numbers.


May Mozscape Index Update: 164 Billion URLs

Posted: 30 May 2012 09:39 AM PDT

Posted by randfish

It's that time once again! Mozscape's latest index update is live as of today (and new data will be in OSE, the mozBar and PRO by tomorrow). This new index is our largest yet, at 164 billion URLs; however, that comes with a few caveats. The major one is that we've got a smaller-than-normal number of domains in this index, so you may see link counts rising while unique linking root domains shrink. I asked the team why this happened, and our data scientist, Matt, provided a great explanation:

We schedule URLs to be crawled based on their PA+external mozRank to crawl the highest quality pages. Since most high PA pages are on a few large sites this naturally biases to crawling fewer domains. To enforce some domain diversity the V2 crawlers introduced a set of domain mozRank limits that limit the crawl depth on each domain. However, this doesn't guarantee a diverse index when the crawl schedule is full (as we had for Index 52).

In this case, many lower quality domains with low PA/DA are cut from the schedule and out of the index. This is the same problem we ran into when we first switched to the V2 crawlers last year and the domain diversity dropped way down. We've since fixed the problem by introducing another hard constraint that always schedules a few pages from each domain, regardless of PA. This was implemented a few weeks ago and the domain numbers for Index 53 are going back up to 153 million.
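To make Matt's explanation concrete, here is a toy sketch of that kind of scheduler. It is not Moz's crawler code: the function name, the parameters and the single `score` value standing in for PA + external mozRank are all assumptions, and the per-domain limit is one fixed number here rather than something derived from each domain's mozRank.

```python
from collections import defaultdict


def build_schedule(urls, capacity, domain_limit, min_per_domain):
    """Pick URLs to crawl.

    urls: list of (url, domain, score) tuples; score stands in for
        PA + external mozRank (how the two are combined is not stated).
    capacity: total number of URLs the schedule can hold. This sketch
        assumes it is large enough for the reserved pages of every domain.
    domain_limit: maximum number of URLs taken from any one domain.
    min_per_domain: pages guaranteed to every domain, regardless of score.
    """
    by_domain = defaultdict(list)
    for url, domain, score in urls:
        by_domain[domain].append((score, url))

    schedule = []
    leftovers = []
    # Pass 1: the hard constraint. Reserve the best few pages of each domain.
    for domain, pages in by_domain.items():
        pages.sort(reverse=True)
        for score, url in pages[:min_per_domain]:
            schedule.append((score, url, domain))
        # The rest compete on score, up to the per-domain limit.
        for score, url in pages[min_per_domain:domain_limit]:
            leftovers.append((score, url, domain))

    # Pass 2: fill the remaining capacity with the highest-scoring pages.
    leftovers.sort(reverse=True)
    remaining = max(0, capacity - len(schedule))
    schedule.extend(leftovers[:remaining])
    return [url for _, url, _ in schedule]


# Made-up data: one large, high-scoring site and one small, low-scoring site.
sample = [("big.example/%d" % i, "big.example", 90 - i) for i in range(5)]
sample.append(("small.example/a", "small.example", 5))

without_floor = build_schedule(sample, capacity=3, domain_limit=4, min_per_domain=0)
with_floor = build_schedule(sample, capacity=3, domain_limit=4, min_per_domain=1)
print("small.example/a" in without_floor, "small.example/a" in with_floor)
# prints: False True
```

With a full schedule and no per-domain floor, the low-scoring domain drops out entirely, which mirrors the domain diversity problem described above. Reserving even one page per domain brings it back.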

Thankfully, the domains affected should be at the far edges of the web - those that aren't well linked-to or important. Still, we recognize this is important and thus are focused on balancing these moving forward.

Several other points may be of interest as well:

The last index took nearly 13 weeks to process; this one took only 7 weeks. This means relatively fresher data, though not as fresh as we'd like. The oldest information will be from February and the newest from mid-April.
Of all the URLs on which data was requested in the last month, this update has data for 88.56% of them (this is only very slightly lower than last index's 88.80%)
This index still has very high correlations with rankings. Below are a few samples of Spearman correlations with higher rankings in Google.com (US):
- Page Authority (PA) - 0.38
- Domain Authority (DA) - 0.26
- URL MozRank (mR) - 0.20
- URL MozTrust (mT) - 0.22
- Linking Root Domains to the URL - 0.29
- Total # of Links to the URL - 0.22
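For readers unfamiliar with the statistic, a Spearman correlation is simply the Pearson correlation of the two rank orderings. The sketch below shows one way to compute it in plain Python. The data is made up, the function names are illustrative, and nothing here is meant to describe how the figures above were actually produced.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank lists.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up example: a metric for the pages at positions 1..5 of one result page.
page_authority = [62, 55, 58, 40, 47]
positions = [1, 2, 3, 4, 5]
# Negate positions so that "ranks higher" counts as the larger value.
rho = spearman(page_authority, [-p for p in positions])
print(round(rho, 2))  # prints: 0.8
```

A value near 1 would mean the metric orders pages almost exactly as Google does, and a value near 0 would mean no monotonic relationship.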

This bit is important: Next index, we're going back down to between 70-90 billion URLs, and focusing on getting back to much fresher updates (we're even aiming to get to updates every 2 weeks, though this is a challenging goal, not a guarantee). The 150 billion+ page indices are an awesome milestone, but as you've likely noticed, the extra data does not equate with hugely better correlations nor even with massively higher amounts of data on the URLs most of our customers care about (as an example, in index 50, we had ~53 billion pages and 82.09% of URLs requested had data). That said, once our architecture is more stable, we will be aiming to get to both huge index sizes and dramatically better freshness. Look for tons of work and improvements over the summer on both fronts.

Below are the stats for Index 52:

164,569,893,828 (164 billion) URLs
1,222,033,252 (1.22 billion) Subdomains
117,444,355 (117 million) Root Domains
1,784,256,496,532 (1.7 trillion) Links
Followed vs. Nofollowed
- 2.57% of all links found were nofollowed
- 64.91% of nofollowed links are internal
- 35.09% are external
Rel Canonical - 11.33% of all pages now employ a rel=canonical tag
The average page has 85.12 links on it
- 74.38 internal links on average
- 10.74 external links on average

Feedback is greatly appreciated - this index should help with Penguin link data identification substantively more than our prior one, and the next one should be even more useful for that. Do remember that since this index stopped crawling and began processing in mid-April, link additions/removals that have happened since won't be reflected. Our next index will, hopefully, be out with 5 or fewer weeks of processing, to enhance that freshness. We're excited to see how this affects correlations and data quality.

Recovering from an Over Optimization Penalty - A True Story

Posted: 30 May 2012 04:08 AM PDT

Posted by NickEubanks

This post was originally in YouMoz, and was promoted to the main blog because it provides great value and interest to our community. The author's views are entirely his or her own and may not reflect the views of SEOmoz, Inc.

As almost anyone reading this post already knows, April 24, 2012 marked a big day in the search industry. Once the initial Penguin update was rolled out (please believe me, this is only the beginning and there is much more to come), the SEO industry as we know it exploded in a flurry of fear and satisfaction.

For those of us that had sites get hit (I'll admit I had several sites dinged by this update) what started out as anger quickly turned to fear and curiosity. Many industry publications jumped the gun and, in my opinion, began publishing tips and processes on how to 'recover from Penguin,' when the truth is, as mentioned by Ian Howells in his recent SEO Podcast, it's really too soon to tell the full effects of these algorithm updates and anyone out there preaching is really just speculating. The best information I have seen thus far is from The HOTH, and that is DON'T PANIC, and BE PATIENT.

I did, however, have a real life experience where one of my sites, my own personal blog, got hit for what I am now almost sure was over optimization, and I was able to recover. What's really interesting to me is that my site, nickeubanks.com, got hit at all... let me explain. My personal site is low traffic, low importance. I do not build links to it, I do not monetize it, it really just exists to serve as my digital resume and a place for me to openly ramble or rant when I feel like it.

The Penguin Smack

Here is a screen shot of Google on Friday May 18, 2012. After a friend of mine reached out to me to ask if I had taken my site down (of course he didn't just go and check the domain :P) I asked him what he was searching for. He mentioned he had typed in some words in the title from what he could remember and my name - which should be more than sufficient to generate a SERP with my post(s). It did not. Instead this is what he was seeing:

Google SERP 05-18-12

With Inbound.org starting off the list, it was page after page of places that linked to my post - but not the post itself. My immediate reaction was fear that somehow my site was sandboxed. So to check I did a quick search for the full post title in quotes and there it was... what does this mean? That my site was penalized... but for what? As I mentioned before I don't do any active linking, advertising, and the site has slim to no traffic. My first thought was that I might have been a victim of Negative SEO. I logged into Webmaster Tools and pulled down my indexed Google back-link profile, which I have put into a public Google Doc here so you can see it. Upon review you'll see this is a pretty natural back-link profile, even with some links from some pretty authoritative websites... at this point I am scrambling for answers...

What The Hell is Going On!?

I was racking my brain to think of what it could possibly be that was causing my site to be buried in the SERPs, especially for posts that have a lot of natural links, social signals, and are full of unique, well written content (note: I didn't write most of the content in these posts).

I reached out to my buddy Mark Kennedy, as among the Philly SEO crowd he is certainly one of the most passionate SEOs I have ever met. He had the same line of thinking that I did and immediately hit up Ahrefs looking for spam-links or clues. Nothing. His next suggestion was to pore over any recent changes I had made to the website. I reviewed some of the CSS changes and couldn't find any messy code or mistakes that might have caused the site to be dinged (Did I mess up my headers? Did I botch a declarations statement?) Nothing.

The only thing I could think of was to really take a closer look at my links, so I started inspecting each of the sites that was linking to me. During this process I stumbled across my old blog from college, 23Run.com. Here are the Google indexed links from 23Run. As you can see, there are 77 of them, which is roughly 11% of my total indexed link profile.

I went to 23Run.com to take a closer look at how my site was linked:

Over Optimized Anchor Text

I Had to Change This

And there it was... right in line with the Penguin post from Microsite Masters showing sample data from their analysis, I had over 10% of my links over-optimized for anchor text. So I made this quick change:

Natural anchor text
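The same kind of check can be run on any exported list of backlink anchor texts. The sketch below is illustrative only: the function names are made up, the 10% threshold simply mirrors the figure mentioned above, and the sample profile is synthetic (77 identical keyword anchors out of 700 links, the total implied by "roughly 11%"), not the real data from this site.

```python
from collections import Counter


def anchor_text_shares(anchors):
    """Return {anchor text: share of all links}, ignoring case and padding."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    return {text: count / total for text, count in counts.items()}


def flag_overused(anchors, threshold=0.10):
    """List anchor texts whose share of the link profile exceeds threshold."""
    shares = anchor_text_shares(anchors)
    return sorted(text for text, share in shares.items() if share > threshold)


# Synthetic profile: one repeated keyword anchor plus many varied anchors.
profile = ["keyword phrase"] * 77 + ["other anchor %d" % i for i in range(623)]

print(flag_overused(profile))  # prints: ['keyword phrase']
print(round(anchor_text_shares(profile)["keyword phrase"], 2))  # prints: 0.11
```

Any anchor text that shows up in the flagged list is a candidate for the kind of change shown above.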

How Long Will I Have to Wait?

I then needed to gauge about how long it would take for Google to crawl my site and index these changes, so I took a quick peek at my average crawl rate in Webmaster Tools:

Nick Eubanks Dot Com May 2012 Crawl Stats

Seeing that my average crawl rate was 59 pages, but my low was 24, I decided to give it the weekend and check back on Monday May 21, 2012. When Monday's production activity calmed down, sometime in the early afternoon, I decided to run the query again and, lo and behold:

Resurrection!

Post-Penguin-Google-SERP_Fixed

It is still ranking underneath Inbound.org, which is a bit strange, but it's back!

Furthermore, the post is back to ranking for more broad terms, such as 'fresh insights nick eubanks' as you can see below:

Google SERP Restored

Conclusion & Takeaways

Plain and simple, over-optimized anchor text can be dangerous. What was once the holy grail of SEO, getting links with your target keywords in the anchor text, is now something that requires careful planning and attention.

My advice is to develop your link profile to not look natural, but to be natural. If your anchor text is 'over optimized', you run the risk of being penalized, so make the effort and put in the time to naturalize your links. Try to replace anchor text links with naked URLs or at the very least more natural anchor text - try to think about these links from the perspective of someone who doesn't know you, finding your page or post and creating a link organically; most likely it won't be your target keywords but your name, page/post title, or a more generic link text such as read more, learn more, etc.

I hope my real-life example proves useful and helps, in any small way, to dispel some of the speculation out there. Thanks for reading.
