Central Perk

Tuesday, 22 October 2013

A [Poorly] Illustrated Guide to Google's Algorithm

Posted: 21 Oct 2013 04:11 PM PDT

Like all great literature, this post started as a bad joke on Twitter on a Friday night:

If you know me, then this kind of behavior hardly surprises you (and I probably owe you an apology or two). What's surprising is that Google's Matt Cutts replied, and fairly seriously:

Matt's concern that even my painfully stupid joke could be misinterpreted demonstrates just how confused many people are about the algorithm. This tweet actually led to a handful of very productive conversations, including one with Danny Sullivan about the nature of Google's "Hummingbird" update.

These conversations got me thinking about how much we oversimplify what "the algorithm" really is. This post is a journey in pictures, from the most basic conception of the algorithm to something that I hope reflects the major concepts Google is built on as we head into 2014.

The Google algorithm

There's really no such thing as "the" algorithm, but that's how we think about it: as some kind of monolithic block of code that Google occasionally tweaks. In our collective SEO consciousness, it looks something like this:

So, naturally, when Google announces an "update", all we see are shades of blue. We hear about a major algorithm update every month or two, and yet Google confirmed 665 updates (technically, they used the word "launches") in 2012. Obviously, there's something more going on here than just changing a few lines of code in some mega-program.

Inputs and outputs

Of course, the algorithm has to do something, so we need inputs and outputs. In the case of search, the most fundamental input is Google's index of the worldwide web, and the output is search engine result pages (SERPs):

Simple enough, right? Web pages go in, [something happens], search results come out. Well, maybe it's not quite that simple. Obviously, the algorithm itself is incredibly complicated (and we'll get to that in a minute), but even the inputs aren't as straightforward as you might imagine.

First of all, the index is really roughly a dozen data centers distributed across the world, and each data center is a miniature city unto itself, linked by one of the most impressive global fiber optic networks ever built. So, let's at least add some color and say it looks something more like this:

Each block in that index illustration is a cloud of thousands of machines and an incredible array of hardware, software and people, but if we dive deep into that, this post will never end. It's important to realize, though, that the index isn't the only major input into the algorithm. To oversimplify, the system probably looks more like this:

The link graph, local and maps data, the social graph (predominantly Google+) and the Knowledge Graph (essentially, a collection of entity databases) are all major inputs that exist beyond Google's core index of the worldwide web. Again, this is just a conceptualization (I don't claim to know how each of these is actually structured as physical data), but each of these inputs is a unique and important piece of the search puzzle.

For the purposes of this post, I'm going to leave out personalization, which has its own inputs (like your search history and location). Personalization is undoubtedly important, but it impacts many areas of this illustration and is more of a layer than a single piece of the puzzle.

Relevance, ranking and re-ranking

As SEOs, we're mostly concerned (i.e. obsessed) with ranking, but we forget that ranking is really only part of the algorithm's job. I think it's useful to split the process into two steps: (1) relevance, and (2) ranking. For a page to rank in Google, it first has to make the cut and be included in the list. Let's draw it something like this:

In other words, first Google has to pick which pages match the search, and then they pick which order those pages are displayed in. Step (1) relies on relevance: a page can have all the links, +1s, and citations in the world, but if it's not a match to the query, it's not going to rank. The Wikipedia page for Millard Fillmore is never going to rank for "best iPhone cases," no matter how much authority Wikipedia has. Once Wikipedia clears the relevance bar, though, that authority kicks in and the page will often rank well.
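To make the two-step idea concrete, here is a minimal sketch in Python. It is purely illustrative and not how Google actually works: the `Page` class, the `topics` and `authority` fields, and the example URLs are all hypothetical stand-ins for far richer signals.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    topics: set[str]   # hypothetical stand-in for on-page relevance signals
    authority: float   # hypothetical stand-in for links, citations, etc.


def search(query_topic: str, pages: list[Page]) -> list[str]:
    # Step 1: relevance. Pages that do not match the query are dropped,
    # no matter how much authority they have.
    candidates = [p for p in pages if query_topic in p.topics]
    # Step 2: ranking. Only the surviving candidates are ordered by authority.
    ranked = sorted(candidates, key=lambda p: p.authority, reverse=True)
    return [p.url for p in ranked]


pages = [
    Page("wikipedia.org/Millard_Fillmore", {"millard fillmore"}, 0.99),
    Page("example-cases.com", {"iphone cases"}, 0.40),
    Page("example-reviews.com", {"iphone cases"}, 0.65),
]

print(search("iphone cases", pages))
```

The highest-authority page never appears for "iphone cases" because it fails step 1; authority only matters among pages that pass the relevance filter.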

Interestingly, this is one reason that our large-scale correlation studies show fairly low correlations for on-page factors. Our correlation studies only measure how well a page ranks once it's passed the relevance threshold. In 2013, it's likely that on-page factors are still necessary for relevance, but they're not sufficient for top rankings. In other words, your page has to clearly be about a topic to show up in results, but just being about that topic doesn't mean that it's going to rank well.

Even ranking isn't a single process. I'm going to try to cover an incredibly complicated topic in just a few sentences, a topic that I'll call "re-ranking." Essentially, Google determines a core ranking and what we might call a "pure" organic result. Then, secondary ranking algorithms kick in; these include local results, social results, and vertical results (like news and images). These secondary algorithms rewrite or re-rank the original results:

To see this in action, check out my post on how Google counts local results. Using the methodology in that post, you can clearly see how Google determines a base set of rankings, and then the local algorithm kicks in and not only adds new features but re-ranks the original results. This diagram is only the tip of the iceberg: Bill Slawski has an excellent three-part series on re-ranking that covers 40 different ways Google may re-rank results.
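As a rough sketch of the re-ranking idea (again, a conceptual toy and not Google's implementation), a core ranking can be thought of as a list that a series of secondary passes is allowed to rewrite. Every function name, score, and URL below is hypothetical.

```python
from functools import partial


def core_ranking(scores):
    """Order URLs by a (hypothetical) core score, highest first."""
    return sorted(scores, key=lambda url: scores[url], reverse=True)


def local_pass(results, local_urls):
    """Move results that appear in the local data set ahead of the rest."""
    local = [u for u in results if u in local_urls]
    rest = [u for u in results if u not in local_urls]
    return local + rest


def vertical_pass(results, block, position):
    """Insert a vertical block (news, images) at a fixed position."""
    return results[:position] + [block] + results[position:]


def rank(scores, passes):
    # Start from the "pure" organic order, then let each secondary
    # pass rewrite the list in turn.
    results = core_ranking(scores)
    for secondary in passes:
        results = secondary(results)
    return results


scores = {"a.example": 0.9, "b.example": 0.7, "c-local.example": 0.5}
passes = [
    partial(local_pass, local_urls={"c-local.example"}),
    partial(vertical_pass, block="[news block]", position=1),
]
final = rank(scores, passes)
print(final)
```

Here the lowest-scoring core result ends up first because the local pass promotes it, and a new feature is spliced into the list, which mirrors the "adds new features and re-ranks the original results" behavior described above.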

Special inputs: penalties and disavowals

There are also special inputs (for lack of a better term). For example, if Google issues a manual penalty against a site, that has to be flagged somewhere and fed into the system. This may be part of the index, but since this process is managed manually and tied to Google Webmaster Tools, I think it's useful to view it as a separate concept.

Likewise, Google's disavow tool is a separate input, in this case one partially controlled by webmasters. This data must be periodically processed and then fed back into the algorithm and/or link graph. Presumably, there's a semi-automated editorial process involved to verify and clean this user-submitted data. So, that gives us something like this:

Of course, there are many inputs that feed other parts of the system. For example, XML sitemaps in Google Webmaster Tools help shape the index. My goal is to give you a flavor for the major concepts. As you can see, even the "simple" version is quickly getting complicated.

Updates: Panda, Penguin and Hummingbird

Finally, we have the algorithm updates we all know and love. In many cases, an update really is just a change or addition to some small part of Google's code. In the past couple of years, though, algorithm updates have gotten a bit more tricky.

Let's start with Panda, originally launched in February of 2011. The Panda update was more than just a tweak to the code; it was (and probably still is) a sub-algorithm with its own data structures, living outside of the core algorithm (conceptually speaking). Every month or so, the Panda algorithm would be re-run, Panda data would be updated, and that data would feed what you might call a Panda ranking factor back into the core algorithm. It's likely that Penguin operates similarly, in that it's a sub-algorithm and separate data set. We'll put them outside of the big, blue oval:

I don't mean to imply that Panda and Penguin are the same; they operate in very different ways. I'm simply suggesting that both of these algorithm updates rely on their own code and data sources and are only periodically fed back into the system.
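One way to picture a sub-algorithm that is "only periodically fed back" is a cached factor that the core scorer reads but never recomputes. The sketch below is a conceptual illustration only: `PeriodicFactor`, `core_score`, the site names, and the numbers are all invented for the example.

```python
class PeriodicFactor:
    """A factor computed by an offline batch job and cached.

    The core scorer only ever reads the cache, so changes in the
    underlying signal are invisible until the next refresh.
    """

    def __init__(self, compute):
        self._compute = compute  # hypothetical expensive batch computation
        self._cache = {}

    def refresh(self, sites):
        # Re-run the sub-algorithm over all sites and replace the cache.
        self._cache = {site: self._compute(site) for site in sites}

    def get(self, site, default=1.0):
        return self._cache.get(site, default)


# Hypothetical raw quality signal; it can change at any time.
quality = {"thin-content.example": 0.2, "good.example": 0.9}

panda = PeriodicFactor(lambda site: quality[site])
panda.refresh(quality)


def core_score(site, base):
    # The core algorithm multiplies its own score by the cached factor.
    return base * panda.get(site)


before = core_score("thin-content.example", 10.0)
quality["thin-content.example"] = 0.8   # the site improves its content
stale = core_score("thin-content.example", 10.0)
panda.refresh(quality)                  # the periodic re-run
after = core_score("thin-content.example", 10.0)
print(before, stale, after)
```

The `stale` score is unchanged even though the site improved, which is the practical consequence of a periodic refresh: a site waits for the next re-run before any change shows up in the core ranking.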

Why didn't Google just re-write the algorithm to account for the Panda and/or Penguin intent? Part of it is computational: the resources required to process this data are beyond what the real-time infrastructure can probably handle. As Google gets faster and more powerful, these sub-algorithms may become fully integrated (and Panda is probably more integrated than it once was). The other reason may involve testing and mitigating impact. It's likely that Google only updates Penguin periodically because of the large impact that the first Penguin update had. This may not be a process that they simply want to let loose in real-time.

So, what about the recent Hummingbird update? There's still a lot we don't know, but Google has made it pretty clear that Hummingbird is a fundamental rewrite of how the core algorithm works. I don't think we've seen the full impact of Hummingbird yet, personally, and the potential of this new code may be realized over months or even years, but now we're talking about the core algorithm(s). That leads us to our final image:

Image credit for hummingbird silhouette: Michele Tobias at Experimental Craft.

The end result surprised even me as I created it. This was the most basic illustration I could make that didn't feel misleading or simplistic. The reality of Google today far surpasses this diagram; every piece is dozens of smaller pieces. I hope, though, that this gives you a sense for what the algorithm really is and does.

Additional resources

If you're new to the algorithm and would like to learn more, Google's own "How Search Works" resource is actually pretty interesting (check out the sub-sections, not just the scroller). I'd also highly recommend Chapter 1 of our Beginner's Guide: "How Search Engines Operate." If you just want to know more about how Google operates, Steven Levy's book "In The Plex" is an amazing read.

Special bonus nonsense!

While writing this post, the team and I kept thinking there must be some way to make it more dynamic, but all of our attempts ended badly. Finally, I just gave up and turned the post into an animated GIF. If you like that sort of thing, then here you go...

Join the White House Fall Garden Instagram Meetup

This Sunday, October 27th, we're holding a special tour of the White House gardens just for our Instagram followers. The tour will include the Jacqueline Kennedy Garden, the Rose Garden, and the South Lawn -- including the White House Kitchen Garden.

Find out more about how to join the meetup here.

Not on Instagram? Find out how to join the public tour.

Seth's Blog : The selfish cynic

The selfish cynic

Cynics are hard to disappoint. Because they imagine the worst in people and situations, reality rarely lets them down. Cynicism is a way to rehearse the let-downs the world has in store--before they arrive.

And the cynic chooses this attitude at the expense of the group. Because he can't bear to be disappointed, he shares his rehearsed disappointment with the rest of us, slowing down projects, betting on lousy outcomes and dampening enthusiasm.

Someone betting on the worst outcomes is going to be correct now and then, but that doesn't mean we need to have him on our team. I'd rather work with people brave enough to embrace possible futures at the expense of being disappointed now and then.

Don't expect kudos or respect for being a cynic. It's selfish.

Monday, 21 October 2013

Mish's Global Economic Trend Analysis

Fed Wonders "Why Are Housing Inventories Low?"; More Than Meets the Fed's Eye
Dysfunctional Global Economy; Can Things Get Worse? Rediscovering the Price of Money
Growth in Social Security Benefits vs. Wage Growth

Fed Wonders "Why Are Housing Inventories Low?"; More Than Meets the Fed's Eye

Posted: 21 Oct 2013 06:20 PM PDT

Inquiring minds are reading the San Francisco Fed Economic Letter "Why Are Housing Inventories Low?"

Inventories of homes for sale have been slow to bounce back since the 2007–09 recession, despite steady house price appreciation since January 2012. One probable reason why many homeowners are not putting their homes on the market is that their properties may still be worth less than the value of their mortgages, which would leave them owing additional money after a sale. In other cases, homeowners may simply be hoping that house prices will continue to rise, allowing them to recover lost equity.

No matter what the condition of the economy might be, some base level of inventory for sale always exists in the housing market. Young homeowners may sell their homes in order to relocate for a job or because their family has gotten larger and they need more space. Older homeowners may sell because they no longer need so much space or they want to turn their housing investment into cash as they reach retirement. All these reasons for selling can be thought of as life-cycle motives not necessarily tied to the business cycle. Such noncyclical factors produce a general level of churning in the housing market. Nevertheless, inventories show a distinct cyclical pattern, rising in good times and falling in bad times. This could be due to the cyclical nature of credit conditions. The risk premiums charged by lenders and their willingness to accept loan applications tend to ease during good economic times, allowing more potential buyers to enter the market. At the same time though, the level of house prices is by far the most important cyclical variable that influences the inventory of homes for sale.

Two important points emerge from Figure 2. First, in the aggregate U.S. data, the for-rent inventory of homes as a share of total housing units has risen steadily during the recession and the recovery, while the for-sale inventory has steadily dropped and is now stabilizing.

The data do not extend far enough back to indicate whether this is typical over the economic cycle. But other sources, such as Census Bureau aggregate inventory data, suggest that the drop in owner-occupied units relative to renter-occupied units is unprecedented since the 1960s. This phenomenon is widespread. The surge in foreclosures during the housing bust cannot be the only cause of this shift.

In theory, falling house prices alone may keep some homeowners from selling. It may seem logical that decisions to sell should be based only on information about current and future market conditions. But David Genesove and Christopher Mayer (1997) show that homeowners take more time to sell their houses if prices have fallen since the original purchase. That is, two similar homeowners experiencing similar housing market conditions will behave differently if one of those homeowners has an unrealized loss on his or her house.

Another possible explanation for the breakdown in the normal relationship between prices and inventories of homes for sale is that homeowners may be taking a longer-term view of the housing market. It is well documented that house price changes are persistent, meaning that price rises are likely to be followed by more rises, and price drops by more drops. Homeowners with flexibility on the timing of their home sales can potentially take advantage of this persistence. If they observe prices going up, they may want to wait and gamble that the increases will continue, allowing them to sell later at a higher price.

The data are consistent with this explanation. Figure 4 confirms on a county level the negative relationship between prices and inventories shown at the aggregate level in Figure 1. On balance, counties that experienced relatively large increases in house prices over the past year also experienced relatively large declines in inventories available for sale.

Conclusion

History shows a long-run relationship between house prices and the number of houses available for sale. Thus, current inventories of homes for sale are low given more than a year of house price appreciation. County-level data suggest that many homeowners are waiting for prices to rise further in their markets. Markets that have seen the strongest house price appreciation and job growth are the ones where for-sale inventories have declined the most.

More Than Meets the Eye

I endorse the Fed's conclusion. However, the Fed's research analysis does not go far enough.

Inventories in some areas are low because "investors" are snapping houses up expecting further appreciation. Some of this is large-scale investment by firms like Blackrock, but flippers are in the game too, especially at the high end.

Momentum trading is back in vogue as noted in "Bubblicious" High End Flipping Up 350%, Overall Flipping Down 13%.

Flipping is up at the high end but renting is up overall. As long as home prices keep rising the renters and the flippers and those buying on spec will do well. But ....

What About Demographics?

Who are the flippers and investors going to sell to? Aging boomers? Their kids fresh out of college with no job?

I suggest we are in the midst of an echo bubble that can only last as long as QE lasts, as long as Boomers keep on living, as long as cities don't collapse under pension obligations, and as long as taxes do not soar in an attempt to keep cities and pension plans alive.

How long is that? I really don't know. But it is far, far shorter than the life of the average mortgage. And even for all cash buyers, demographics suggest that rising rents are hardly a given.

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

Dysfunctional Global Economy; Can Things Get Worse? Rediscovering the Price of Money

Posted: 21 Oct 2013 11:46 AM PDT

Steen Jakobsen, chief economist of Saxo Bank in Denmark, says things are so bad they cannot get worse. Please consider Rediscovering the Price of Money.

I've been starting my speeches for some time now by saying: "I am the most optimistic I have been in almost thirty years in the market—if only because things can't get any worse."

Is that true, and more importantly, how do we get a fundamental change away from this extend-and-pretend which prevails not only in Europe but also the world?

History tells us that we only get real changes as a result of war, famine, social riots or collapsing stock markets. None of these is an issue for most of the world—at least not yet—but on the other hand we have never had less growth, worse demographics, or higher unemployment since WWII. This is a true paradox that somehow needs to be resolved, and quickly if we are to avoid wasting an entire generation of European youth.

Policymakers try to pretend we have achieved significant progress and stability as the result of their actions, but from a fundamental point of view that's a mere illusion. Italian banks today own more government debt than before the banking crisis, leaving them systematically more exposed to their own government, not less. The spread on government bonds between Germany and Club Med is down below historic averages, but the price has been a total suspension of the "price discovery" of money.

The price discovery of money is the cruel capitalistic part of any system. An economics textbook would call it the modus operandi by which capital is allocated where it can find the highest marginal utility. In practice, this should mean that the market dictates the price of money beyond one year—while at durations of less than one year, the central banks determine the price of money. The beauty of the system is that money is allocated in an auction where the highest bidder for "money" or "credit" gets filled on the price he or she deems to match his expected price of money.

Contrast the market-driven model with the present "success story" of relatively low sovereign spreads in Europe, which are driven by the European Central Bank president Mario Draghi's promise to do "whatever it takes" to keep the euro out of trouble. He has threatened to activate the European Financial Stability Facility and the European Stability Mechanism plus the full arsenal of policy tools to ensure stability.

By doing so, he has effectively suspended price discovery for sovereign debt and for money, as the ECB and local central banks will provide infinite liquidity to local banks and hence indirectly to their government in any market conditions. This one-sided offer from the ECB and the market means there is no power to discipline the government with higher rates or to allocate credit more generally. We have simply disconnected the market and the price of money.

This comes after Draghi's longer-term refinancing operation, which offered cheap funding to banks with little or no collateral and is the closest thing to quantitative easing you can have without calling it quantitative easing.

This is a problem because corporations that need to finance long-term projects, like building a power station over six to eight years, need a price for the credit they require throughout the building period. Right now they have an almost flat yield curve from zero to 30 years, which would be fine if it were realistic. But the problem is that one day in the "distant future" when the market normalises, interest rates should revert to their normal price, which is roughly inflation plus a risk premium.

In the case of an industrial company, an appropriate loan rate calculation could be something like: inflation plus Libor plus a risk spread, which might work out to about seven percent. Compare this with the rates available for highly creditworthy companies. Recently, Nestle was able to issue a four-year corporate bond at 0.75 percent—the lowest ever. Yes, it's nice for Nestle but remember the situation is created by the central banks' presence in the market, not just due to the financial strength of Nestle.

A move from less than one percent to seven percent would administer an ugly shock to companies. We have created a negative vicious circle in which not only investors, but also companies are depending on low interest rates forever. They have priced their future earnings and costs on government support prices rather than on realistic market prices.
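To put rough numbers on that shock, the short Python sketch below compares annual interest cost at the two rates mentioned above (0.75 percent and about seven percent). The principal of 100 million is a hypothetical figure chosen only for illustration, and the calculation ignores compounding, fees, and maturity.

```python
def annual_interest(principal, rate_percent):
    """Simple annual interest cost on a loan or bond, ignoring compounding."""
    return principal * rate_percent / 100


principal = 100_000_000  # hypothetical borrowing amount

low_cost = annual_interest(principal, 0.75)    # today's blue-chip rate
normal_cost = annual_interest(principal, 7.0)  # the "normalized" rate

print(f"At 0.75%: {low_cost:,.0f} per year")
print(f"At 7.00%: {normal_cost:,.0f} per year")
print(f"Ratio: {normal_cost / low_cost:.1f}x")
```

Under these assumptions, the same borrowing costs roughly nine times as much per year once rates normalize, which is why earnings forecasts built on today's rates are fragile.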

The worst thing about the situation, however, is that the reason a blue chip company like Nestle can borrow at less than one percent in the capital market is the lack of alternatives for banks and investors. Less creditworthy small and medium enterprises (SMEs) which make up as much as 80 percent of many countries' economies are not allowed to borrow. They are deemed too risky to lend to at the current "market rates" even though they hold the key to improving the employment and productivity picture.

They are willing to work cheaper, longer, harder and with higher risk tolerance in order to survive. So the remaining 20 percent of the economy occupied by large and publicly listed companies and banks gets 95 percent of all credit and 99 percent of all political capital. In other words, blue chips receive artificially low interest rates only because the SMEs don't get any credit. Herein lies my continued belief in my traditional opening statement: things must get better soon because they can hardly get any worse.

We have never been in a more dysfunctional state at the corporate, political and individual level in history. It's time to realise that the reason capitalism won the war against communism in the 1980s was its strong market based economy—itself based on price discovery. Now the policymakers in their "wisdom" are copying everything a planned economy entails: central planning and control, no price discovery, one supplier of credit, money and the corollary effect of suppressing SMEs and even individuals.

Finally, history offers a compelling lesson: the last time the Federal Reserve engaged in a sizeable quantitative easing was in the 1940s. The low growth and falling inflation only reversed when the Federal Reserve stopped intervening due to a severe recession brought on by the policy mistakes of keeping QE in place too long.

In 2014, a bout of near or real recession in Germany and the US could kick start the price discovery mechanism again, which will help us to start healing the deep wounds left by years of policymakers compounding their errors with round after round of extend-and-pretend. Getting to the bottom is good in one sense: the only way is up.

By Steen Jakobsen

Can Things Get Worse?

I certainly agree with Steen on the dysfunctional state of the global economy, price discovery, and the implied bubble in corporate bonds. But are things so bad they cannot get worse?

In what sense? Any normalization of interest rates is going to crash the corporate bond market. When bonds crash, stocks will join the party. Is that better? Actually it is, but most won't view the accompanying recession as "getting better".

Illusion vs. Reality

The Fed offers an illusion that things are getting better now. The reality is the global economy has been getting more dysfunctional (and that dysfunction happened to lift financial assets).

Why Can't Things Get Worse?

Abenomics is guaranteed to heighten trade tensions and make things worse.
What about the surge of the Eurosceptics in parts of Europe?
What about the surge of Marine Le Pen's National Front party in France?

What's the Definition of "Better"?

What's "better" is in the eyes of the beholder. Some may construe anything that hastens the breakup of the eurozone as being for the better.

Does the line stop at the National Front in France? With the neo-Nazi Golden Dawn party in Greece?

Breaking Point Optimism

Those who want to be optimistic on the basis that "things cannot get more dysfunctional" have numerous things to be optimistic about:

Global Equity Bubble
Corporate Bond Bubble
Sovereign Bond Bubble
Abenomics
Central Bank Intervention
China Property Bubble
China Shadow Banking
Trade Protectionism
US Militarism
Social Security Promises
Pension Promises
European Bank Leverage
European Isolationism
Canadian Housing Prices
Australian Housing Prices

Recovery From the Ashes

Optimists who believe "things cannot get much worse" must expect the dam to break on most or all of those bubbles and distortions at once. Curiously if a dozen of the above dams/bubbles did break at once, it sure would not feel good when it happened, even to the optimists.

I suggest that if Steen's optimism is justified (and perhaps even if it's not) asset prices are highly likely to get clobbered, with equity prices dropping as much as 50%.

Yet, economically speaking, that would be a good thing, especially if a sound banking system (void of fractional reserve lending) arose from the economic ashes.

Unfortunately, a word of caution to the optimists is warranted. Central banks have a proven history of making matters worse over time and not learning anything along the way.

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

Growth in Social Security Benefits vs. Wage Growth

Posted: 21 Oct 2013 09:36 AM PDT

I have an update from reader Tim Wallace on Social Security.

Hello Mish

I made new Social Security charts that show:

Growth in percent from 1967 in average payout per month to those that receive social security

Amount of money paid out to those that receive social security per worker in the USA

Average annual wages as presented in the Social Security system's "National Average Wage Index"

Senior citizens continue to receive all the benefits on the backs of the younger generations. By the way, I had to stop at 2011 as 2012 is not published yet.

Tim

Percentage Growth in Social Security Payments, Per Worker vs. Wage Growth

Social Security Payments Per Worker

Demographics Say Path Is Unsustainable

Clearly this payout trend is unsustainable, but what politician dares touch it?

Social Security is not that difficult a problem in theory (at least in comparison to Medicare), except for the politics of it all. Numerous things could be done to put the system in the green.

Possible Ways to Make Social Security Actuarially Sound

1. Raise retirement age
2. Raise or eliminate the cap on payroll taxes
3. Cut benefits
4. Collect Social Security on personal income
5. Implement a Tiered Cap structure
6. Means Testing

Democrats would oppose 1 and 3. Republicans might oppose all but 3. So, how does this mess end if politicians won't touch it?

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com