Central Perk

Monday, January 7, 2013

Learn About Robots.txt with Interactive Examples

Posted: 06 Jan 2013 06:43 PM PST

One of the things that excites me most about the development of the web is the growth in learning resources. When I went to college in 1998, it was exciting enough to be able to search journals, get access to thousands of dollars' worth of textbooks, and download open source software. These days, technologies like Khan Academy, iTunesU, Treehouse and Codecademy take that to another level.

I've been particularly excited by the possibilities for interactive learning we see coming out of places like Codecademy. It's obviously most suited to learning things that look like programming languages - where computers are naturally good at interpreting the "answer" - which got me thinking about what bits of online marketing look like that.

The kinds of things that computers are designed to interpret in our marketing world are:

Search queries - particularly those that look more like programming constructs than natural language queries such as [site:distilled.net -inurl:www]
The on-site part of setting up analytics - setting custom variables and events, adding virtual pageviews, modifying e-commerce tracking, and the like
Robots.txt syntax and rules
HTML constructs like links, meta page information, alt attributes, etc.
Skills like Excel formulae that many of us find a critical part of our day-to-day job

I've been gradually building out Codecademy-style interactive learning environments for all of these things for DistilledU, our online training platform, but most of them are only available to paying members. I thought it would make a nice start to 2013 to pull one of these modules out from behind the paywall and give it away to the SEOmoz community. I picked the robots.txt one because our in-app feedback shows that it's one of the modules from which people learned the most.

Also, despite years of experience, I discovered some things I didn't know as I wrote this module (particularly about precedence of different rules and the interaction of wildcards with explicit rules). I'm hoping that it'll be useful to many of you as well - beginners and experts alike.

Interactive guide to Robots.txt

Robots.txt is a plain-text file found in the root of a domain (e.g. www.example.com/robots.txt). It is a widely-acknowledged standard and allows webmasters to control all kinds of automated consumption of their site, not just by search engines.

Robots.txt is also one of the more accessible areas of SEO: in addition to reading about the protocol, you can look at any site's robots.txt. Once you have completed this module, you will find value in making sure you understand the robots.txt files of some large sites (for example Google and Amazon).

For each of the following sections, modify the text in the textareas and see them go green when you get the right answer.

Basic Exclusion

The most common use-case for robots.txt is to block robots from accessing specific pages. The simplest version applies the rule to all robots with a line saying User-agent: *. Subsequent lines contain specific exclusions that work cumulatively, so the code below blocks robots from accessing /secret.html.

Add another rule to block access to /secret2.html in addition to /secret.html.

User-agent: *
Disallow: /secret.html
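As a rough check of how the rule above behaves, here is a minimal sketch using Python's standard-library urllib.robotparser. The crawler name "examplebot" is made up for illustration; any name falls under the User-agent: * group. Note that this parser is only one implementation and is not guaranteed to match every search engine's behavior.

```python
from urllib.robotparser import RobotFileParser

# The single rule from the example above.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /secret.html"])

# "examplebot" is a hypothetical crawler name.
secret_ok = parser.can_fetch("examplebot", "/secret.html")
secret2_ok = parser.can_fetch("examplebot", "/secret2.html")

print(secret_ok)   # False: the page is excluded
print(secret2_ok)  # True: no rule covers this page yet
```

Because /secret2.html is still crawlable here, blocking it takes a second Disallow: line.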

Exclude Directories

If you end an exclusion directive with a trailing slash ("/") such as Disallow: /private/ then everything within the directory is blocked.
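The same standard-library parser can illustrate the directory case. This is a sketch only: the file names under /private/ and the crawler name "examplebot" are invented for the example.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /private/"])

# Hypothetical paths: three inside the directory, one page that merely
# shares the "private" name.
paths = ["/private/", "/private/report.html",
         "/private/2013/q1.html", "/private.html"]
results = {p: parser.can_fetch("examplebot", p) for p in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Everything beneath /private/ is blocked, while /private.html is untouched because it does not start with the /private/ prefix.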

Modify the exclusion rule below to block the folder called secret instead of the page secret.html.

User-agent: *
Disallow: /secret.html

Allow Specific Paths

In addition to disallowing specific paths, the robots.txt syntax lets you explicitly allow specific paths. Note that allowing robot access is the default state, so if there are no rules in a file, all paths are allowed.

The primary use for the Allow: directive is to override more general Disallow: directives. The precedence rule states that "the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."

We will demonstrate this by modifying the exclusion of the /secret/ folder below with an Allow: rule allowing /secret/not-secret.html. Since this rule is longer, it will take precedence.

User-agent: *
Disallow: /secret/
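The length-based precedence rule can be sketched in a few lines of Python. This is an illustrative model, not any crawler's actual code: the function name is_allowed and the (directive, prefix) rule format are assumptions, it handles plain prefixes only (no wildcards), and it does not try to settle what happens when two rules are the same length.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs, for example
    ("Disallow", "/secret/"). The longest matching prefix wins;
    if no rule matches, access is allowed by default.
    """
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_length:
            best_length = len(prefix)
            allowed = directive.lower() == "allow"
    return allowed


rules = [
    ("Disallow", "/secret/"),
    ("Allow", "/secret/not-secret.html"),
]

print(is_allowed("/secret/not-secret.html", rules))  # True: longer Allow wins
print(is_allowed("/secret/other.html", rules))       # False: only Disallow matches
print(is_allowed("/public.html", rules))             # True: no rule matches
```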

Restrict to Specific User Agents

All the directives we have worked with have applied equally to all robots. This is specified by the User-agent: * that begins our commands. By replacing the *, however, we can design rules that only apply to specific named robots.

Replace the * with googlebot in the example below to create a rule that applies only to Google's robot.

User-agent: *
Disallow: /secret/

Add Multiple Blocks

It is possible to have multiple blocks of commands targeting different sets of robots. The robots.txt example below will allow googlebot to access all files except those in the /secret/ directory and will block all other robots from the whole site. Note that because there is a set of directives aimed explicitly at googlebot, googlebot will entirely ignore the directives aimed at all robots. This means you can't build up your exclusions from a base of common exclusions. If you want to target named robots, each block must specify all its own rules.

Add a second block of directives targeting all robots (User-agent: *) that blocks the whole site (Disallow: /). This will create a robots.txt file that blocks the whole site from all robots except googlebot which can crawl any page except those in the /secret/ folder.

User-agent: googlebot
Disallow: /secret/
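To see the "one block per robot" behavior in action, here is a small sketch with Python's standard-library parser and a two-block file of the kind described above. The name "otherbot" and the page /page.html are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /secret/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# googlebot follows only its own block, so the site-wide Disallow: /
# aimed at all other robots does not apply to it.
google_page = parser.can_fetch("googlebot", "/page.html")
google_secret = parser.can_fetch("googlebot", "/secret/plan.html")
other_page = parser.can_fetch("otherbot", "/page.html")

print(google_page, google_secret, other_page)  # True False False
```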

Use More Specific User Agents

There are occasions when you wish to control the behavior of specific crawlers such as Google's Images crawler differently from the main googlebot. In order to enable this in robots.txt, these crawlers will choose to listen to the most specific user-agent string that applies to them. So, for example, if there is a block of instructions for googlebot and one for googlebot-images then the images crawler will obey the latter set of directives. If there is no specific set of instructions for googlebot-images (or any of the other specialist googlebots) they will obey the regular googlebot directives.

Note that a crawler will only ever obey one set of directives - there is no concept of cumulatively applying directives across groups.

Given the following robots.txt, googlebot-images will obey the googlebot directives (in other words, it will not crawl the /secret/ folder). Modify this so that the instructions for googlebot (and googlebot-news etc.) remain the same but googlebot-images has a specific set of directives meaning that it will not crawl the /secret/ folder or the /copyright/ folder:

User-agent: googlebot
Disallow: /secret/
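The "most specific user-agent wins, and only one group applies" behavior can be modeled as below. This is a simplified sketch under stated assumptions: the function select_group is hypothetical, a group is taken to apply when the crawler's name starts with the group's user-agent token, and real crawlers may match names differently.

```python
def select_group(crawler, groups):
    """Pick the single rule group a crawler should obey.

    `groups` maps a User-agent token to its list of rule lines.
    The longest token that prefixes the crawler name wins; the "*"
    group is used only when no named group applies.
    """
    name = crawler.lower()
    candidates = [ua for ua in groups
                  if ua != "*" and name.startswith(ua.lower())]
    if candidates:
        return groups[max(candidates, key=len)]
    return groups.get("*", [])


groups = {
    "*": ["Disallow: /"],
    "googlebot": ["Disallow: /secret/"],
    "googlebot-images": ["Disallow: /secret/", "Disallow: /copyright/"],
}

print(select_group("googlebot-images", groups))  # its own two rules
print(select_group("googlebot-news", groups))    # falls back to googlebot
print(select_group("examplebot", groups))        # falls back to *
```

Rules are never merged across groups in this model, which is why the googlebot-images group has to repeat the /secret/ exclusion.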

Basic Wildcards

Trailing wildcards (designated with *) are ignored so Disallow: /private* is the same as Disallow: /private. Wildcards are useful however for matching multiple kinds of pages at once. The star character (*) matches 0 or more instances of any valid character (including /, ?, etc.).

For example, Disallow: news*.html blocks:

news.html
news1.html
news1234.html
newsy.html
news1234.html?id=1

But does not block:

newshtml (note the lack of a ".")
News.html (matches are case sensitive)
/directory/news.html
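One way to reason about the two lists above is to translate the wildcard pattern into a regular expression. The sketch below is an illustration rather than how any particular crawler works; the helper name matches is an assumption, and both the pattern and the paths are written with the leading slash that real URL paths carry.

```python
import re


def matches(pattern, path):
    """Return True if a robots.txt-style pattern matches the start of `path`.

    Each "*" stands for zero or more of any character; everything else is
    literal. The match is anchored at the start of the path only, so
    anything may follow the matched portion.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None


pattern = "/news*.html"
blocked = ["/news.html", "/news1.html", "/news1234.html",
           "/newsy.html", "/news1234.html?id=1"]
not_blocked = ["/newshtml", "/News.html", "/directory/news.html"]

for path in blocked + not_blocked:
    print(path, matches(pattern, path))
```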

Modify the following pattern to block only pages ending .html in the blog directory instead of the whole blog directory:

User-agent: *
Disallow: /blog/

Block Certain Parameters

One common use-case of wildcards is to block certain parameters. For example, one way of handling faceted navigation is to block combinations of 4 or more facets. One way to do this is to have your system add a parameter to all combinations of 4+ facets such as ?crawl=no. This would mean for example that the URL for 3 facets might be /facet1/facet2/facet3/ but that when a fourth is added, this becomes /facet1/facet2/facet3/facet4/?crawl=no.

The robots rule that blocks this should look for *crawl=no (not *?crawl=no because a query string of ?sort=asc&crawl=no would be valid).

Add a Disallow: rule to the robots.txt below to prevent any pages that contain crawl=no being crawled.

User-agent: *
Disallow: /secret/

Match Whole Filenames

As we saw with folder exclusions (where a pattern like /private/ would match paths of files contained within that folder such as /private/privatefile.html), by default the patterns we specify in robots.txt are happy to match only a portion of the filename and allow anything to come afterwards even without explicit wildcards.

There are times when we want to be able to enforce a pattern matching an entire filename (with or without wildcards). For example, the following robots.txt looks like it prevents jpg files from being crawled but in fact would also prevent a file named explanation-of-.jpg.html from being crawled because that also matches the pattern.

If you want a pattern to match to the end of the filename, you should end it with a $ sign, which signifies "line end". For example, modifying an exclusion from Disallow: /private.html to Disallow: /private.html$ would stop the pattern matching /private.html?sort=asc and hence allow that page to be crawled.

Modify the pattern below to exclude actual .jpg files (i.e. those that end with .jpg).

User-agent: *
Disallow: *.jpg
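The effect of the trailing $ can be sketched by extending a wildcard-to-regex translation with end anchoring. As before, this is an illustrative model with a hypothetical helper name (matches) and an invented image path, not a reference implementation.

```python
import re


def matches(pattern, path):
    """Match a robots.txt-style pattern with optional "*" and trailing "$".

    Without "$" the pattern only has to match the beginning of the path.
    With "$" the pattern must account for the whole path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


# Unanchored: also catches a page that merely contains ".jpg".
print(matches("*.jpg", "/images/photo.jpg"))           # True
print(matches("*.jpg", "/explanation-of-.jpg.html"))   # True

# Anchored: only paths that really end in ".jpg".
print(matches("*.jpg$", "/images/photo.jpg"))          # True
print(matches("*.jpg$", "/explanation-of-.jpg.html"))  # False
```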

Add an XML Sitemap

The last line in many robots.txt files is a directive specifying the location of the site's XML sitemap. There are many good reasons for including a sitemap for your site and also for listing it in your robots.txt file. You can read more about XML sitemaps here.

You specify your sitemap's location using a directive of the form Sitemap: <path>.

Add a sitemap directive to the following robots.txt for a sitemap called my-sitemap.xml that can be found at /my-sitemap.xml.

User-agent: *
Disallow: /private/

Add a Video Sitemap

In fact, you can add multiple XML sitemaps (each on their own line) using this syntax. Go ahead and modify the robots.txt below to also include a video sitemap called my-video-sitemap.xml that lives at /my-video-sitemap.xml.

User-agent: *
Disallow: /private/
Sitemap: /my-sitemap.xml
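For a sense of how a consumer might read Sitemap: lines, here is a sketch using site_maps() from Python's standard-library parser (available from Python 3.8). One assumption to flag: the snippets in this post use bare paths, whereas the sitemaps.org protocol describes the value as a full URL, so the example uses full URLs on the placeholder host www.example.com.

```python
from urllib.robotparser import RobotFileParser

# www.example.com is a placeholder host used for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/my-sitemap.xml
Sitemap: https://www.example.com/my-video-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each Sitemap: line is collected independently of the User-agent groups.
sitemaps = parser.site_maps()
print(sitemaps)
```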

What to do if you are stuck on any of these tests

Firstly, there is every chance that I've made a mistake with my JavaScript tests so that they fail to grade some correct solutions the right way. Sorry if that's the case - I'll try to fix them up if you let me know.

Whether you think you've got the answer right (but the box hasn't gone green) or you are stuck and haven't got a clue how to proceed, please just:

Check the comments to see if anyone else has had the same issue; if not:
Leave a comment saying which test you are trying to complete and what your best guess answer is

This will let me help you out as quickly as possible.

Obligatory disclaimers

Please don't use any of the robots.txt snippets above on your own site - they are illustrative only (and some would be a very bad idea). The idea of this post is to teach the general principles about how robots.txt files are interpreted rather than to explain the best ways of using them. For more of the latter, I recommend the following posts:

How to block content from the search results (pro-tip - don't rely on robots.txt despite my examples above excluding "secret" files and folders)
Learn more about why you might want to block robots from certain areas of your site
Avoid accidentally giving conflicting directives with the various different ways of blocking robots
Read up on some "don'ts" (old but still relevant): robots.txt misuse, accidentally blocking link juice

I hope that you've found something useful in these exercises whether you're a beginner or a pro. I look forward to hearing your feedback in the comments.

From the Archives: Marshmallow Cannon

Your Daily Snapshot for
Monday, January 7, 2013

The White House recently crossed 100,000,000 views on its YouTube channel and marked the milestone by looking at some of our most popular videos, including this raw footage of President Obama launching a marshmallow cannon at the White House Science Fair.

Seth's Blog : Two kinds of mistakes

There is the mistake of overdoing the defense of the status quo, the error of investing too much time and energy in keeping things as they are.

And then there is the mistake made while inventing the future, the error of small experiments gone bad.

We are almost never hurt by the second kind of mistake and yet we persist in making the first kind, again and again.

Sunday, January 6, 2013

Mish's Global Economic Trend Analysis

Startling Look at Employment Demographics by Age Group: Spotlight on Age 25-54
Obama's Chance to Do Something Right: Nominate Hagel for Secretary of Defense; Why the War Party Fears Hagel
The Bears' Prayer

Startling Look at Employment Demographics by Age Group: Spotlight on Age 25-54

Posted: 06 Jan 2013 10:28 PM PST

Last month I posted a chart showing employment by age group. Here is an update as of Friday's job release.

[Chart: Employment Demographics by Age Group]

Note that 100% of the job growth since the recession is in age group 55 and over.

Last month, someone proposed the above chart was blatantly misleading because it does not reflect the aging workforce.

Let's investigate that hypothesis with a look at actual data (numbers in tables and charts in thousands).

Civilian Institutional Population (CP) and Labor Force (LF)

Year	16-19 CP	20-24 CP	25-54 CP	55+ CP	16-19 LF	20-24 LF	25-54 LF	55+ LF
2000	15912	18311	120656	57697	8271	14250	101393	18668
2001	15929	18877	121604	58683	7902	14557	101789	19485
2002	15994	19348	122077	60151	7585	14781	101719	20778
2003	16096	19801	123289	61981	7170	14928	102309	22104
2004	16222	20197	123410	63527	7114	15154	102122	23011
2005	16398	20276	124175	65233	7164	15127	102773	24257
2006	16678	20265	124884	66988	7281	15113	103566	25468
2007	16982	20427	125696	68761	7012	15205	104353	26554
2008	17075	20409	125652	70652	6858	15174	104396	27858
2009	17043	20524	125565	72668	6390	14971	103742	29040
2010	16901	21047	125290	74591	5906	15028	102940	30014
2011	16774	21423	124704	76716	5727	15270	101744	30876
2012	16984	21799	124314	80187	5823	15462	101253	32437

Age Group 25-54 Key Facts

In 2007 the civilian population was 125,652,000
In 2007 the labor force was 104,353,000
In 2012 the civilian population was 124,314,000
In 2012 the labor force was 101,253,000

Numbers are non-adjusted from BLS tables.

Simply put, the decrease in civilian population in age group 25-54 was 1,340,000. The decrease in the labor force was a staggering 3,100,000!

Let's explore this idea in still more detail looking at employment, unemployment, and non-employment.

Spotlight on Age Group 25-54

Year	25-54 CP	25-54 LF	25-54 Employed	25-54 Not Employed	25-54 Unemployed
2000	120656	101393	98292	22364	3102
2001	121604	101789	97948	23656	3842
2002	122077	101719	96823	25254	4896
2003	123289	102309	97178	26111	5131
2004	123410	102122	97472	25938	4650
2005	124175	102773	98517	25658	4256
2006	124884	103566	99672	25212	3894
2007	125696	104353	100450	25246	3904
2008	125652	104396	99369	26283	5027
2009	125565	103742	95144	30421	8597
2010	125290	102940	94082	31208	8858
2011	124704	101744	93674	31030	8069
2012	124314	101253	94150	30164	7103

Notes

Unemployment is the difference between employment and the labor force.
Not-employed is the difference between employment and the civilian population.
Numbers are non-adjusted from BLS tables.
There may be rounding errors.

More Key Facts For Age Group 25-54

Between 2007 and 2012 the civilian population declined by 1,340,000
Between 2007 and 2012 the labor force declined by 3,100,000
Between 2007 and 2012 employment fell from 100,450,000 to 94,150,000.
Between 2007 and 2012 employment declined by 6,300,000 jobs on a mere decrease in the civilian population of 1,340,000!
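The figures in the list above can be re-derived from the 2007 and 2012 rows of the age 25-54 table with a few subtractions. The sketch below copies those two rows (in thousands) and applies the definitions from the notes. One thing to be aware of: using the table's 2007 population of 125,696 gives a population decline of 1,382 thousand; the 1,340,000 quoted in the text is what results from starting at 125,652, which appears in the table as the 2008 population.

```python
# Rows copied from the age 25-54 table, in thousands:
# year -> (civilian population, labor force, employed)
ROWS = {
    2007: (125696, 104353, 100450),
    2012: (124314, 101253, 94150),
}


def derived(year):
    """Apply the definitions from the notes to one table row."""
    population, labor_force, employed = ROWS[year]
    return {
        "unemployed": labor_force - employed,
        "not_employed": population - employed,
    }


population_drop = ROWS[2007][0] - ROWS[2012][0]
labor_force_drop = ROWS[2007][1] - ROWS[2012][1]
employment_drop = ROWS[2007][2] - ROWS[2012][2]

print(population_drop)   # 1382
print(labor_force_drop)  # 3100
print(employment_drop)   # 6300
print(derived(2012))     # {'unemployed': 7103, 'not_employed': 30164}
```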

Let's take a look at the above table in chart form.

[Chart: Civilian Population, Labor Force, Employed, Not-Employed]

Irrefutable Evidence Falling Employment Not Based on Boomer Demographics

This plunge in employment in the prime working age group of 25-54 is irrefutable proof that the drop in employment and the falling participation rate are not based on aging boomer demographics.

By calculation, 4,960,000 jobs (6,300,000 - 1,340,000) simply vanished into thin air (in age group 25-54 alone).

Thus, the plunge in employment in the prime working age group of 25-54 also provides strong evidence the stated unemployment rate of 7.8% is bogus by a more sensible measure of unemployment.

Better Measure of Unemployment

I propose this simple definition: If you want a job, are physically able to work a job, and you don't have a job, then you are unemployed.

Actual measures are purposely defined to hide the true state of the economy.

For a close scrutiny of the latest jobs report, please see Establishment Survey +155,000 Jobs; Household Survey +28,000 Jobs; Unemployment Rate Revised Up, Flat Since September

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

Obama's Chance to Do Something Right: Nominate Hagel for Secretary of Defense; Why the War Party Fears Hagel

Posted: 06 Jan 2013 05:43 PM PST

The Financial Times reports that a Hagel nomination is expected this week.

US President Barack Obama is poised to nominate Chuck Hagel as secretary of defence, setting the stage for a tough nomination fight focusing on the former Republican senator's views on Israel and Iran.

The announcement by Mr Obama of a new Pentagon chief to replace Leon Panetta could come as early as Monday, administration officials indicated. Mr Obama returned from a holiday in Hawaii on Sunday.

Mr Hagel's possible nomination has caused an uproar among neoconservatives over his questioning of sanctions and military action against Iran and his statement that a "Jewish lobby" intimidates Congress.

Many Democrats have been unenthusiastic as well, because he is a Republican and over a past statement criticising a Clinton-era diplomatic appointment as "openly, aggressively gay".

But the criticism has been especially virulent from the right, with Israel conservatives labelling him borderline anti-Semitic and suggesting he was intent in making dangerously deep cuts to the defence budget.

Lindsey Graham, a Republican senator from South Carolina and a prominent defence hawk, said on Sunday he was inclined not to support his former Senate colleague because of his "antagonistic" attitude to Israel.

"This is an in-your-face nomination by the president for all those who are supportive of Israel," Mr Graham told CNN.

Got That?

Democrats don't want Hagel simply because Hagel is a Republican. The Republicans do not want him because he is not a war-monger.

That's what this whole thing boils down to.

Why the War Party Fears Hagel

Let's fill in the details with a look at Why the War Party Fears Hagel.

Who is Chuck Hagel?

Born in North Platte, Neb., he was a squad leader in Vietnam, twice wounded, who came home to work in Ronald Reagan's 1980 campaign, was twice elected U.S. senator, and is chairman of the Atlantic Council and co-chair of the President's Foreign Intelligence Advisory Board.

To The Weekly Standard's Bill Kristol, however, Hagel is a man "out on the fringes," who has a decade-long record of "hostility to Israel" and is "pro-appeasement-of-Iran."

Hagel's enemies contend that his own words disqualify him.

First, he told author Aaron David Miller that the "Jewish lobby intimidates a lot of people up there" on the Hill. Second, he urged us to talk to Hamas, Hezbollah, Iran. Third, Hagel said several years ago, "A military strike against Iran ... is not a viable, feasible, responsible option."

Hagel has conceded he misspoke in using the phrase "Jewish lobby." But as for a pro-Israel lobby, its existence is the subject of books and countless articles. When AIPAC sends up to the Hill one of its scripted pro-Israel resolutions, it is whistled through. Hagel's problem: He did not treat these sacred texts with sufficient reverence.

"I am a United States senator, not an Israeli senator," he told Miller. "I support Israel. But my first interest is I take an oath ... to the Constitution of the United States. Not to a president. Not to a party. Not to Israel. If I go run for Senate in Israel, I'll do that."

Hagel puts U.S. national interests first. And sometimes those interests clash with the policies of the Israeli government.

Chuck Hagel allies launch counter-attack

Politico reports Chuck Hagel allies launch counter-attack.

Brent Scowcroft, who was national security adviser to Presidents Gerald Ford and George H.W. Bush, said Hagel "has a very broad view of American foreign policy and the role in the world. He is very judicious, and he has an outstanding record as a senator, which gives him the knowledge and background to understand about the sometimes fractious relationship between the Congress, especially the Senate, and the administration."

"He got two Purple Hearts on the front lines," Scowcroft added. "That's about the best recommendation you can get from somebody whose job would be to advise on the use of troops around the world. I am honestly surprised, even astonished, at the attacks. I do know where they're coming from, but I don't understand the genesis of them."

Sen. Jack Reed (D-R.I.), a former Army Ranger who serves on the Armed Services Committee and has traveled to war zones with Hagel, said: "Every man and woman in uniform in the Pentagon and across the world will know that he's not only talked the talk, he's walked the walk. … He also has a successful business record. He is an entrepreneur who's succeeded."

Zbigniew Brzezinski, national security adviser to President Jimmy Carter, forcefully defended Hagel on MSNBC's "Morning Joe": "Unlike some of his critics, … he has fought for his country. He has been wounded for this country. He is a man who knows what war is like."

Nonviable Options

I support anyone willing to make this statement: "A military strike against Iran ... is not a viable, feasible, responsible option," over anyone not willing to make it.

A military strike on Iran would be idiotic, and I have no doubt one would have happened had Romney been elected.

It remains to be seen if Obama can get this right. However, Hagel as Secretary of Defense would be a step in the right direction.

The Bears' Prayer

Posted: 06 Jan 2013 09:58 AM PST

Given that the Chicago Bears were put out of their misery last week, missing the playoffs, I offer this prayer for 2013:

Our Papabear
Who art in heaven
Hallas be thy name
We're havin' no fun
Cause we sure play dumb
At home as we do away
Give us this season
Coach Lovie's Leavin'
And forgive us you must
As we forgive those, scoring big against us
And lead us not into the playoffs
But deliver us from Cutler
Amen

I wrote that in 1997 and only needed to change a few words. Back then it was:

Give us this season
Dave Wannstedt's Leavin'

In 1997 I ended with "deliver us from the Packers".

Since Lovie is gone, part of the prayer has been realized already. Fans still await much needed delivery from Cutler.

SEO Blog

Sunday Exclusive | Best online games sites for 2013

Posted: 05 Jan 2013 09:59 PM PST

I think everybody, regardless of age, likes spending time on activities that give them pleasure and happiness after a hectic schedule at the office, at home, or at school or college. Online games are one of the best ways to unwind and relax your mind and body. The world of online games...

Seth's Blog : What people buy when they buy something on sale

Assuming it's not something they were shopping for in the first place...

The impulse big-sale buy is not a matter of acquiring a high value item they'll need later at a bargain price today.

No, the consumer is spending money in exchange for the feeling, right now, of saving big. The joy of a bargain. The item is secondary, the feeling is what we just paid for.

You wouldn't know that from the way people selling things act, but that's what we buy.

[Aside: More than a billion people on Earth have never purchased anything on sale at a store. The clearance-sale emotion is a learned one, and a recent one at that.]
