Monday, 7 January 2013

Learn About Robots.txt with Interactive Examples


Posted: 06 Jan 2013 06:43 PM PST

Posted by willcritchlow

One of the things that excites me most about the development of the web is the growth in learning resources. When I went to college in 1998, it was exciting enough to be able to search journals, get access to thousands of dollars' worth of textbooks, and download open source software. These days, technologies like Khan Academy, iTunesU, Treehouse and Codecademy take that to another level.

I've been particularly excited by the possibilities for interactive learning we see coming out of places like Codecademy. It's obviously most suited to learning things that look like programming languages - where computers are naturally good at interpreting the "answer" - which got me thinking about what bits of online marketing look like that.

The kinds of things that computers are designed to interpret in our marketing world are:

  • Search queries - particularly those that look more like programming constructs than natural language queries such as [site:distilled.net -inurl:www]
  • The on-site part of setting up analytics - setting custom variables and events, adding virtual pageviews, modifying e-commerce tracking, and the like
  • Robots.txt syntax and rules
  • HTML constructs like links, meta page information, alt attributes, etc.
  • Skills like Excel formulae that many of us find a critical part of our day-to-day job

I've been gradually building out Codecademy-style interactive learning environments for all of these things for DistilledU, our online training platform, but most of them are only available to paying members. I thought it would make a nice start to 2013 to pull one of these modules out from behind the paywall and give it away to the SEOmoz community. I picked the robots.txt one because our in-app feedback shows that it's one of the modules from which people have learned the most.

Also, despite years of experience, I discovered some things I didn't know as I wrote this module (particularly about precedence of different rules and the interaction of wildcards with explicit rules). I'm hoping that it'll be useful to many of you as well - beginners and experts alike.

Interactive guide to Robots.txt

Robots.txt is a plain-text file found in the root of a domain (e.g. www.example.com/robots.txt). It is a widely-acknowledged standard and allows webmasters to control all kinds of automated consumption of their site, not just by search engines.

Beyond reading about the protocol, you can learn a lot from real examples: robots.txt is one of the more accessible areas of SEO because you can read any site's robots.txt file. Once you have completed this module, you will find value in making sure you understand the robots.txt files of some large sites (for example Google and Amazon).

For each of the following sections, modify the text in the textareas and see them go green when you get the right answer.

Basic Exclusion

The most common use-case for robots.txt is to block robots from accessing specific pages. The simplest version applies the rule to all robots with a line saying User-agent: *. Subsequent lines contain specific exclusions that work cumulatively, so the code below blocks robots from accessing /secret.html.

Add another rule to block access to /secret2.html in addition to /secret.html.
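As a minimal sketch, the completed rule set can be checked with Python's standard-library urllib.robotparser. The host www.example.com and the helper name is_allowed are assumptions made for illustration; they are not part of the original module.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one group for all robots, two cumulative exclusions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret.html
Disallow: /secret2.html
"""


def is_allowed(path, user_agent="*"):
    """Return True if user_agent may fetch path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host (an assumption of this sketch).
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/secret.html", "/secret2.html", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Each Disallow line adds to the exclusions for the group, so both pages are blocked while every other path stays crawlable.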

Exclude Directories

If you end an exclusion directive with a trailing slash ("/") such as Disallow: /private/ then everything within the directory is blocked.

Modify the exclusion rule below to block the folder called secret instead of the page secret.html.

Allow Specific Paths

In addition to disallowing specific paths, the robots.txt syntax allows for allowing specific paths. Note that allowing robot access is the default state, so if there are no rules in a file, all paths are allowed.

The primary use for the Allow: directive is to override more general Disallow: directives. The precedence rule states that "the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."

We will demonstrate this by modifying the exclusion of the /secret/ folder below with an Allow: rule allowing /secret/not-secret.html. Since this rule is longer, it will take precedence.
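The length-based precedence rule can be sketched in a few lines of Python. This is a hand-rolled illustration for plain prefix rules only (no wildcards); the function name, the rule representation, and the handling of equal-length ties (the first rule listed wins) are assumptions of the sketch rather than anything the quoted rule specifies.

```python
def is_allowed(path, rules):
    """Decide access for path using longest-match precedence.

    rules is a list of (directive, path_prefix) pairs, for example
    ("Disallow", "/secret/"). With no matching rule the path is allowed,
    because allowing access is the default state.
    """
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        # A rule applies when its path is a prefix of the requested path;
        # among applicable rules, the longest path entry wins.
        if path.startswith(prefix) and len(prefix) > best_length:
            best_length = len(prefix)
            allowed = directive.lower() == "allow"
    return allowed


RULES = [
    ("Disallow", "/secret/"),
    ("Allow", "/secret/not-secret.html"),
]

if __name__ == "__main__":
    for path in ("/secret/hidden.html", "/secret/not-secret.html", "/public.html"):
        print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

The Allow entry is longer than the Disallow entry, so it takes precedence for /secret/not-secret.html while the rest of the folder stays blocked.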

Restrict to Specific User Agents

All the directives we have worked with have applied equally to all robots. This is specified by the User-agent: * that begins our commands. By replacing the *, however, we can design rules that only apply to specific named robots.

Replace the * with googlebot in the example below to create a rule that applies only to Google's robot.

Add Multiple Blocks

It is possible to have multiple blocks of commands targeting different sets of robots. The robots.txt example below will allow googlebot to access all files except those in the /secret/ directory and will block all other robots from the whole site. Note that because there is a set of directives aimed explicitly at googlebot, googlebot will entirely ignore the directives aimed at all robots. This means you can't build up your exclusions from a base of common exclusions. If you want to target named robots, each block must specify all its own rules.

Add a second block of directives targeting all robots (User-agent: *) that blocks the whole site (Disallow: /). This will create a robots.txt file that blocks the whole site from all robots except googlebot which can crawl any page except those in the /secret/ folder.
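A rough way to see the two blocks interact is to feed them to Python's standard-library urllib.robotparser. The crawler name otherbot, the host www.example.com, and the helper can_crawl are placeholders assumed for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Two groups: one aimed at googlebot, one aimed at every other robot.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /secret/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent, path):
    """Return True if user_agent may fetch path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host (an assumption of this sketch).
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    print(can_crawl("googlebot", "/page.html"))         # googlebot group applies
    print(can_crawl("googlebot", "/secret/page.html"))  # blocked by its own group
    print(can_crawl("otherbot", "/page.html"))          # falls back to the * group
```

Because googlebot has its own group, the site-wide Disallow: / in the catch-all group never applies to it.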

Use More Specific User Agents

There are occasions when you wish to control the behavior of specific crawlers such as Google's Images crawler differently from the main googlebot. In order to enable this in robots.txt, these crawlers will choose to listen to the most specific user-agent string that applies to them. So, for example, if there is a block of instructions for googlebot and one for googlebot-images then the images crawler will obey the latter set of directives. If there is no specific set of instructions for googlebot-images (or any of the other specialist googlebots) they will obey the regular googlebot directives.

Note that a crawler will only ever obey one set of directives - there is no concept of cumulatively applying directives across groups.

Given the following robots.txt, googlebot-images will obey the googlebot directives (in other words will not crawl the /secret/ folder). Modify this so that the instructions for googlebot (and googlebot-news etc.) remain the same but googlebot-images has a specific set of directives meaning that it will not crawl the /secret/ folder or the /copyright/ folder:
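The group-selection behavior described above can be sketched as a small Python function. The prefix-based matching of crawler names, the function name select_group, and the catch-all group in the sample data are assumptions made for illustration; real crawlers may match user-agent tokens differently.

```python
def select_group(crawler, groups):
    """Pick the single group of directives a crawler will obey.

    groups maps a user-agent token to its list of directives. The crawler
    obeys only the most specific (longest) token that matches its name and
    falls back to "*" when no named group matches. Directives are never
    combined across groups.
    """
    name = crawler.lower()
    candidates = [
        token for token in groups
        if token != "*" and name.startswith(token.lower())
    ]
    if candidates:
        return groups[max(candidates, key=len)]
    return groups.get("*", [])


GROUPS = {
    "googlebot": ["Disallow: /secret/"],
    "googlebot-images": ["Disallow: /secret/", "Disallow: /copyright/"],
    "*": ["Disallow: /"],  # hypothetical catch-all group
}

if __name__ == "__main__":
    for crawler in ("googlebot-images", "googlebot-news", "otherbot"):
        print(crawler, select_group(crawler, GROUPS))
```

Note that the googlebot-images group has to repeat the /secret/ exclusion, since it no longer inherits anything from the googlebot group.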

Basic Wildcards

Trailing wildcards (designated with *) are ignored, so Disallow: /private* is the same as Disallow: /private. Wildcards are useful, however, for matching multiple kinds of pages at once. The star character (*) matches 0 or more instances of any valid character (including /, ?, etc.).

For example, Disallow: news*.html blocks:

  • news.html
  • news1.html
  • news1234.html
  • newsy.html
  • news1234.html?id=1

But does not block:

  • newshtml (note the lack of a ".")
  • News.html (matches are case sensitive)
  • /directory/news.html

Modify the following pattern to block only pages ending .html in the blog directory instead of the whole blog directory:
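One way to reason about wildcard patterns is to translate them into regular expressions. The Python sketch below is an illustration, not the matcher any particular crawler uses; the function names are made up, and the patterns are written with a leading slash (/news*.html), which is an assumption of the sketch.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters (including / and ?); everything else
    is literal. Matching is case sensitive and anchored only at the start
    of the path, so anything may follow the matched portion.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def is_blocked(pattern, path):
    """Return True if a Disallow rule with this pattern covers path."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ("/news.html", "/news1234.html?id=1", "/newshtml",
                 "/News.html", "/directory/news.html"):
        print(path, is_blocked("/news*.html", path))
```

Under the same assumptions, a pattern such as /blog/*.html covers .html pages inside the blog directory without covering the directory as a whole.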

Block Certain Parameters

One common use-case of wildcards is to block certain parameters. For example, one way of handling faceted navigation is to block combinations of 4 or more facets. One way to do this is to have your system add a parameter to all combinations of 4+ facets such as ?crawl=no. This would mean for example that the URL for 3 facets might be /facet1/facet2/facet3/ but that when a fourth is added, this becomes /facet1/facet2/facet3/facet4/?crawl=no.

The robots rule that blocks this should look for *crawl=no (not *?crawl=no because a query string of ?sort=asc&crawl=no would be valid).

Add a Disallow: rule to the robots.txt below to prevent any pages that contain crawl=no being crawled.

Match Whole Filenames

As we saw with folder exclusions (where a pattern like /private/ would match paths of files contained within that folder such as /private/privatefile.html), by default the patterns we specify in robots.txt are happy to match only a portion of the filename and allow anything to come afterwards even without explicit wildcards.

There are times when we want to be able to enforce a pattern matching an entire filename (with or without wildcards). For example, the following robots.txt looks like it prevents jpg files from being crawled but in fact would also prevent a file named explanation-of-.jpg.html from being crawled because that also matches the pattern.

If we want a pattern to match to the end of the filename, we should end it with a $ sign, which signifies "line end". For example, modifying an exclusion from Disallow: /private.html to Disallow: /private.html$ would stop the pattern matching /private.html?sort=asc and hence allow that page to be crawled.

Modify the pattern below to exclude actual .jpg files (i.e. those that end with .jpg).
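The effect of the $ sign can be illustrated with a small Python matcher. This is a sketch under assumptions: the function name matches and the sample paths (other than those named in the text) are hypothetical, and the patterns are written with a leading slash.

```python
import re


def matches(pattern, path):
    """Return True if a robots.txt-style pattern matches path.

    '*' matches any run of characters. A trailing '$' forces the pattern to
    match through to the end of the path; without it, anything may follow.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"  # end of string
    return re.match(regex, path) is not None


if __name__ == "__main__":
    print(matches("/*.jpg", "/explanation-of-.jpg.html"))      # unanchored
    print(matches("/*.jpg$", "/explanation-of-.jpg.html"))     # anchored
    print(matches("/*.jpg$", "/images/photo.jpg"))
    print(matches("/private.html$", "/private.html?sort=asc"))
```

Without the $, the .jpg pattern also catches explanation-of-.jpg.html; with it, only paths that actually end in .jpg match.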

Add an XML Sitemap

The last line in many robots.txt files is a directive specifying the location of the site's XML sitemap. There are many good reasons for including a sitemap for your site and also for listing it in your robots.txt file. You can read more about XML sitemaps here.

You specify your sitemap's location using a directive of the form Sitemap: <path>.

Add a sitemap directive to the following robots.txt for a sitemap called my-sitemap.xml that can be found at /my-sitemap.xml.

Add a Video Sitemap

In fact, you can add multiple XML sitemaps (each on their own line) using this syntax. Go ahead and modify the robots.txt below to also include a video sitemap called my-video-sitemap.xml that lives at /my-video-sitemap.xml.
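As a minimal sketch, the sitemap lines can be pulled out of a robots.txt file with a few lines of Python. The helper name sitemap_urls and the sample file are illustrative; the sample uses absolute URLs on the placeholder host www.example.com, which is an assumption of the sketch (the sitemaps.org protocol documents the directive with a full URL).

```python
def sitemap_urls(robots_txt):
    """Return the locations listed in Sitemap: lines, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons are kept.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Illustrative robots.txt with one sitemap per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
Sitemap: http://www.example.com/my-sitemap.xml
Sitemap: http://www.example.com/my-video-sitemap.xml
"""

if __name__ == "__main__":
    for url in sitemap_urls(ROBOTS_TXT):
        print(url)
```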

What to do if you are stuck on any of these tests

Firstly, there is every chance that I've made a mistake in my JavaScript tests, so that they fail to grade some correct solutions the right way. Sorry if that's the case - I'll try to fix them up if you let me know.

Whether you think you've got the answer right (but the box hasn't gone green) or you are stuck and haven't got a clue how to proceed, please just:

  1. Check the comments to see if anyone else has had the same issue; if not:
  2. Leave a comment saying which test you are trying to complete and what your best guess answer is

This will let me help you out as quickly as possible.

Obligatory disclaimers

Please don't use any of the robots.txt snippets above on your own site - they are illustrative only (and some would be a very bad idea). The idea of this post is to teach the general principles about how robots.txt files are interpreted rather than to explain the best ways of using them. For more of the latter, I recommend the following posts:

I hope that you've found something useful in these exercises whether you're a beginner or a pro. I look forward to hearing your feedback in the comments.



From the Archives: Marshmallow Cannon

The White House: Your Daily Snapshot for Monday, January 7, 2013

The White House recently crossed 100,000,000 views on its YouTube channel and marked the milestone by looking at some of our most popular videos, including this raw footage of President Obama launching a marshmallow cannon at the White House Science Fair.

Watch the marshmallow cannon launch:

Watch: President Obama launches a marshmallow cannon

In Case You Missed It

Here are some of the top stories from the White House blog:

Weekly Address: Working Together in the New Year to Grow Our Economy and Shrink Our Deficits
In this week’s address, President Obama talks about the bipartisan agreement that Congress reached this week which prevented a middle-class tax hike.

Join President Obama in a National Day of Service
On January 21, 2013, our nation will celebrate Dr. Martin Luther King, Jr. Day (MLK Day), a national holiday during which we honor the legacy of the civil rights leader Dr. King through a day of service and volunteering.

2012: A Year In Photos
The White House photo team has a front row view for all the events — both big and small — that take place at 1600 Pennsylvania Ave., as well as on the road with the President, the Vice President and the First Family.

Today's Schedule

All times are Eastern Standard Time (EST).

10:30 AM: The President receives the Presidential Daily Briefing

1:05 PM: The President makes a personnel announcement (live-streamed on WhiteHouse.gov/live)

2:00 PM: Press Briefing by Press Secretary Jay Carney (live-streamed on WhiteHouse.gov/live)

 

Seth's Blog : Two kinds of mistakes

 


There is the mistake of overdoing the defense of the status quo, the error of investing too much time and energy in keeping things as they are.

And then there is the mistake made while inventing the future, the error of small experiments gone bad.

We are almost never hurt by the second kind of mistake and yet we persist in making the first kind, again and again.




[You're getting this note because you subscribed to Seth Godin's blog.]





Your requested content delivery powered by FeedBlitz, LLC, 9 Thoreau Way, Sudbury, MA 01776, USA. +1.978.776.9498

 

Sunday, 6 January 2013

Mish's Global Economic Trend Analysis



Startling Look at Employment Demographics by Age Group: Spotlight on Age 25-54

Posted: 06 Jan 2013 10:28 PM PST

Last month I posted a chart showing employment by age group. Here is an update as of Friday's job release.

[Chart: Employment Demographics by Age Group]

Note that 100% of the job growth since the recession is in age group 55 and over.

Last month, someone proposed the above chart was blatantly misleading because it does not reflect the aging workforce.

Let's investigate that hypothesis with a look at actual data (numbers in tables and charts in thousands).

Civilian Institutional Population (CP) and Labor Force (LF)

Year  16-19 CP  20-24 CP  25-54 CP  55+ CP  16-19 LF  20-24 LF  25-54 LF  55+ LF
2000     15912     18311    120656   57697      8271     14250    101393   18668
2001     15929     18877    121604   58683      7902     14557    101789   19485
2002     15994     19348    122077   60151      7585     14781    101719   20778
2003     16096     19801    123289   61981      7170     14928    102309   22104
2004     16222     20197    123410   63527      7114     15154    102122   23011
2005     16398     20276    124175   65233      7164     15127    102773   24257
2006     16678     20265    124884   66988      7281     15113    103566   25468
2007     16982     20427    125696   68761      7012     15205    104353   26554
2008     17075     20409    125652   70652      6858     15174    104396   27858
2009     17043     20524    125565   72668      6390     14971    103742   29040
2010     16901     21047    125290   74591      5906     15028    102940   30014
2011     16774     21423    124704   76716      5727     15270    101744   30876
2012     16984     21799    124314   80187      5823     15462    101253   32437

Age Group 25-54 Key Facts

  • In 2007 the civilian population was 125,652,000 
  • In 2007 the labor force was 104,353,000
  • In 2012 the civilian population was 124,314,000
  • In 2012 the labor force was 101,253,000

Numbers are non-adjusted from BLS tables.

Simply put, the decrease in civilian population in age group 25-54 was 1,340,000. The decrease in the labor force was a staggering 3,100,000!

Let's explore this idea in still more detail looking at employment, unemployment, and non-employment.

Spotlight on Age Group 25-54

Year  25-54 CP  25-54 LF  25-54 Employed  25-54 Not Employed  25-54 Unemployed
2000    120656    101393           98292               22364              3102
2001    121604    101789           97948               23656              3842
2002    122077    101719           96823               25254              4896
2003    123289    102309           97178               26111              5131
2004    123410    102122           97472               25938              4650
2005    124175    102773           98517               25658              4256
2006    124884    103566           99672               25212              3894
2007    125696    104353          100450               25246              3904
2008    125652    104396           99369               26283              5027
2009    125565    103742           95144               30421              8597
2010    125290    102940           94082               31208              8858
2011    124704    101744           93674               31030              8069
2012    124314    101253           94150               30164              7103

Notes

  1. Unemployment is the difference between employment and the labor force. 
  2. Not-employed is the difference between employment and the civilian population. 
  3. Numbers are non-adjusted from BLS tables. 
  4. There may be rounding errors.

More Key Facts For Age Group 25-54

  • Between 2007 and 2012 the civilian population declined by 1,340,000
  • Between 2007 and 2012 the labor force declined by 3,100,000
  • Between 2007 and 2012 employment fell from 100,450,000 to 94,150,000.
  • Between 2007 and 2012 employment declined by 6,300,000 jobs on a mere decrease in the civilian population of 1,340,000!

Let's take a look at the above table in chart form.

[Chart: Civilian Population, Labor Force, Employed, Not-Employed]

Irrefutable Evidence Falling Employment Not Based on Boomer Demographics

This plunge in employment in the prime working age group of 25-54 is irrefutable proof that the drop in employment and the falling participation rate are not based on aging boomer demographics.

By calculation, 4,960,000 jobs (6,300,000 - 1,340,000) simply vanished into thin air (in age group 25-54 alone).
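The arithmetic can be re-run directly from the 25-54 table (figures in thousands). This is a minimal sketch; the dictionary layout and function names are illustrative. One caveat: the table lists 125,696 as the 2007 civilian population, whereas the 125,652 quoted in the key facts matches the table's 2008 row. Computed from the table's 2007 row, the population decline comes out at 1,382 rather than roughly 1,340, and the residual at 4,918 rather than 4,960, so the size of the gap barely changes.

```python
# Age group 25-54, thousands, copied from the 2007 and 2012 rows of the table.
DATA = {
    2007: {"cp": 125696, "lf": 104353, "employed": 100450},
    2012: {"cp": 124314, "lf": 101253, "employed": 94150},
}


def change(metric, start=2007, end=2012):
    """Change in a metric between two years (negative means a decline)."""
    return DATA[end][metric] - DATA[start][metric]


def unexplained_job_loss():
    """Employment decline not accounted for by the population decline."""
    return -change("employed") - (-change("cp"))


if __name__ == "__main__":
    print("Civilian population change:", change("cp"))
    print("Labor force change:", change("lf"))
    print("Employment change:", change("employed"))
    print("Job loss beyond population decline:", unexplained_job_loss())
```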

Thus, the plunge in employment in the prime working age group of 25-54 also provides strong evidence that the stated unemployment rate of 7.8% is bogus when judged by a more sensible measure of unemployment.

Better Measure of Unemployment

I propose this simple definition: If you want a job, are physically able to work a job, and you don't have a job, then you are unemployed.

Actual measures are purposely defined to hide the true state of the economy.

For a close scrutiny of the latest jobs report, please see Establishment Survey +155,000 Jobs; Household Survey +28,000 Jobs; Unemployment Rate Revised Up, Flat Since September

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

Obama's Chance to Do Something Right: Nominate Hagel for Secretary of Defense; Why the War Party Fears Hagel

Posted: 06 Jan 2013 05:43 PM PST

The Financial Times reports Hagel nomination expected this week.
US President Barack Obama is poised to nominate Chuck Hagel as secretary of defence, setting the stage for a tough nomination fight focusing on the former Republican senator's views on Israel and Iran.

The announcement by Mr Obama of a new Pentagon chief to replace Leon Panetta could come as early as Monday, administration officials indicated. Mr Obama returned from a holiday in Hawaii on Sunday.

Mr Hagel's possible nomination has caused an uproar among neoconservatives over his questioning of sanctions and military action against Iran and his statement that a "Jewish lobby" intimidates Congress.

Many Democrats have been unenthusiastic as well, because he is a Republican and over a past statement criticising a Clinton-era diplomatic appointment as "openly, aggressively gay".

But the criticism has been especially virulent from the right, with Israel conservatives labelling him borderline anti-Semitic and suggesting he was intent in making dangerously deep cuts to the defence budget.

Lindsey Graham, a Republican senator from South Carolina and a prominent defence hawk, said on Sunday he was inclined not to support his former Senate colleague because of his "antagonistic" attitude to Israel.

"This is an in-your-face nomination by the president for all those who are supportive of Israel," Mr Graham told CNN.
Got That?

Democrats don't want Hagel simply because Hagel is a Republican. The Republicans do not want him because he is not a war-monger.

That's what this whole thing boils down to.

Why the War Party Fears Hagel

Let's fill in the details with a look at Why the War Party Fears Hagel
Who is Chuck Hagel?

Born in North Platte, Neb., he was a squad leader in Vietnam, twice wounded, who came home to work in Ronald Reagan's 1980 campaign, was twice elected U.S. senator, and is chairman of the Atlantic Council and co-chair of the President's Foreign Intelligence Advisory Board.

To The Weekly Standard's Bill Kristol, however, Hagel is a man "out on the fringes," who has a decade-long record of "hostility to Israel" and is "pro-appeasement-of-Iran."

Hagel's enemies contend that his own words disqualify him.

First, he told author Aaron David Miller that the "Jewish lobby intimidates a lot of people up there" on the Hill. Second, he urged us to talk to Hamas, Hezbollah, Iran. Third, Hagel said several years ago, "A military strike against Iran ... is not a viable, feasible, responsible option."

Hagel has conceded he misspoke in using the phrase "Jewish lobby." But as for a pro-Israel lobby, its existence is the subject of books and countless articles. When AIPAC sends up to the Hill one of its scripted pro-Israel resolutions, it is whistled through. Hagel's problem: He did not treat these sacred texts with sufficient reverence.

"I am a United States senator, not an Israeli senator," he told Miller. "I support Israel. But my first interest is I take an oath ... to the Constitution of the United States. Not to a president. Not to a party. Not to Israel. If I go run for Senate in Israel, I'll do that."

Hagel puts U.S. national interests first. And sometimes those interests clash with the policies of the Israeli government.
Chuck Hagel allies launch counter-attack

Politico reports Chuck Hagel allies launch counter-attack.
Brent Scowcroft, who was national security adviser to Presidents Gerald Ford and George H.W. Bush, said Hagel "has a very broad view of American foreign policy and the role in the world. He is very judicious, and he has an outstanding record as a senator, which gives him the knowledge and background to understand about the sometimes fractious relationship between the Congress, especially the Senate, and the administration."

"He got two Purple Hearts on the front lines," Scowcroft added. "That's about the best recommendation you can get from somebody whose job would be to advise on the use of troops around the world. I am honestly surprised, even astonished, at the attacks. I do know where they're coming from, but I don't understand the genesis of them."

Sen. Jack Reed (D-R.I.), a former Army Ranger who serves on the Armed Services Committee and has traveled to war zones with Hagel, said: "Every man and woman in uniform in the Pentagon and across the world will know that he's not only talked the talk, he's walked the walk. … He also has a successful business record. He is an entrepreneur who's succeeded."

Zbigniew Brzezinski, national security adviser to President Jimmy Carter, forcefully defended Hagel on MSNBC's "Morning Joe": "Unlike some of his critics, … he has fought for his country. He has been wounded for this country. He is a man who knows what war is like."
Nonviable Options

I support anyone willing to state that "A military strike against Iran ... is not a viable, feasible, responsible option" over anyone unwilling to say the same.

A military strike on Iran would be idiotic, and I have no doubt one would have happened had Romney been elected.

It remains to be seen if Obama can get this right. However, Hagel as Secretary of Defense would be a step in the right direction.

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

The Bears' Prayer

Posted: 06 Jan 2013 09:58 AM PST

Given that the Chicago Bears were put out of their misery last week, missing the playoffs, I offer this prayer for 2013:

Our Papabear
Who art in heaven
Halas be thy name
We're havin' no fun
Cause we sure play dumb
At home as we do away
Give us this season
Coach Lovie's Leavin'
And forgive us you must
As we forgive those, scoring big against us
And lead us not into the playoffs
But deliver us from Cutler
Amen

I wrote that in 1997 and only needed to change a few words. Back then it was

Give us this season
Dave Wannstedt's Leavin'

In 1997 I ended with "deliver us from the Packers"

Since Lovie is gone, part of the prayer has been realized already. Fans still await much-needed delivery from Cutler.

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

SEO Blog



Sunday Exclusive | Best online games sites for 2013

Posted: 05 Jan 2013 09:59 PM PST

I think everybody, regardless of age, likes spending time on activities that bring pleasure and happiness after a hectic schedule at the office, at home, or at school or college. Online games are one of the best forms of recreation and a way to relax both mind and body. The world of online games...

Seth's Blog : What people buy when they buy something on sale

 


Assuming it's not something they were shopping for in the first place...

The impulse big-sale buy is not a matter of acquiring a high value item they'll need later at a bargain price today.

No, the consumer is spending money in exchange for the feeling, right now, of saving big. The joy of a bargain. The item is secondary, the feeling is what we just paid for.

You wouldn't know that from the way people selling things act, but that's what we buy.

[Aside: More than a billion people on Earth have never purchased anything on sale at a store. The clearance-sale emotion is a learned one, and a recent one at that.]


