Tuesday, 8 January 2013

Damn Cool Pics


Fastest Way to Drink Water [Video]

Posted: 08 Jan 2013 03:55 PM PST


There are some things in life you just have to see to believe.

This kid drinking a bottle of water is one of them.

Watch as he almost-magically sucks down an entire bottle of water in less than one second.

Some are saying that this video is fake and that the guy filming it switches the bottle out for an empty one at the 0:05 mark. What do YOU think? Real or fake?



Sexy Fitness Girls

Posted: 08 Jan 2013 03:49 PM PST

There is nothing like a rock hard female body.

Protect Your Fancy Car Radio With a Fake One

Posted: 08 Jan 2013 11:38 AM PST

Who would ever take the time and risk to steal such an old piece of junk?

Animals Animation Vs. Real Life

Posted: 08 Jan 2013 11:19 AM PST

Animals in cartoons and in real life.


What Does the Car of the Future Look Like? [Infographic]

Posted: 08 Jan 2013 10:41 AM PST

Self-driving cars sound like something out of a science fiction movie. But they're part of our roadway reality now. Several automakers are testing them, and the Institute of Electrical and Electronics Engineers (IEEE) predicts 75 percent of cars on the road in 2040 will be self-driving. Here, InsuranceQuotes.com peeks into the future of self-driving cars.

Click on Image to Enlarge.
From: InsuranceQuotes.com

SEO Blog


Local SEO Marketing Strategies for 2013

Posted: 08 Jan 2013 03:52 AM PST

Local marketing is the implementation of the marketing mix for a particular region or at a domestic level. The marketing mix consists of four key elements: product, price, place, and promotion. These four P's make marketing effective at achieving broad exposure. To target the local online market, the local SEO...
Read more »

Visualizing Duplicate Web Pages


Posted: 07 Jan 2013 06:12 PM PST

Posted by David Barts

We've just changed the way we detect duplicate or near-duplicate web pages in our custom crawler to better serve you. Our previous code produced good results, but it could fall apart on large crawls (ones larger than about 85,000 pages) and took an excessively long time (sometimes on the order of weeks) to finish.
 
Now that the change is live, you’ll see some great improvements and a few changes:
 
  • Results will come in faster (up to an hour faster on small crawls and literally days faster on larger crawls)
  • More accurate duplicate removal, resulting in fewer duplicates in your crawl results

This post provides a high-level look into the motivations behind our decision to change the way our custom crawl detects duplicate and near-duplicate web pages. Enjoy!

Improving our page similarity measurement

The heuristic we currently use to measure the similarity between two pages is called fingerprints. Fingerprints relies on turning each page into a vector of 128 64-bit integers in such a way that duplicate or near-duplicate pages result in an identical, or nearly identical, vector. The difference between a pair of pages is proportional to the number of corresponding entries in the two vectors which are not the same.
 
The faster heuristic we are working on implementing is called a similarity hash, or simhash for short. A simhash is a single, 64-bit, unsigned integer, again calculated in such a way that duplicate or near-duplicate pages result in simhash values which are identical, or nearly so. The difference between pages is proportional to the number of bits that differ in the two numbers.
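To make the two difference measures concrete, here is a minimal, illustrative sketch in Python. The token hashing and helper names are our own for illustration, not the actual crawler code:

```python
import hashlib

def simhash(tokens, bits=64):
    # Illustrative simhash: each token's hash votes on every bit of the result.
    votes = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16) & ((1 << bits) - 1)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the final value when more tokens voted 1 than 0.
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

def simhash_difference(a, b):
    # Number of differing bits (Hamming distance) between two simhash values.
    return bin(a ^ b).count("1")

def fingerprint_difference(fp_a, fp_b):
    # Legacy-style difference: count of mismatched entries between two
    # 128-entry fingerprint vectors.
    return sum(1 for x, y in zip(fp_a, fp_b) if x != y)
```

Near-duplicate pages share most of their tokens, so most bit votes, and hence most bits of the resulting simhash values, come out the same.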
 

The problem: avoid false duplicates

The problem is that these two measures are very different: one is a vector of 128 values, while the other is a single value. Because of this difference, the measurements may vary in how they see page difference. With the possibility of a single crawl containing over a million pages, that's an awful lot of numbers we need to compare to determine the best possible threshold value for the new heuristic.
 
Specifically, we need to set the heuristic threshold to detect as many duplicates and near-duplicates as possible while minimizing the number of false duplicates. It is most important to minimize the number of page pairs flagged as duplicates which aren't, so that we never remove a page as a duplicate unless it actually *is* a duplicate. This means we need to be able to detect pages where:
  • The two pages are not actually duplicates or near-duplicates,
  • The current fingerprints heuristic correctly views them as different, but
  • The simhash heuristic incorrectly views them as similar.
We’re being incredibly careful about this to avoid the most negative customer experience we anticipate: a behind-the-scenes change to our duplicate detection heuristic causing a sudden rash of incorrect "duplicate page" errors to appear for no apparent reason.
 

The solution: visualizing the data

Our need to make a decision where many numeric quantities are involved is a classic case where data visualization can be of help. Our SEOmoz data scientist, Matt Peters, suggested that the best way to normalize these two very different measures of page content was to focus on how they measured difference between existing pages. Taking that to heart, I decided on the following approach:
  1. Sample about 10 million pairs of pages from about 25 crawls selected at random.
  2. For each pair of pages sampled, plot their difference as measured by the legacy fingerprints heuristic on the horizontal axis (0 to 128), and their difference as measured by simhash on the vertical axis (0 to 64).
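Assuming page records that already carry precomputed fingerprint vectors and simhash values (the field names here are hypothetical), the sampling step above can be sketched as:

```python
import random

def hamming(a, b):
    # Differing bits between two 64-bit simhash values (0..64).
    return bin(a ^ b).count("1")

def vector_mismatches(fp_a, fp_b):
    # Mismatched entries between two 128-entry fingerprint vectors (0..128).
    return sum(x != y for x, y in zip(fp_a, fp_b))

def sample_difference_pairs(pages, n_pairs):
    # Each sampled pair of pages becomes one (x, y) point:
    # x = legacy fingerprints difference, y = simhash difference.
    points = []
    for _ in range(n_pairs):
        a, b = random.sample(pages, 2)
        points.append((vector_mismatches(a["fingerprint"], b["fingerprint"]),
                       hamming(a["simhash"], b["simhash"])))
    return points
```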
The plot resulting from this approach looks like this:
 
Visualized data
 
Immediately, a problem is obvious: there's no way to see the central tendency (or lack thereof) in this image. If more than one page pair has the same difference as measured by both legacy fingerprints and simhash, the plotting software will simply place a second red dot precisely atop the first one. And so on for the third, fourth, hundredth, and possibly thousandth identical data point.
 
One way to address this problem is to color the dots differently depending on how many page pairs they represent. So what happens if we select the color using a light wavelength that corresponds to the number of times we draw a point on the same spot? This tactic gives us a plot with red (a long wavelength) indicating the most data points, down through orange, yellow, green, blue, and violet (really, magenta on this scale) representing only one or two values:
 
Linear data visualization
 
How disappointing! That's almost no change at all. However, if you look carefully, you can see a few blue dots in that sea of magenta, and most important of all, the lower-leftmost dot is red, representing the highest number of instances of all. What's happening here is that red dot represents a count so much higher than all the other counts that most of the other colors between it and the ones representing the lowest numbers end up unused.
 
The solution is to assign colors in such a way that most of the colors end up being used for coding the lower counts, and to assign progressively fewer colors as counts increase. Or, in mathematical terms, to assign colors based on a logarithmic scale rather than a linear one. If we do that, we end up with the following:
 
Logarithmic data visualization
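In code, the fix amounts to indexing the palette by log(count) instead of count. A minimal sketch, with an illustrative seven-color palette:

```python
import math

def color_index(count, max_count, n_colors=7):
    # Map a cell count to a palette index (0 = violet .. 6 = red) on a
    # logarithmic scale, so most colors are spent on the common low counts
    # while the single huge count still lands on red.
    if count <= 0:
        return 0
    scale = math.log(max_count) if max_count > 1 else 1.0
    return min(n_colors - 1, int(n_colors * math.log(count) / scale))
```

With a maximum cell count of a million, a count of 100 already reaches the third color; on a linear scale, everything short of the maximum would be stuck near the bottom of the palette.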
 
Now we're getting somewhere! As expected, there is a central tendency in the data, even though it's pretty broad. One thing that's immediately evident is that, although in theory, the difference measured by simhash can go to a maximum of 64, in practice, it rarely gets much higher than 46 (three-fourths of the maximum). In contrast, using the fingerprints difference, many pages reach the maximum possible difference of 128 (witness all the red and orange dots along the right side of the graphic). Keep in mind that those red and orange dots represent really big counts, because the color scale is logarithmic.
 
Where we have to be most careful is on the bottom edge of things. That represents simhash values which indicate pairs of pages that are quite similar. If two pages are not, in fact, similar, yet simhash measures them similar where fingerprints saw a significant difference, this is precisely the sort of negative customer experience we are trying to avoid. One potential trouble spot is circled below:
 
Pesky data visualization
 
The circled dot represents a pair of pages which are actually quite different, yet which simhash thinks are quite similar. (The dot to the left and even further below turns out to not be a problem: it represents a pair of nearly duplicate pages that the old heuristic missed!)
 
The vertical position of the troublesome dot represents a simhash difference of 6 (6 corresponding bits in the two 64-bit simhash values differ). It's not the only such case, either: pairs like this come up from time to time. They appear in 1% or less of crawls, but they do appear. If we choose a simhash difference threshold of 6 (matching the threshold we currently have defined for the legacy fingerprints), there will be false positives.
 

Picking a threshold

Thankfully, 6 seems to be a border case. Above 6 bits of difference, the chance of a false positive increases. Below 6, I was unable to find any such pathological cases, and I examined thousands of crawls trying to find one. So I chose a difference threshold of 5 for simhash-based duplicate detection. That results in a situation represented by the final graphic:
 
Bounded Logarithmic data visualization
 
Here we have lines drawn to represent the two difference thresholds. Everything to the left of the vertical line represents what the current code would report as duplicate. Everything below the horizontal line represents what the simhash code will report. Keeping in mind the logarithmic color scale and the red dot in the lower-left corner, we see that the number of page pairs where the two heuristics agree about similarity outweighs the number of page pairs where they disagree.
 
Note that there are still things in the "false positive" (lower right) quadrant. It turns out that those pairs tend not to differ much from the pairs where the two measures agree, or, for that matter, from the false negative pairs in the upper left quadrant. In other words, with the chosen thresholds, both simhash and the legacy fingerprints miss seeing some true near-duplicates.
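With the thresholds fixed, the duplicate decision itself reduces to a Hamming-distance comparison. A minimal sketch (the constant name is ours):

```python
SIMHASH_THRESHOLD = 5  # bits; chosen to stay below the 6-bit border case

def is_near_duplicate(simhash_a, simhash_b, threshold=SIMHASH_THRESHOLD):
    # Flag two pages as duplicates when their 64-bit simhash values
    # differ in at most `threshold` bit positions.
    return bin(simhash_a ^ simhash_b).count("1") <= threshold
```

Completely identical pages hash to the same value (a difference of zero), so they always fall under any non-negative threshold.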
 

The visible results

With this threshold decision, the number of false negatives outnumbers the number of false positives. This meets our goal of minimizing false positives, even at the cost of incurring false negatives. Note that the "false positives" in the lower-right quadrant are actually quite similar to each other, and therefore would more accurately be described as false negatives of the legacy fingerprints heuristic rather than false positives of the simhash heuristic.
 
The most visible aspects of the change to customers are two-fold:
 
1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:
  • We may still miss some near-duplicates. Like the current heuristic, only a subset of the near-duplicate pages is reported.
  • Completely identical pages will still be reported. Two pages that are completely identical have the same simhash value, and thus a difference of zero as measured by the simhash heuristic, so they will always be detected.
2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
 
I hope this post provides some meaningful insight into our upcoming changes. I look forward to hearing your thoughts in the comments below.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

New Nominations for the National Security Team

The White House: Your Daily Snapshot for
Tuesday, January 8, 2013
 

New Nominations for the National Security Team

Speaking from the East Room of the White House yesterday, President Obama announced two key nominations for his national security team. He tapped John Brennan to serve as the Director of the Central Intelligence Agency, and he asked Sen. Chuck Hagel to serve as Secretary of Defense.

Learn about John Brennan, President Obama's pick for Director of the Central Intelligence Agency.

Learn about Sen. Chuck Hagel, President Obama's choice for Secretary of Defense.

President Barack Obama shakes hands with former Sen. Chuck Hagel in the East Room of the White House, Jan. 7, 2013. The President nominated Sen. Hagel for Secretary of Defense and John Brennan, Assistant to the President for Homeland Security and Counterterrorism, second from left, for Director of the CIA. Also pictured on stage are acting CIA Director Michael Morrell, left, and Secretary of Defense Leon Panetta, right. (Official White House Photo by Chuck Kennedy)

In Case You Missed It

Here are some of the top stories from the White House blog:

Hagel for Secretary of Defense
Valerie Jarrett, Senior Advisor to President Obama, discusses Chuck Hagel, the President's choice for Secretary of Defense.

A Whole-of-Government Commitment to Inclusive Entrepreneurial Growth
The Obama Administration recently released a detailed action plan to achieve the goal of increasing Federal services to entrepreneurs and small businesses, with an emphasis on startups and growing firms and underserved markets.

Weekly Address: Working Together in the New Year to Grow Our Economy and Shrink Our Deficits
In this week’s address, President Obama talks about the bipartisan agreement that Congress reached last week which prevented a middle-class tax hike.

Today's Schedule

All times are Eastern Standard Time (EST).

10:30 AM: The President receives the Presidential Daily Briefing

12:30 PM: Press Briefing by Press Secretary Jay Carney WhiteHouse.gov/live

2:00 PM: The President meets with Secretary of Defense Panetta

WhiteHouse.gov/live indicates that the event will be live-streamed on WhiteHouse.gov/live

Get Updates

Sign up for the Daily Snapshot

Stay Connected




The White House • 1600 Pennsylvania Ave NW • Washington, DC 20500 • 202-456-1111
 

Seth's Blog : Toward resilience in communication (the end of cc)

 

Toward resilience in communication (the end of cc)

If you saw this post tweeted in your twitter stream, odds are you didn't click on it. And if you've got an aggressive spam filter, it's likely that many people who have sent you email are discovering you didn't receive it. "Did you see the tweet?" or "did you get my email?" are a tax on our attention. Resilience means standing up in all conditions, but in fact, electronic communication has gotten more fragile, not less.

We wait, hesitating, unsure who has received what and what needs to be resent. With this error rate comes an uncertainty where we used to have none (we're certain of the transmission if you're actively talking on the phone with us, and we know if you got that certified mail). It's now hard to imagine the long cc email list as an ideal choice for getting much done.

The last ten years have seen an explosion in asynchronous, broadcast messaging. Asynchronous, because unlike a phone call, the sender and the recipient aren't necessarily interacting in real time. And broadcast, because most of the messaging that's growing in volume is about one person reaching many, not about the intimacy of one to one. That makes sense, since the internet is at its best with low-resolution mass connection.

It's like throwing a thousand bottles into the ocean and waiting to see who gets your message.

Amazon, eBay, Twitter, blogs, Pinterest, Facebook--they are all tools designed to make it easier to reach more and more people with a variation of faux intimacy. And this broadcast approach means that communication breaks down all the time... we have mass, but we've lost resiliency.

Asynchronous creates two problems when it comes to resiliency. First, it's difficult to move the conversation forward because the initiator can't be sure when to report back in with an update. Second, if some of the data changes in between interactions, it's entirely likely that the conversation will go off the rails. If you send two colleagues a word processed doc and, while you're waiting for a response, the file changes, it's entirely possible that you'll get feedback on the wrong file. Source control for any conversation of more than two people becomes a huge issue.

Your boss initiates a digital thread about an upcoming meeting. While two of the people are busy working on the agenda, a third ends up cancelling the meeting, wasting tons of effort because people are out of sync.

But asynchronous communication is also a boon. It means that you don't have to drop everything to get on a call or go to a meeting. Without the ability to spread out our project communication, we'd get a lot less done.

So, here we are in the middle of the communication age, and we're actually creating a system that's less engaging, less resilient to change or dropped signals, and less likely to ensure that small teams are actually contributing efficiently.  The internet funding structure rewards systems that get big, not always systems that work very well.

A simple trade-off has to be made: You can't simultaneously have a wide, open system for communication and also have tight connections and resilience. Open and wide might work great for promoting your restaurant on Twitter, but it's no way to ensure tight collaboration among the three or four investors who need to coordinate your new menu. 

As digital teamwork gets more important, then, team leaders are going to have to figure out how to build resiliency into the way they work. That might include something as simple as affirmative checkins, or more technical solutions to be sure everyone is in sync and also being heard. Someone sitting on a conference call and doing nothing but pretending to listen benefits no one.

Friends and family at Dispatch have built one approach to this problem, a free online collaboration tool that uses the cloud to create a threaded conversation built around online files, with redundancy and a conversation audit trail as part of the process. When someone speaks up, everyone can track it. When a file changes, everyone sees it. And only the invited participate.

It won't be the last tool you'll find that will address an increasingly urgent problem for teams that want to get things done, but it's worth some effort to figure this out. Tightly-knit, coordinated teams of motivated, smart people can change the world. It's a shame to miss that opportunity because your tools are lousy.


