Tuesday, April 9, 2013

How Do You Know If Your Data is Accurate? A case study using search volume, CTR, and rankings

Posted: 08 Apr 2013 07:18 PM PDT

Posted by Matt Peters

Big Data and analytics have been called the "next big thing," and they can certainly make a strong case given the explosion of easily accessible, high-quality data available today. In the inbound marketing world, we have access to backlinks and anchor text, traffic and clickstream data, search volume and click-through rate (CTR), social media metrics, and much more. There is huge value in this data, if we can unlock it.

But there's a problem: real-world data is messy, and processing it can be tricky. How do we know if our data is accurate, or if we can trust our final conclusions? If we want to use this data to find a better way to do marketing, we have to be careful about accuracy.

There are no hard and fast rules when it comes to data analysis. There are some best practices, but even these can get a little murky. The most important thing to do is to put on your detective cap and dive into the data. The more familiar you are with the data, the easier it is to spot something that seems strange. More often than not, what you find will be data quality issues that need to be fixed.

Throughout this post, we will use a data set from Google Webmaster Tools of keyword search referrals as a case study. Here's a snippet of the data:

We also put all of our keyword analysis code on GitHub so you can run our analysis on your own site's data.

The rest of this post discusses six best practices and suggestions for ensuring your data and results are accurate. Enjoy!


1. Separate data from analysis, and make analysis repeatable

It is best practice to separate the data from the process that analyzes it. This also makes it possible to repeat the analysis on different data, either by you or by someone else. For this reason, most data scientists don't use Excel, since it couples the data with the analysis and makes the analysis difficult to repeat. Instead, they often use a high-level, statistically oriented scripting language like R, MATLAB/Octave, or SAS, or a general-purpose language like Python.

At Moz, the data science team uses Python. Our Big Data team also uses it heavily, which makes it easy to integrate our algorithms with their production code.

2. If possible, check your data against another source

In many cases this step may be impossible, but if you can do it, it's the best way to make sure your data is accurate. In Moz's case, we were able to check the Google Webmaster Tools data against data from Google Analytics.

Some things to focus on when you're comparing data include total aggregate counts, counts in sub-categories, and averages. In our case, we checked the total search visits and spot-checked the number of visits for a few different keywords.
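A cross-source check like this can be sketched in a few lines of Python. This is only an illustration, not the code from our repository: the function names, the 10% tolerance, and all of the visit counts are invented for the example.

```python
def relative_difference(a, b):
    """Relative difference between two counts, as a fraction of the larger one."""
    largest = max(abs(a), abs(b))
    return 0.0 if largest == 0 else abs(a - b) / largest


def compare_sources(source_a, source_b, tolerance=0.10):
    """Return the keywords whose counts disagree by more than `tolerance`."""
    mismatches = {}
    # Only keywords present in both sources can be compared directly.
    for keyword in sorted(set(source_a) & set(source_b)):
        diff = relative_difference(source_a[keyword], source_b[keyword])
        if diff > tolerance:
            mismatches[keyword] = diff
    return mismatches


# Made-up visit counts, purely for illustration.
webmaster_tools = {"seomoz": 1200, "page authority": 300, "author rank": 90}
analytics = {"seomoz": 1150, "page authority": 310, "author rank": 45}

# Check the aggregate first, then spot-check individual keywords.
total_diff = relative_difference(
    sum(webmaster_tools.values()), sum(analytics.values())
)
mismatches = compare_sources(webmaster_tools, analytics)

print(f"total visits differ by {total_diff:.1%}")
print("keywords to investigate:", mismatches)
```

Two sources will rarely agree exactly, so the useful output is the short list of keywords that disagree by more than you are willing to tolerate.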

3. Get down and dirty with the data

This is the fun part where we get to play with the data and do some exploratory data analysis. A good place to start is by looking at the raw data to see what jumps out. In the case of the Google Webmaster Tools data, I noticed that they don't always give the search volume in long-tail cases with only a few searches. Instead, the data has "<10" or "-" in place of numbers. These entries need to be handled carefully, since they will result in missing values.
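One way to handle the "<10" and "-" entries is to convert them to an explicit missing value while parsing, rather than guessing a number. The sketch below is a minimal illustration, not the code from our repository; the `parse_count` helper and the assumption that larger counts may contain thousands separators are hypothetical.

```python
def parse_count(raw):
    """Convert a raw cell to an int, or None when no exact value is given."""
    # Assumption: large counts may be written with thousands separators.
    text = raw.strip().replace(",", "")
    if text in ("", "-") or text.startswith("<"):
        # "-" and "<10" carry no exact number, so mark them as missing.
        return None
    return int(text)


# Made-up rows in the shape (keyword, impressions-as-text).
rows = [("seomoz", "1,300"), ("author rank", "<10"), ("schema testing tool", "-")]
parsed = {keyword: parse_count(value) for keyword, value in rows}
print(parsed)
```

Keeping missing values as `None` forces every later step to decide explicitly how to treat them, instead of silently averaging in a made-up number.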

This is also the time to put on your detective cap and start asking questions about the data. We looked at some keywords like "seomoz" and "page authority" that are branded, and some like "author rank" and "schema testing tool" that are not. After checking out the data, I asked myself, "Hmmm, I wonder if there is any difference in click-through rate between branded and non-branded keywords, or in average search position?"

Usually by this point I'm amped to start answering hard questions, but I try to resist the temptation to jump off the deep end until I run a few more sanity checks. Univariate analysis is a great tool to help you check yourself before going too far, especially since most software packages provide an easy way to do it and it often produces the first interesting results. The idea is to get a picture of what each variable "looks like" by plotting a histogram and calculating things like the mean.
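As a rough sketch of what univariate analysis looks like in code, the snippet below computes a mean, a median, and simple histogram counts for one variable using only the Python standard library. The impression values and the bin width are invented for illustration; a plotting library would normally draw the histogram for you.

```python
from collections import Counter
from statistics import mean, median


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    return Counter((value // bin_width) * bin_width for value in values)


# Made-up monthly impressions for eight keywords.
impressions = [12, 35, 40, 8, 95, 1500, 60, 22]

summary = {
    "mean": mean(impressions),
    "median": median(impressions),
    "bins": histogram(impressions, 100),
}
print(summary)
```

In this toy data the mean is far above the median, which is a quick hint that a single large value dominates the average and that the histogram is worth a look.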

The above chart shows an example of univariate analysis on our data. In each panel, we have plotted the distribution of one of the four variables in our data: Impressions, Average Position, Clicks, and CTR. We also included the mean of each distribution in the title. Immediately, we can see a few interesting comparisons. 

First, almost all of our keywords are "long-tail" with less than 100 searches/month. However, much of our traffic also comes from a few high-volume keywords (>1000 searches/month). The average position is concentrated in the top 10 as expected (since results off the first page send very little traffic). This is also a good check of our data. If we had seen a significant number of keywords sending traffic at ranks lower than #10, we would need to investigate further. Finally, the CTR in the lower right is interesting. Most of the keywords have a CTR of less than 40%, but we do have a few high-volume keywords with a much higher CTR.

By now, I usually feel pretty comfortable with the data and can jump in. At this point, I've found that asking specific questions is often the most productive way to answer bigger questions, but everyone works differently, so you'll need to find what works best for you. In the case of the Google Webmaster Tools data, I'm curious about the impact of branded vs non-branded keywords.

One way to examine this is to segment the data and then repeat the univariate analysis for each segment. Here's the plot for impressions:

We can see that, overall, branded keywords have a higher search volume than non-branded keywords (means of 380 and 160, respectively). It gets more interesting if we look at average position and CTR:

We see a huge difference in Average Position and CTR between the branded and non-branded words. Most of our traffic from branded words is in the top two or three positions, with non-branded queries sending traffic throughout the top 10. The CTR is also significantly different with a few branded keywords having very high CTR (60%+).
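Segmenting and then repeating the summary for each segment can be sketched as follows. This is an illustration under stated assumptions: the `BRAND_TERMS` list, the substring matching rule, and every number in the sample rows are hypothetical, and a real brand list would need more care.

```python
from statistics import mean

# Hypothetical list of brand terms; matching is a simple substring test.
BRAND_TERMS = ("seomoz", "page authority")


def is_branded(keyword):
    """Return True if the keyword contains any brand term."""
    lowered = keyword.lower()
    return any(term in lowered for term in BRAND_TERMS)


def segment_means(rows, field):
    """Mean of `field` for branded and non-branded keywords separately."""
    groups = {"branded": [], "non-branded": []}
    for row in rows:
        label = "branded" if is_branded(row["keyword"]) else "non-branded"
        groups[label].append(row[field])
    return {label: mean(values) for label, values in groups.items() if values}


# Made-up rows, for illustration only.
rows = [
    {"keyword": "seomoz", "ctr": 0.65, "position": 1.0},
    {"keyword": "page authority", "ctr": 0.45, "position": 2.0},
    {"keyword": "author rank", "ctr": 0.10, "position": 6.0},
    {"keyword": "schema testing tool", "ctr": 0.20, "position": 4.0},
]

position_means = segment_means(rows, "position")
ctr_means = segment_means(rows, "ctr")
print(position_means, ctr_means)
```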

We might also wonder about how the CTR changes with the search position. We expect that lower-ranking keywords will have a lower CTR. Can we see this in the data?

Indeed, the CTR drops off rapidly after the top five. There is an interesting bump up at position 15, but this is a data-sparse region, so it may not be a real signal.
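A simple guard against over-reading sparse regions is to carry the number of keywords along with each aggregated CTR. The sketch below groups keywords by rounded position and flags positions backed by few keywords. The function name, the `min_keywords` threshold, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def ctr_by_position(rows, min_keywords=3):
    """Aggregate CTR per rounded position and flag sparsely populated ones."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0, "keywords": 0})
    for row in rows:
        # Round half up so an average position of 1.5 lands in bucket 2.
        bucket = totals[int(row["position"] + 0.5)]
        bucket["clicks"] += row["clicks"]
        bucket["impressions"] += row["impressions"]
        bucket["keywords"] += 1

    result = {}
    for position in sorted(totals):
        t = totals[position]
        ctr = t["clicks"] / t["impressions"] if t["impressions"] else None
        result[position] = {
            "ctr": ctr,
            "keywords": t["keywords"],
            "sparse": t["keywords"] < min_keywords,
        }
    return result


# Made-up rows, for illustration only.
rows = [
    {"position": 1.2, "clicks": 50, "impressions": 100},
    {"position": 1.4, "clicks": 30, "impressions": 100},
    {"position": 1.0, "clicks": 20, "impressions": 100},
    {"position": 15.3, "clicks": 4, "impressions": 10},
]

result = ctr_by_position(rows)
for position, stats in result.items():
    print(position, stats)
```

Here position 15 shows a high CTR, but it rests on a single keyword, so the `sparse` flag marks it as something to treat with suspicion rather than as a finding.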

4. Unit test your code (where it makes sense)

This is a software development best practice, but it can get a little sticky in the data science world and often requires judgment on your part. Unit testing everything is a great way to catch many problems, but it will really slow you down. It's a good idea to unit test code that you think will be used again, that has a general purpose outside the specific project, or that has logic complicated enough to be easy to get wrong. It's often not worthwhile to test code written quickly to check an idea.

In the case of the Google Webmaster Tools data, we decided to test the process that reads the data and fills missing values because the logic is somewhat complicated, but didn't test our code to generate the plots since it was relatively simple. We used a small, synthetic data set to write the tests since it is easy to manage. Check out some of our tests here.
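To give a feel for what such a test looks like, here is a minimal sketch using Python's built-in `unittest` module and a tiny synthetic data set. The `fill_missing` function is a hypothetical stand-in written for this example; it is not the function from our repository, which has more involved logic.

```python
import unittest


def fill_missing(values, fill_value):
    """Return a copy of `values` with every None replaced by `fill_value`."""
    return [fill_value if value is None else value for value in values]


class FillMissingTest(unittest.TestCase):
    def test_replaces_only_missing_values(self):
        # A real zero must survive; only None counts as missing.
        self.assertEqual(fill_missing([120, None, 0, None], 5), [120, 5, 0, 5])

    def test_does_not_modify_input(self):
        original = [None, 7]
        fill_missing(original, 5)
        self.assertEqual(original, [None, 7])


# Run the suite programmatically so the snippet also works when pasted into a REPL.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FillMissingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A synthetic list of four values is enough here because every expected output can be worked out by hand, which is exactly what makes small test data easy to manage.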

5. Document your process

This step can be annoying, but you will thank yourself a few months later when you need to revisit it. Documentation also communicates your thoughts to others who can check and validate your logic.

In our case, this blog post documents our process, and we provide some additional documentation in the README in the code.

6. Get feedback from others

Peer review is one of the cornerstones of the academic world, and other people's insight is almost always beneficial to improving your analysis. Don't hesitate to ask your team for feedback; most of the time, they'll be happy to give it! 


Do you have any other helpful testing tips? What has worked for you and your team? I'd love to hear your thoughts in the comments below!


Victims of Gun Violence Deserve a Vote

The White House Your Daily Snapshot for
Tuesday, April 9, 2013
 

Yesterday President Obama traveled to Connecticut, where he spoke with families of the children and teachers killed in the tragedy at Sandy Hook Elementary. He reiterated that we have not forgotten our promise to help prevent future tragedies and reduce gun violence in our country -- and that now more than ever we need to act:

"Now is the time to get engaged. Now is the time to get involved. Now is the time to push back on fear, and frustration, and misinformation. Now is the time for everybody to make their voices heard from every state house to the corridors of Congress."

See more from President Obama's visit, and the common-sense plan to reduce gun violence.

President Obama Asks Americans to Stand Up and Call for Action to Reduce Gun Violence

President Barack Obama and Jillian Soto exit Air Force One at Joint Base Andrews, Md., April 8, 2013. Soto is the sister of Victoria Soto, a first-grade teacher who was killed during the Sandy Hook Elementary School shootings. (Official White House Photo by Pete Souza)

In Case You Missed It

Here are some of the top stories from the White House blog:

Recognizing Sexual Assault Awareness Month
April is National Sexual Assault Awareness and Prevention Month and Americans are urged to support survivors to continue the progress towards addressing sexual assault.

Weekly Address: The President’s Plan to Create Jobs and Cut the Deficit
President Obama tells the American people about the budget he is sending to Congress, which makes the tough choices required to grow our economy and shrink our deficits.

Weekly Wrap Up: “We Have Not Forgotten”
Here's what happened last week on WhiteHouse.gov.

Today's Schedule

All times are Eastern Daylight Time (EDT).

10:30 AM: The President and the Vice President receive the Presidential Daily Briefing

11:30 AM: Press Briefing by Press Secretary Jay Carney WhiteHouse.gov/live

2:00 PM: The Vice President and Attorney General Eric Holder deliver remarks at the White House

3:00 PM: The President and the Vice President meet with Secretary of Defense Hagel

7:30 PM: The President and the First Lady host a concert celebrating Memphis Soul music as part of their "In Performance at the White House" series

WhiteHouse.gov/live Indicates that the event will be live-streamed on WhiteHouse.gov/Live



How to approach journalists for your SEO

Link to SEOptimise » blog

Posted: 08 Apr 2013 06:13 AM PDT

A link from a high-authority news website can do wonders for SEO. But do you know how to appeal to journos?

The boundaries between SEO and PR are becoming increasingly blurred. In fact, for many small businesses, any PR work is carried out by their SEO team or agency.

Links from top news websites and popular bloggers are incredibly valuable online. But do you know how to get positive press mentions?

We've been speaking to a journalist who regularly writes for high-authority news websites, including Yahoo! UK and the MailOnline. What are her tips for getting mentions – and even links – in the press?

Attracting press attention

Pitch actual news stories

No journalist wants to give you free publicity – advertorial doesn't sell papers or generate website clicks. So you need to pitch actual stories, such as research you've conducted or analysis of interesting sales figures, not just 'news' of your 'great new product'.

"I get between 10 and 20 press releases in my inbox every day," our insider explains. "About half of those are simply non-stories. They contain 'news' that shouldn't go beyond the company newsletter and certainly won't be interesting to journalists."

Her advice is to look for an interesting angle to the story you want to publicise. For example, if you make chocolate and sales have increased, that's not a particularly interesting story. But if you can show that sales rose following a bad news story, then that's far more interesting – 'Brits turn to chocolate to beat economy blues'.

Offer proof

Your press releases or pitches can't just be 'we think x'. Journalists need some supporting figures, even if they're just to prove to their editors that the story has legs.

"It's not as if we need a massive report," says our journalist, "just something to support the story. Sometimes that can be sales figures, like a surge in sales for a particular product, but it also might be a case study. For example, a popular website that sells second-hand music gave me a case study about a man who'd made £4,000 for Christmas selling off his old vinyl. They had a hugely positive mention and I had a great festive finance story."

Don't underestimate the value of your corporate data and market trends – these can support interesting stories and gain you valuable press attention.

Build press relationships

Don't bombard journalists with press releases if they've never heard of you. Look for writers who have an interest in your field and woo them online. Follow them on Twitter, retweet their stories, and engage with them whenever you can.

If the company budget allows it, take a few high-value journalists out for a drink or a meal. They'll be far more likely to listen to your pitches if they know you.

Our insider adds: "I often get emailed press releases where the company hasn't even bothered to spell my name correctly. It makes me much less likely to read their release."

Pick the right time to approach them

Unless your news is time sensitive, think carefully about the best time to contact a journalist. Last thing on a Friday afternoon isn't a great idea, for example.

Look at when the journalist most commonly publishes so you can avoid ringing them too close to a deadline.

Our insider says: "I remember working in a busy newsroom during a really important speech from the PM. Everyone in the country was watching it but some pain of a PR was phoning around every journalist in the room in turn. None of us ran his story."

Call the right writer

Is your story best suited to a tech journalist? A lifestyle writer? A fashion specialist? Don't waste your time or theirs by contacting someone who can't carry your story. Look up previous articles and coverage if you're not sure who's best.

Receiving press attention

It's impossible to predict when the press might be interested in you. You might be a small company importing plant food when suddenly Radio 4 wants to talk to someone about teens accessing dangerous chemicals from abroad.

Having a popular Twitter and blogging presence will really increase the likelihood that they choose you; if you already look like an industry authority then you add authority to a news report.

If a journalist contacts you, here are some points to remember.

Respond fast

We asked our journalist insider what the single most important thing to remember is when dealing with the press. Her answer? "Act fast!"

She explains: "Most journalists, and even bloggers, are working to incredibly tight deadlines. They are probably ringing around a number of businesses like yours, meaning the first one to reply gets the mention.

"So often I have companies call back or tweet me a week or more after I contacted them. By that point the article has been published and I'm working on something else. If I remember the company at all, it will be to make sure I don't waste time on them again."

Don't be too suspicious

Whenever a survey of the least-trusted professions is carried out, journalists rank about as well as politicians. As a nation, we don't trust them. But you need them for SEO-friendly PR!

Our journalist insider says: "I was recently writing a good news story about businesses that were thriving despite the recession. Yet a number of the companies I phoned were really suspicious of my motives – one had a receptionist who was downright hostile. Because of that one member of staff, they missed out on some really positive publicity."

Be cautious, yes. Consider your words carefully and ask if you can reply by email if you want more time to think your comments through. Make sure you understand exactly what story they are after before you agree to help.

But don't assume that all journalists want to make you look bad. Most often they just need expert comment or a case study.

Don't ask for too much

Of course you want a link or a product mention, but the journalist only cares about the story. Ask for too much and they might just go elsewhere.

"I understand that you want a link and that it's probably the only reason you're talking to me," agrees our insider. "But sometimes my editor won't allow it – but the brand mention is still good publicity and worth having.

"In the past, I've ditched case studies because the company has demanded the right to edit my article before it's published, or insisted that I include a positive mention of a specific product. I'm not some sort of freebie copywriter and only my editor gets to amend my articles."

Of course, if you have a really valuable or unique story then it's worth offering it first to the publication that will give you the most. Just be realistic about what your story is worth.

Respect bloggers

Most smaller websites are unlikely to get a huge amount of press attention from major news sites. Quite often, journalists will simply run an internet search for the kind of spokesperson or case study they need, meaning those that rank highest for the most competitive keywords are likely to get the mention.

However, bloggers may have more time and a wider industry network, meaning you're more likely to be approached by them, or to succeed in pitching a story.

Because they are more specialist, a link from a blogger's website can often be extraordinarily useful in terms of SEO. So it's essential to respect bloggers and be as polite, timely, and helpful as you would to a journalist.

Have you dealt with the press? Did you get a link? Is all publicity good publicity? Share your thoughts with us and other readers using the comments below.

Seth's Blog : First, do no harm--three rules for public interfaces

 

When we think of design, we usually imagine things that are chosen because they are designed. Vases or comic books or architecture...

It turns out, though, that most of what we make or design is actually aimed at a public that is there for something else. The design is important, but the design is not the point. Call it "public design"...

Public design is for individuals who have to fill out our tax form, interact with our website or check into our hotel room despite the way it's designed, not because of it.

In the quest to make it work better, look better or become more powerful, sometimes we do precisely the wrong thing, because we forget about the 'public' part of public design. If the user isn't focused or interested in the innovation of our design, we have an obligation to get out of the way.

Rule 1: The more often a device is used by first-time users, the more standardized the interface should be. 

For example, the shower in a hotel. Some of the most elegant, clever design ever created by man exists in the dials and wheels in the hotel shower. All of it is worse than a waste--it's dangerous and time-consuming. Guests don't want to learn a new way to turn on the shower, they don't want to burn themselves, they just want the water to come out, at the right temperature, in the right direction, with the right quantity. The first time.

Rule 2: Who gets left out is the most important question.

Small ramps are better than a few stairs, given the choice. The more of the public we include, by definition, the better the choice.

Everyone takes a shower without their glasses, and yet the little, indistinguishable bottles in the shower often have 12 point type describing what's inside. No, I'm not going to wear reading glasses in the shower. Shampoo maybe

If the disabled, the elderly, or those without the latest browser can't use what you've created, it doesn't deserve to be in public.

Rule 3: The best interface is no interface.

Great design tells a story. It moves a product from one category to another, increases yield, creates efficiencies and most of all, adds beauty to the interaction.

But it doesn't have to shout. Or confuse. The pro user, the individual who chose your design because it is something she wants to use every day--this user appreciates the power and the beauty you've created. But in public, for the infrequent passerby, do not call attention to what you've built. We have other things to do. The best designer understands what's important.

Don't abdicate the responsibility for great public design. Do not settle for inefficient, banal or ugly. But at the same time, respect the rules. Anyone can grandstand, but it takes real skill to do great public design that works. We're not looking for design we notice... no, it's design that improves the experience for the public that is the best public design.

