Wednesday, 12 June 2013

Determining Relevance: How Similarity Is Scored

Posted: 11 Jun 2013 07:48 PM PDT

Posted by Matt Peters

Today's web search engines have sophisticated ways of measuring whether a web page is related to a given query, based on decades of research in Information Retrieval. Come join me as I explore the inner workings of a search engine's relevance engine and explain what it means for SEOs.

Determining Relevance

When a user submits a query to a search engine, the first thing it must do is determine which pages in the index are related to the query and which are not. Throughout this post, I will refer to this as the "relevance" problem. More formally, we can state it as follows:

Given a search query and a document, compute a relevance score that measures the similarity between the query and document.

The "document" in this context can also refer to things like the title tag, the meta description, incoming anchor text, or anything else that we think might help determine whether the query is related to the page. Practically, a search engine computes a number of relevance scores using different page elements and weights them all to arrive at one final score.
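As a rough illustration of that last step, the per-element scores could be combined with a weighted average. The field names and weights below are made up for the example; real search engines do not publish theirs, and the actual combination is likely more involved.

```python
def combine_scores(field_scores, field_weights):
    """Weighted average of per-field relevance scores.

    Both arguments are dicts keyed by a (hypothetical) field name.
    """
    total_weight = sum(field_weights[name] for name in field_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * field_weights[name]
                   for name, score in field_scores.items())
    return weighted / total_weight


# Hypothetical scores for one page and one query.
scores = {"title": 0.9, "body": 0.4, "anchor_text": 0.7}
weights = {"title": 3.0, "body": 1.0, "anchor_text": 2.0}
final_score = combine_scores(scores, weights)
```

Here the title match counts three times as much as the body match, so a page with a strong title score ends up with a higher final score than the body score alone would suggest.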

The relevance problem has been extremely well studied in the research community. The first papers go back several decades, and it is still an active area of research. In this post, I focus on the most influential approaches that have stood the test of time.

Relevance vs Ranking

Conceptually, we can separate relevance determination from ranking the relevant documents, even if they are implemented as a single step inside a search engine. In this mental framework, the relevance step first makes a binary (True/False) decision for each page, then the ranking step orders the documents to return to the user.

I'll present some data later in this post that vividly illustrates this split and how it relates to different ranking signals.

Query and Document Models

Translating the query and document from raw strings into something we can do computation with is the first hurdle in computing a similarity score. To do so, we make use of "query models" and "document models." The "models" here are just a fancy way of saying that the strings are represented in some other way that makes computation possible.

The above image illustrates this process for the query "philadelphia phillies" and the Wikipedia page about the Phillies. The final step in computing the similarity score runs the query and document representations through a scoring function.

Query Models

The following image illustrates some different types of query models:

The building blocks at the bottom include things like tokenization (splitting the string into words), word normalization (such as stemming where common word endings are removed), and spelling correction (if a query contains a misspelled word, the search engine corrects it and returns results for the corrected word).
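A minimal sketch of the first two building blocks, assuming a toy suffix list in place of a real stemmer (production systems use something like the Porter stemmer, and spelling correction is left out entirely):

```python
import re

# Toy suffix list for illustration only; not a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Lowercase the string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, then stem each token."""
    return [naive_stem(token) for token in tokenize(text)]
```

With this normalization, "Movie Reviews" and "movie review" map to the same token list, so any scoring function built on top treats them as the same query.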

Built on top of these building blocks are things like query classification and intent. If the search engine determines that a particular query is time sensitive it will return news results, or if it thinks the query intent is transactional it will display shopping results.

Finally, at the top of the pyramid are more abstract representations of the query such as entity extraction or latent topic representations (LDA). Indeed, Google knows that the "philadelphia phillies" are a major league baseball team and, since it is baseball season, returns last night's score at the top of the search results (in addition to the knowledge graph on the right).

Document Models

Like query models, there are several different types of document models commonly used in search.

TF-IDF is one of the oldest and best-known approaches. It represents each query and document as a vector and uses some variant of cosine similarity as the scoring function.

A language model encodes information about the statistics of a language, including knowledge such as the fact that the phrase "search engine optimization" is much more common than "search engine walking." Language models are used heavily in machine translation and speech recognition, among other applications. They are also extremely useful in information retrieval.

Yet another class of models uses the probability ranking principle, which directly models the probability of relevance given the query and document. Of these, Okapi BM25 has been shown to be particularly effective.
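To make two of these concrete, here is a small sketch of TF-IDF with cosine similarity and of the BM25 scoring function. The three toy documents, the smoothed IDF variant, and the parameter defaults (k1=1.2, b=0.75) are assumptions chosen for illustration; they are not the settings used in the study described in this post.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, doc_freq, num_docs):
    """Map each term to tf * idf, using a smoothed idf that is never zero."""
    vector = {}
    for term, tf in Counter(tokens).items():
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        vector[term] = tf * idf
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bm25_score(query, doc, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 with the non-negative idf variant."""
    counts = Counter(doc)
    score = 0.0
    for term in query:
        tf = counts.get(term, 0)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        length_norm = 1.0 - b + b * len(doc) / avg_len
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score


# Toy corpus: already tokenized, lowercase documents.
docs = [
    "the philadelphia phillies are a baseball team".split(),
    "search engine optimization tips".split(),
    "philadelphia cheese steak recipe".split(),
]
doc_freq = Counter(term for doc in docs for term in set(doc))
avg_len = sum(len(doc) for doc in docs) / len(docs)
query = "philadelphia phillies".split()

query_vector = tf_idf_vector(query, doc_freq, len(docs))
tfidf_scores = [
    cosine_similarity(query_vector, tf_idf_vector(doc, doc_freq, len(docs)))
    for doc in docs
]
bm25_scores = [
    bm25_score(query, doc, doc_freq, len(docs), avg_len) for doc in docs
]
```

Both scoring functions put the Phillies document first, the cheese steak document second (it shares only "philadelphia" with the query), and give the unrelated SEO document a score of zero.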

Correlation study

By now, you are probably wondering if search engines actually use any of these things, and if so, which ones are the most important. To explore this, we designed a correlation study similar to ones we have run in the past (see this for some background on the general approach). In this case, we collected the top 50 results from Google-US for about 14,000 keywords. This resulted in about 600,000 pages that we then crawled and used to compute a number of different similarity scores.
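For readers who want to see what a mean Spearman correlation involves, here is a self-contained sketch. It assumes one Spearman correlation is computed per keyword between search position and similarity score, and that positions are negated so that a positive value means higher scores go with better positions. That sign convention, and the function names, are my assumptions rather than a description of the exact pipeline used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(per_keyword):
    """per_keyword: list of (positions, scores) pairs, one per keyword."""
    rhos = [
        spearman([-position for position in positions], scores)
        for positions, scores in per_keyword
    ]
    return sum(rhos) / len(rhos)
```

Averaging per-keyword correlations, rather than pooling all pages together, keeps keywords with very different score ranges from distorting the result.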

As you can see, the language model approach performed the best with a mean Spearman correlation of 0.10, consistent with results published in the research literature.
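This post does not go into which language model variant was scored. As one textbook possibility, here is a sketch of query likelihood with Jelinek-Mercer smoothing, where each query term's probability is a mix of its frequency in the document and its frequency in the whole collection. The toy documents and the `doc_weight` value are assumptions for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len,
                         doc_weight=0.5):
    """Log probability of the query under a smoothed document model."""
    doc_counts = Counter(doc)
    log_likelihood = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_collection = collection_counts.get(term, 0) / collection_len
        p = doc_weight * p_doc + (1.0 - doc_weight) * p_collection
        if p == 0.0:
            # The term appears nowhere in the collection.
            return float("-inf")
        log_likelihood += math.log(p)
    return log_likelihood


# Toy collection: already tokenized, lowercase documents.
docs = [
    "philadelphia phillies baseball team".split(),
    "philadelphia cheese steak recipe".split(),
]
collection_counts = Counter(term for doc in docs for term in doc)
collection_len = sum(len(doc) for doc in docs)
query = "philadelphia phillies".split()

lm_scores = [
    query_log_likelihood(query, doc, collection_counts, collection_len)
    for doc in docs
]
```

The smoothing matters: without the collection term, the cheese steak page would get a probability of zero for "phillies" and could not be compared with other partial matches at all.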

If we do some stemming of both the query and document first and recompute, the correlations increase slightly across the board:

This suggests that Google is indeed doing some type of word normalization or stemming in their relevance calculation.

Relevance vs Ranking revisited

Comparing these correlations vs Page Authority (an aggregate in-link metric in our Mozscape index) on the same data set, we see a substantial difference:

This raises the question: if these sophisticated similarity scores are so useful, why aren't the correlations higher? The answer lies in the conceptual relevance vs ranking split I discussed earlier.

To convince myself, I constructed an experiment as illustrated below:

To run the experiment, I first took 450 random pages from our dataset stratified across the top 50 results (so that they include nine #1 ranked pages, nine #2 ranked pages, etc.). Then I added the 450 random pages to the top 50 pages in each search result to make one group of 500 pages for each keyword. Since 50 of these pages are in the search result, and 450 are not, 10% of them are relevant to the keyword and 90% are not (the assumption here is that if the page appears in a Google search then it is relevant). Then for each keyword, I collected the Page Authority and Language Model similarity score and sorted by each (the tables in the middle).

Finally, I computed the Precision at 50, which is the percentage of the top 50 results sorted by PA/Language Model score that are actually in the search result. This directly measures the extent to which PA or the Language Model can separate relevant from irrelevant pages. Since 10% of the 500 documents are in the search result, we can achieve a 10% precision by randomly sorting them. This 10% precision is our baseline (bottom gray bars in the image).
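The Precision at 50 calculation can be sketched in a few lines. The page identifiers and the "perfect" score below are invented to mirror the setup of 500 pages with 50 in the search result; they are not data from the experiment.

```python
def precision_at_k(scored_pages, relevant_pages, k=50):
    """Fraction of the k highest-scoring pages that are relevant.

    scored_pages is a list of (page, score) pairs.
    """
    ranked = sorted(scored_pages, key=lambda item: item[1], reverse=True)
    top_k = [page for page, _score in ranked[:k]]
    if not top_k:
        return 0.0
    hits = sum(1 for page in top_k if page in relevant_pages)
    return hits / len(top_k)


# Toy setup mirroring the experiment: 500 pages, 50 in the search result.
pages = [f"page-{i}" for i in range(500)]
in_search_result = set(pages[:50])

# A hypothetical score that separates the two groups perfectly.
perfect = [(page, 1.0 if page in in_search_result else 0.0) for page in pages]
perfect_precision = precision_at_k(perfect, in_search_result)

# Expected precision of a random ordering: the share of relevant pages.
baseline = len(in_search_result) / len(pages)
```

A score that perfectly separates the groups reaches a precision of 1.0, while a score carrying no relevance information is expected to land near the 0.1 baseline.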

The results are striking. The PA precision is very close to the baseline, which says that it does no better than a random number at determining relevance, even though it does a good job of ranking the top 50 once they are known to be relevant. On the other hand, the Language Model precision is close to 100%. Put another way, the Language Model is nearly perfect at determining which of the 500 pages are in the search result, but does a poor job of actually ranking those relevant documents.

Takeaways

This type of query-document similarity scoring is well established in the research literature and underlies every modern information retrieval system. As such, it is fundamental to search and is immune to algorithm change.

Since search engines use sophisticated query and document models, there is no need to optimize separately for similar keywords. For example, any page targeting "movie reviews" will also target "movie review."

Finally, you can use the conceptual split between relevance and ranking in your workflow. When creating or modifying existing content, first concentrate on making the page relevant to a broad set of related keywords. Then concentrate on increasing the search position.

More Ranking Factors results coming soon

These are the first results we've released from the 2013 Ranking Factors project. As in years past, the project includes both an industry survey and large correlation study. I'll be presenting the results at MozCon this year (so get your tickets if you haven't already!), and we'll be following it up with a full report sometime later this summer.

To dig deeper

Here are all the slides from my SMX Advanced talk:

I highly recommend the book Introduction to Information Retrieval by Manning et al. It is available for free online reading from their site and provides a comprehensive description of everything discussed in this post (and much, much more). In particular, see Chapters 2, 6, 11 and 12.

Thanks for reading. I look forward to continuing the discussion in the comments below!



The Positive ROI of Conferences: A Deep Look at #MozCon

Posted: 11 Jun 2013 03:03 AM PDT

Posted by Erica McGillivray

It's conference season! Our inbound marketing conference, MozCon, July 8th-10th in Seattle, is just around the corner, and we often get asked by our community how to approach your boss, CMO, CEO, etc., about coming to MozCon. You want to know more about the value for you and your company or clients, about how we spend those MozCon dollars, and what you can expect once you're here. And furthermore, some of you might be considering coming on your own dime, especially if you're a freelancer, student, or owner of a small business.

Conferences can be spendy when you add up ticket costs, travel, hotel, meals, and more. It's important that you can justify a positive ROI when it comes to your budget. At Moz, we're big believers in what you can learn at conferences, whether in sessions or through networking (clear ROI), and in the power of serendipity (which can have a less concrete ROI).

Aleyda on stage!

Let's take a deep-dive into what MozCon looks like both from a value and a cost standpoint. MozCon's truly an amazing three-day conference where you'll take away a ton of actionable tips to implement on your site(s) and make new friends, whether the fellow community member sitting next to you, a Mozzer, or one of our industry leaders who are speaking.

And for those of you ready to take the MozCon plunge:

Buy Your Ticket Today!

What's the ROI of My Ticket

MozCon ROI

Actionable Tactics

This year, MozCon has an astounding 35 speakers! They'll be talking about everything from link building and international SEO to analytics, conversion rate optimization, and email marketing. We have an incredibly strong mix of topics with something for everyone. Our goal is really for you to bring something back with you from every session, which is why every single speaker has a keynote-style session to deliver this information. It's a bit like the best of 35 college courses distilled down to the heart of the subject.

With the exception of our community speakers, who are selected from your pitches, all our speakers are curated by our MozCon selection committee. After speakers have accepted for MozCon, we work with them to ensure that they're going to bring their very best, unique content to MozCon. Topics are chosen based both on what the speaker's an expert on and on what they're currently excited about.

This year, every speaker had a kick-off call to establish their topic and set up expectations. Even many seasoned speakers can be intimidated by the MozCon stage, and one of my jobs is to make sure that they are ready and confident about their talk. Speakers are also required to send in a draft or outline of their presentation so we can make sure they're on track. Every year, our post-MozCon survey shows that MozCon goers have extremely high expectations. By seeing a draft, we can offer advice, a lot of which is based on what you, the audience, expect from speakers. We make a lot of suggestions about actionable tactics, setting up the audience with what Nancy Duarte calls "the new bliss" to conclude their talks, and pushing content to the next level.

Speakers can send in as many drafts as they'd like for us to review, and final drafts are due about a week before MozCon, which means I hope speakers are relaxing and practicing their talks instead of hustling to put last-minute slides together. For Mozzers, we've put together several internal practice sessions (the first one was Friday!) to run through MozCon presentations.

Every single speaker is incredibly excited to be up on that stage and giving you their best. In fact, last year, Paddy Moogan really showed this spirit when he offered, for anyone who didn't learn something from his talk, to buy them a beer and talk specifics with them about their website. Talk about TAGFEE! I don't doubt there will be some similar offers this year.

Inspiration

After actionable tactics, you're sure to come back inspired by MozCon. I know the best conferences I've attended were the ones where I couldn't wait to get back to work or dive deeper into learning. Not to mention, the videos are included in the ticket costs, which means you can share the MozCon love with your coworkers and rewatch them yourself when you need a recharge in-between MozCons.

While we certainly stress actionable tactics with our speakers, inspiration comes through with every talk. The tactics may help you win, but the inspiration will fuel the fire. And who doesn't benefit from your productivity being up? You may find yourself excited about a topic you've never delved into or seen yourself doing. You may understand what a coworker does a little better. You may have a deeper understanding of something you're already very much an expert in. It says a lot that even MozCon speakers hang out for the other talks to learn too!

A lot of us work around people who don't quite "understand" what it is we do. Being in a room full of other marketers will keep you on your toes and get you excited. Who doesn't want to nerd out about OG tags and that link you got in Forbes?

Making Friends

Other people might call this "networking," but at Moz, we're a little more about making friends, who happen to be professional contacts. The MozCon audience is an incredible community. I've never met a group of people who were sharper, more giving of their knowledge and time, and, of course, TAGFEE. 

Whether you're adding industry folks on Twitter or finding a local group to hang out with post-MozCon, you'll probably find that connection at MozCon. I know some employers worry about "networking" at conferences and that their employees might come home with connections for new jobs. But what I see more often is excited people who've found connections that end up solving those "omg, I'm trying to do this and it is not working" problems, when a community member steps in to share knowledge. This sharing of knowledge doesn't stop when attendees have returned to their respective homes.

Make new friends

1:1 with Mozzers and Speakers

We highly encourage all speakers and all Mozzers to mix and mingle with attendees. This year, we'll all be eating in the same room. (Yay for the new venue!) And not to mention, we'll all be in the same big room as speakers are on stage. In the past, we've always had an overflow room for people interested in getting some work done or stepping aside to chat. But this year, there's going to be a larger space with comfortable furniture -- and don't worry, a screen to watch the presentations -- so you can chat and meet-and-greet between sessions or take a brain breather from all the fun.

Most of our speakers are highly approachable, whether you want to ask follow-up questions after their talks or just get to meet them. I mean, who doesn't want to get their photo taken with Rand? ;)

This year, all Mozzers will be wearing blue t-shirts labeled with "staff" so you won't miss us. (Don't worry, we have three identical ones, so we'll be fresh smelling during MozCon.) We're here not only to point out where the coat rack is, but also just hang out and give you insights into what it's like to work at Moz. Everyone from our engineers and finance team to marketing and help will be attending MozCon for our own learning experience and to meet each and every one of you. We seriously love to talk all things Moz. And who knows, you might get some extra insights into the future of what we're cooking.

Tuesday Night Party

No one throws a party like that robot Roger. Okay, we can't always bring Roger with us -- those robot repair bills are astronomical! -- but we do know how to throw a great party. Okay, this might not be something to write home to the boss about, unless you do solve that work problem that night, but it is a place to make more friends and also relax after all that learning. We provide noms and drinks, not to mention plenty of karaoke. 

This year's party takes place at the EMP Museum, where you'll not only be able to sing your heart out on stage, but also find a quiet place to chat with someone or tour the museum. You know, they have Daleks in the basement, David Bowie's infamous Labyrinth gear, and a whole amazing tribute to Seattle's favorite hometown band, Nirvana. Seriously, for those of you just flying in and out for MozCon, you'll have a chance to take a tour of one of Seattle's most unique and fun museums. I think it's pretty rad.

Roger Hugs

Every year that loveable robot of ours, Roger Mozbot, makes his way out from crunching your data to the breaks during MozCon. He gets his own photo booth, and you can get all the hugs from him. Plan on bringing some props and lots of love. Because this fellow can't get enough hugs from you.

Roger and Phil are BFF

Fun

See EVERYTHING. If you don't find some fun at MozCon, I will personally buy you a cupcake. (Cupcakes are the international sign of fun, right?)

Yummy Food

For those of you following us on social media, you may have noticed a theme: we love good food. I can't think of a Mozzer who doesn't fancy themselves something of a foodie. We can seriously give Anthony Bourdain and Guy Fieri a run for their money as our staff includes a former chef, a former bartender, and someone we're sure has sampled every dessert from Seattle to South Africa. Whether you're looking for a great steak, an amazing mixed drink, or some blasted broccoli, some Mozzer will be able to point the way. (Seriously, stay tuned because my fellow Mozzers are crowdsourcing a list of the most delicious places in Seattle to eat at and more.) We bring the same enthusiasm to our menus at MozCon. But more on that soon.

Okay, that's the incredible value you can get from coming to MozCon. But what about the actual price? Why does a PRO member ticket cost $999? What do we actually do with that money?

What's the Breakdown of the Cost of My Ticket?

Every bit of money made from MozCon goes directly back into MozCon. Moz has actually never turned a profit on MozCon (or even covered its costs) from ticket sales. And that's okay, because we don't have to. Other conferences have to get sponsors and have exhibitor halls to make extra cash because they need it to cover conference costs. We're pretty privileged that we don't have to. Don't get me wrong, it's our goal every year to cover costs; but we'd rather you have a world-class experience you won't ever forget than, say, not pay for international travel for some speakers or skimp on a/v.

Let's get into specific costs. Transparency, ftw. I've broken down the costs of a $999 ticket and how much goes to what. (Now, I realize that not everyone bought a $999 ticket; some people aren't PRO members, some people got early bird deals, etc. But $999 is our standard ticket price, and the varying ticket prices balance each other out.)

Food and Beverage - $365

The Cost of MozCon

Yep, food and beverage make up our biggest cost. Your ticket includes breakfast and lunch each day (six meals!), two snacks (mid-morning and mid-afternoon), and one Tuesday evening party. As I mentioned above, Mozzers are foodies, and we don't cut corners when it comes to your meals during MozCon. We do this for a few reasons: it makes your experience more awesome, and you're more likely to stick around during mealtimes, which means hanging out with Mozzers and Speakers.

Let's face it, no one likes it when you're handed a cardboard box with a turkey sandwich and a smashed cookie. Or in this vegetarian's case, some wilted lettuce and a soggy apple. (If I've learned one thing from conference and airline catering, it's that no one thinks vegetarians like cookies!) Not to mention, usually you see the Speakers and others sneaking out when they look at those cardboard boxes.

If we didn't have meals, it's true, you might be able to save your employer some monies by eating at Subway every day. (Subway aficionado and Mozzer Andrew Dumont probably has coupons he'd let you have.) But you're going to have to find where you want to eat, maybe take some friends, leave the conference, find the place, order, put the receipt in that very special place where you won't forget it, eat, and then find your way back. Sure, Seattle has tons of delicious options, but I recommend coming in the weekend before or heading out Monday and Wednesday nights for that sort of exploration.

This cost also covers the catering staff, who besides cooking the food, will be making sure everything goes smoothly with serving and stays neat and tidy. They also assist with special meals for those of you who are vegan, gluten-free, dairy-free, kosher, halal, or have food allergies. (Don't worry, fellow vegetarians, there's plenty of great noms for us in the main buffet.) Remember, these catering folks are the ones refilling the coffee, so we love them.

Speakers - $158

MozCon truly brings in top-notch industry speakers who are experts in their fields and great presenters. We cover these speakers' travel costs and hotels, and we believe that it's worth every penny. MozCon speakers are the heart-and-soul of MozCon, along with Roger hugs, so we want all our Speakers to be wrapped in that great Seattle hug.

A/V and Video - $157 

Okay, this is probably another bucket where you're like, "What, Moz, A/V is how much of my ticket cost? Almost as much as Speakers?" Last year, the MozCon crew decided that we really needed to take the next step in making MozCon truly world-class. Many Speakers from 2012 said that they felt like rock stars on our stage. A/V sends all the signals, from when to clap for the next speaker to when to quiet down after a break. Not to mention, by popular demand, we've baked the price of MozCon Videos into the ticket costs.

Our 13-person a/v crew ensures our speakers' presentations look sharp and do all the exciting things they're supposed to. Whether they're playing video or rapping Mad Men-style like Mike King did last year, we want to be able to support it. Plus, an impeccable stage means all eyes are always where they're supposed to be. Our a/v crew does more than just the stage. They also do the lighting (just say no to fluorescents you can't dim or control), play any music, make sure we have video in the lounge area, and generally make MozCon feel like one heck of an amazing show.

A/V also assists with getting the MozCon Videos all pretty and ready for you. We truly couldn't put on such an amazing show and deliver such awesome videos post-show without them. How else are you going to catch all those tips that you missed writing down because they were flying off the stage so quickly? Or share what happened with your coworker who's planning on going next year?

Interior Design and Signage - $75

The Washington State Convention Center is basically a big room with four walls, concrete floors, and fluorescent bank lights. The good news is, unlike a hotel, we can really make it ours. The bad news is that it isn't cheap. Just covering that cement floor with carpet is $30,000. But we wouldn't want to hear people's shoes on the floor over analytics tactics from Avinash Kaushik. We also need to make sure we have tables, chairs, registration booths, and all those other conference basics. At MozCon, we don't make you balance your laptop on your lap with your drink, your phone, and your snack. Instead, we have tables where everyone can put down their laptops, drinks, etc., which leads to far more productivity and less spillage. :) Not to mention my Clif Bars never fly over seats and hit people in the backs of their heads as I struggle to open the package while holding onto all my stuff. (Sorry, friends at SES NYC!)

Happy MozCon goers

Networking Party - $70 

I've already talked a lot about the Tuesday night party at the EMP Museum. It's going to be pretty awesome. Not only are you getting to see the Museum exhibits (normally $20 per adult), but you're getting food and drink and some amazing extras. Wine, beer, and well drinks are all on us. Anyone who's ever thrown a wedding, anniversary, office, or birthday party with the cost of alcoholic beverages factored in knows that it starts to add up quickly.

Electrical - $40 

The first time I helped run a large event -- GeekGirlCon 2012, approximately 2,000 people over two days -- I was shocked to receive a post-event bill in the thousand-plus-dollar range for electrical overages, even though I'd put down a deposit for overages, and not even counting what was already included in my contract. Electricity runs everything. We not only have our big stage at MozCon, but we also just have to keep the lights on, keep the room temperature optimal, and make sure that you can charge your laptop, tablet, and phone so nothing goes dead during MozCon. MozCon's a little unique in that each table is equipped with electrical plugs so no one ever cries over a dead battery. Or worse, has to switch to live tweeting on a smart phone! ;)

Swag - $35

This year, each MozCon attendee will get a Roger figurine. Yep, I think that's all you need to know. :)

Roger for everyone!

We also will give out some other pretty nifty swag items, including limited edition MozCon t-shirts and a host of other Moz-branded items. Yep, be the first one to get some Moz swag at MozCon.

Credit Card Processing Fees - $33

Pretty boring. But have you ever been annoyed when purchasing tickets, say on Ticketmaster, at the additional "processing fees"? Unlike other events, which make the price go up in your shopping cart, we adjust for them and pay Eventbrite monthly.

WiFi - $28

Yes, yes, we know. WiFi hasn't been one of our shining moments at past MozCons. However, with our move to a new venue, we are much more confident in the wifi situation for MozCon. Ideally, each and every one of you will be able to log into the MozCon wifi and tweet (#MozCon), email with coworkers (only pictures of you hugging Roger), and Facebook (with grandma, of course) whenever you need to.

Venue - $23

Besides this being a space cost, the venue costs also include convention center staff, aka the green coats, who assist in all things badge-checking, directional, and more. They work just about every event at the convention center and know the place inside and out. Just don't forget your badge in your hotel room!

Misc Labor - $15

While most of our labor costs are tied up either in a/v, catering, or venue costs and Mozzers' salaries, we do have to bring in a few outside this sphere to help out. You'll see our photographer, Rudy Lopez, taking all the photos. And there will be some behind-the-scenes magic that happens before and after MozCon like riggers putting up and taking down signs. 

Erica and the MozCon speakers

I hope this transparency about the value and hard costs of MozCon gives you better insight both into how MozCon operates and into what to consider when talking to the person who's signing off on your MozCon ticket and travel. Or heck, maybe it helps you make that decision as a freelancer, student, or otherwise self-employed person to send yourself or, as a boss, to send your employees. I also hope this might inspire other conference runners to share a little bit about the value and costs of their conferences.

MozCon is truly a celebration of the inbound marketing community. Around the MozPlex, we like to refer to it as a hug from us to our community. My dream is that each and every one of you has the opportunity to join us for MozCon. I can't wait to meet you and to see you inspired and ready for the next step in your career and your journey as a marketer. Conferences can really be a great stepping stone and have a huge positive ROI for you and your company.

Still in the undecided camp? In the words of LeVar Burton, "but you don't have to take my word for it":

"MozCon is like Disneyland for SEO’s, jampacked with super-geeky SEO Magic Tricks and great chances to meet and say hello to others in the search industry." - Pete Campbell

Why MozCon was the Best Investment I Made in 2011 by Mike King

Plus, if you're interested in that $999 PRO price, sign up for your 30 day free trial and get that MozCon discount. :)

See you there!

Buy Your Ticket Today!


