Spam Score: Moz's New Metric to Measure Penalization Risk

Posted on: Monday 30 March 2015, 13:00
Posted by randfish

Today, I'm very excited to announce that Moz's Spam Score, an R&D project we've worked on for nearly a year, is finally going live. In this post, you can learn more about how we're calculating Spam Score, what it means, and how you can potentially use it in your SEO work.
How does Spam Score work?

Over the last year, our data science team, led by Dr. Matt Peters, examined a great number of potential factors that predicted that a site might be penalized or banned by Google. We found strong correlations with 17 unique factors we call "spam flags," and turned them into a score. Almost every subdomain in Mozscape (our web index) now has a Spam Score attached to it, and this score is viewable inside Open Site Explorer (and soon, the MozBar and other tools). The score is simple; it just records the quantity of spam flags the subdomain triggers. Our correlations showed that no particular flag was more likely than others to mean a domain was penalized/banned in Google, but firing many flags had a very strong correlation (you can see the math below).

Spam Score currently operates only on the subdomain level; we don't have it for pages or root domains. It's been my experience and the experience of many other SEOs in the field that a great deal of link spam is tied to the subdomain level. There are plenty of exceptions (manipulative links can and do live on plenty of high-quality sites), but as we've tested, we found that subdomain-level Spam Score was the best solution we could create at web scale. It does a solid job with the most obvious, nastiest spam, and a decent job highlighting risk in other areas, too.

How to access Spam Score

Right now, you can find Spam Score inside Open Site Explorer, both in the top metrics (just below domain/page authority) and in its own tab labeled "Spam Analysis." Spam Score is only available for Pro subscribers right now, though in the future, we may make the score in the metrics section available to everyone (if you're not a subscriber, you can check it out with a free trial).
The current Spam Analysis page includes a list of subdomains or pages linking to your site. You can toggle the target to look at all links to a given subdomain on your site, given pages, or the entire root domain. You can further toggle source tier to look at the Spam Score for incoming linking pages or subdomains (but in the case of pages, we're still showing the Spam Score for the subdomain on which that page is hosted). You can click on any Spam Score row and see the details about which flags were triggered. We'll bring you to a page like this:
Back on the original Spam Analysis page, at the very bottom of the rows, you'll find an option to export a disavow file, which is compatible with Google Webmaster Tools. You can choose to filter the file to contain only those sites with a given spam flag count or higher:
Disavow exports usually take less than 3 hours to finish. We can send you an email when it's ready, too.

WARNING: Please do not export this file and simply upload it to Google! You can really, really hurt your site's ranking and there may be no way to recover. Instead, carefully sort through the links therein and make sure you really do want to disavow what's in there. You can easily edit the file to take out links you feel are not spam. When Moz's Cyrus Shepard disavowed every link to his own site, it took more than a year for his rankings to return!

We've actually made the file not-wholly-ready for upload to Google in order to be sure folks aren't too cavalier with this particular step. You'll need to open it up and make some edits (specifically to lines at the top of the file) in order to ready it for Webmaster Tools.

In the near future, we hope to have Spam Score in the MozBar as well, which might look like this:
Sweet, right? :-)

Potential use cases for Spam Analysis

This list probably isn't exhaustive, but these are a few of the ways we've been playing around with the data:
Over time, we're also excited about using Spam Score to help improve the PA and DA calculations (it's not currently in there), as well as adding it to other tools and data sources. We'd love your feedback and insight about where you'd most want to see Spam Score get involved.

Details about Spam Score's calculation

This section comes courtesy of Moz's head of data science, Dr. Matt Peters, who created the metric and deserves (at least in my humble opinion) a big round of applause. - Rand

Definition of "spam"

Before diving into the details of the individual spam flags and their calculation, it's important to first describe our data gathering process and "spam" definition. For our purposes, we followed Google's definition of spam and gathered labels for a large number of sites as follows.
We performed the most recent data collection in November 2014 (after the Penguin 3.0 update) for about 500,000 subdomains.

Relationship between number of flags and spam

The overall Spam Score is currently an aggregate of 17 different "flags." You can think of each flag as a potential "warning sign" that signals that a site may be spammy. The overall likelihood of spam increases as a site accumulates more and more flags, so the total number of flags is a strong predictor of spam. Accordingly, the flags are designed to be used together: no single flag, or even a few flags, is cause for concern (and indeed most sites will trigger at least a few flags). The following table shows the relationship between the number of flags and the percent of sites with that many flags that we found Google had penalized or banned:
ABOVE: The overall probability of spam vs. the number of spam flags. Data collected in Nov. 2014 for approximately 500K subdomains. The table also highlights the three overall danger levels: low/green (<10%), moderate/yellow (10-50%), and high/red (>50%).

The overall spam percent averaged across a large number of sites increases in lockstep with the number of flags; however, there are outliers in every category. For example, there are a small number of sites with very few flags that are tagged as spam by Google and, conversely, a small number of sites with many flags that are not spam.

Spam flag details

The individual spam flags capture a wide range of spam signals: link profiles, anchor text, on-page signals, and properties of the domain name. At a high level, the process to determine the spam flags for each subdomain is:
Since the spam flags are incorporated into the Mozscape index, fresh data is released with each new index. Right now, we crawl and process the spam flags for each subdomain every two to three months, although this may change in the future.

Link flags

The following table lists the link and anchor text related flags with the odds ratio for each flag. For each flag, we can compute two percents: the percent of sites with that flag that were penalized by Google and the percent of sites with that flag that were not penalized. The odds ratio is the ratio of these percents and gives the increase in likelihood that a site is spam if it has the flag. For example, the first row says that a site with this flag is 12.4 times more likely to be spam than one without the flag.
ABOVE: Description and odds ratio of link and anchor text related spam flags. In addition to a description, the table lists the odds ratio for each flag, which gives the overall increase in spam likelihood if the flag is present. Working down the table, the flags are:
On-page flags

Similar to the link flags, the following table lists the on-page and domain name related flags:
ABOVE: Description and odds ratio of on-page and domain name related spam flags. In addition to a description, the table lists the odds ratio for each flag, which gives the overall increase in spam likelihood if the flag is present.
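To make the odds ratio in the two flag tables a little more concrete, here is a minimal Python sketch that follows the wording in the "Link flags" section literally: take the sites that trigger a flag, compute the percent that were penalized and the percent that were not, and divide the first by the second. The `Site` class, the `flag_odds_ratio` function, the flag name, and the sample data are all hypothetical and made up for illustration. This is not Moz's actual code or data, and the exact normalization Moz used may well differ from this literal reading.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A labeled subdomain (hypothetical structure, not Moz's data model)."""
    flags: frozenset   # names of the spam flags this subdomain triggers
    penalized: bool    # True if the site was found penalized or banned


def flag_odds_ratio(sites, flag):
    """Ratio of the penalized percent to the non-penalized percent,
    computed only over the sites that trigger `flag`."""
    flagged = [s for s in sites if flag in s.flags]
    if not flagged:
        raise ValueError(f"no site triggers flag {flag!r}")
    pct_penalized = 100 * sum(s.penalized for s in flagged) / len(flagged)
    pct_not_penalized = 100 - pct_penalized
    if pct_not_penalized == 0:
        # Every flagged site was penalized, so the ratio is unbounded.
        return float("inf")
    return pct_penalized / pct_not_penalized


# Made-up sample: four sites trigger the flag and three of them are penalized.
sample_sites = [
    Site(frozenset({"hypothetical_flag"}), True),
    Site(frozenset({"hypothetical_flag"}), True),
    Site(frozenset({"hypothetical_flag", "another_flag"}), True),
    Site(frozenset({"hypothetical_flag"}), False),
    Site(frozenset({"another_flag"}), False),
]

ratio = flag_odds_ratio(sample_sites, "hypothetical_flag")
print(ratio)  # 75% penalized vs. 25% not penalized -> 3.0
```

As with the real flags, a single ratio like this says little on its own; the post's point is that the count of flags a subdomain triggers is what predicts spam.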
If you'd like some more details on the technical aspects of the Spam Score, check out the video of Matt's 2012 MozCon talk about Algorithmic Spam Detection or the slides (many of the details have evolved, but the overall ideas are the same).

We'd love your feedback

As with all metrics, Spam Score won't be perfect. We'd love to hear your feedback and ideas for improving the score, as well as what you'd like to see from its in-product application in the future. Feel free to leave comments on this post, or to email Matt (matt at moz dot com) and me (rand at moz dot com) privately with any suggestions.

Good luck cleaning up and preventing link spam!