Posted by Tom Anthony
Hi folks. A quick post today to introduce a new tool which may be of interest to those of you working on any international sites or sites that cater to more than one country/demographic. It's a tool that allows you to automatically download a list of backlinks to any URL, broken down not only by TLD/subdomain but also by the language the page is written in.
Earlier this year many of you went to SEOmoz's MozCon conference in Seattle. For those who missed it, one of the presentations there was given by Hannah Smith, one of my Distilled buddies, who spoke about International SEO and how not to suck at it! If you want to read more you can check out her slide deck, or read the accompanying blog post she wrote (or even buy the MozCon videos and watch it!).
As part of her research Hannah and I collaborated on a tool to look at the languages of domains that linked to a given page. What I am presenting today is a spin-off of that, which you can use yourselves to investigate which languages are used in pages that link to a given URL. More on the tool below, but first...
Why do I care? Links are yummy in any language!
You are right! Links are yummy in any language! However, it has long been held that, especially for non-English sites, links in the same language are an important relevancy indicator (something Hannah dug into in her presentation). Whilst Hannah's research didn't find a strong correlation to corroborate this, our instincts do tell us it is important, and beyond that it just makes good sense for brand awareness, referred traffic and future-proofing your SEO.
However, there is another use for tools like this one: competitor research. The tool allows you to download and categorise links for not only your own site, but any page you fancy. If you are struggling to get links from a particular language, why not check out your competitor's links and filter for those written in Spanish, French, German or any of a variety of other languages? Now you can target those sources of international links more easily.
How the tool works.
The tool pulls in the top 1000 pages (by PA) linking to the specified URL and allows you to download them. It analyses each domain and identifies the TLD, including most country-specific domains, as well as the subdomain (so you can filter for links from URLs like de.example.com or fr.example.com). Perhaps most usefully, though, the tool attempts to identify the language of the page.
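For those who like to see this sort of thing in code, here is a rough Python sketch of the TLD/subdomain split described above. It is not the tool's actual code: the function name `split_host` and the tiny `TWO_PART_SUFFIXES` set are made up for illustration, and a real implementation would need a far fuller list of country-specific suffixes.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny list of two-part country suffixes.
# Assumption for illustration only; a real tool needs a much fuller list.
TWO_PART_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp", "com.br"}


def split_host(url):
    """Return (subdomain, domain, tld) for the hostname of a URL."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        # Not a dotted hostname (e.g. "localhost"), nothing to split.
        return "", host, ""

    # Treat the last two labels as the TLD when they match a known suffix.
    last_two = ".".join(labels[-2:])
    suffix_len = 2 if last_two in TWO_PART_SUFFIXES and len(labels) > 2 else 1

    tld = ".".join(labels[-suffix_len:])
    domain = ".".join(labels[-(suffix_len + 1):])
    subdomain = ".".join(labels[:-(suffix_len + 1)])
    return subdomain, domain, tld
```

With the subdomain in its own field, picking out links from hosts like de.example.com becomes a simple equality check on the first element of the result.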
The language identification works on code taken from Google's Chrome browser (used to drive the "Would you like to translate this page?" feature) and recognises several dozen languages quite reliably. Thanks to Mike McCandless for pulling the code out into a separate project, and to Lars Strojny for the PHP bindings. In this case it runs on the title element of a page and so isn't perfect, because titles are often short and/or include branding, but I found it to be pretty reliable, and scraping the full page certainly isn't worth the effort.
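To make the "title only" idea concrete, here is a minimal Python sketch, assuming you already have the page HTML to hand. It is not the tool's code (which uses the PHP bindings mentioned above). The `detect` argument is a stand-in for whichever language detector you plug in, and the `min_chars` cut-off is my own assumption to illustrate the short-title problem, not something the tool is known to do.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())


def title_language(html, detect, min_chars=10):
    """Return (title, language), where `detect` is a caller-supplied
    function mapping text to a language label (hypothetical hook).
    Titles shorter than `min_chars` are left unclassified."""
    title = extract_title(html)
    if len(title) < min_chars:
        return title, None
    return title, detect(title)
```

Returning `None` for very short titles keeps a bare "Home" from being filed under a language it may not belong to.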
How to use the tool.
All you need are your SEOmoz API details (these are totally free, you just need a free Moz account), and then you can go to the tool:
Once you open the tool you come to a pretty self-explanatory page:
There are only 3 fields you need to fill in: your SEOmoz Access ID, your Secret Key and the URL of the page you wish to analyse. (Sidenote for those who care: do not be concerned about entering your SEOmoz Secret Key; it isn't transmitted over the network but is used to generate an authentication hash for the API. If you want the code I made for this, see here.)
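If you are curious what generating an authentication hash from a secret key can look like, here is a hedged Python sketch. It is not the code linked above. It assumes an HMAC-SHA1 signature over the Access ID and an expiry timestamp, sent as `AccessID`, `Expires` and `Signature` query parameters; treat the message layout and parameter names as assumptions and check the API documentation before relying on them. The function name `signed_params` is hypothetical.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import urlencode


def signed_params(access_id, secret_key, ttl_seconds=300, now=None):
    """Build a signed query string without ever including the secret key.

    Assumed scheme (not confirmed by the post): sign
    "<access_id>\\n<expires>" with HMAC-SHA1 keyed by the secret.
    """
    # `now` can be passed in to make the output reproducible.
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)

    message = f"{access_id}\n{expires}".encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")

    # Only the ID, the expiry and the signature go over the network.
    return urlencode(
        {"AccessID": access_id, "Expires": expires, "Signature": signature}
    )
```

The point of the design is that the server, which also knows the secret, can recompute the same hash and compare, so the key itself never needs to travel with the request.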
Once you click the analyse button, if everything went according to plan then you'll receive a download of a CSV file with your results. Open it up in your preferred spreadsheet and take a look. Here is a snippet of the results for Spiegel.de, the German news site:
Here I sorted first by TLD (2nd column) and then by language (4th/5th columns); the 3rd column shows the subdomain. This example shows that even without an indicator from the TLD or subdomain, the language can often be identified from the title element.
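If you would rather do that sorting and filtering in a script than in a spreadsheet, something like the following Python sketch would work. The column names `tld` and `language` are assumptions for illustration (the real CSV may not have a header row or may label things differently), so adjust them to match your download.

```python
import csv
import io


def sort_links(csv_text, language=None):
    """Sort link rows by TLD, then language; optionally keep one language.

    Assumes a header row with hypothetical column names
    "tld" and "language".
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if language is not None:
        rows = [row for row in rows if row["language"] == language]
    # Tuple keys sort by TLD first and break ties on language.
    rows.sort(key=lambda row: (row["tld"], row["language"]))
    return rows
```

Calling `sort_links(text, language="German")` on the contents of the downloaded file would give you just the German-language links, grouped by TLD.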
Wrap Up
This tool is quite specialised, and isn't something you are going to need often. However, I do think it is one of those things that is handy to have around for the moments you do want it, be it for a quick check on new clients, for competitor analysis, or just as part of your regular reporting. In the future I might extend it to do more, so if you have feedback/suggestions I'd love to hear them in the comments.