Plugging the link leaks, part 1 – reclaim links you are throwing away
Posted: 15 Mar 2013 09:51 AM PDT

In London hundreds of SEOs have gathered for LinkLove, and as it is a day of sharing tips on getting more links, we thought we would join in.

As the easy self-publishing or submission tactics fall by the wayside, link building has become a far more creative, and time-consuming, process. But at SEOptimise, as well as building links through content, we also regularly boost clients' link profiles without typing a word. There's no asking for links, nor risking the wrath of Google's anti-spam team. This is link reclamation: fixing existing links that point to broken or inefficiently redirected pages on your site.

As Ian Lurie pointed out in one of his excellent webinars last year, before worrying about various creative methods of generating links, "get the #@@!#@$ easy links" first. And link reclamation is just that: it might take a couple of hours to complete, but can be a boost for any campaign.

Tools

What you'll need for your link reclamation project:
Finding your broken links

Now, the quick-win version of this process is simply to put together a list of all the broken URLs on your site that have external links pointing to them, ready to put 301 redirects in place. There's nothing wrong with doing this, and it will certainly give you the boost of reclaiming your lost authority, but sometimes we need to know where all the broken links are. This is so we can see which broken links we should redirect, and which we want to attempt to have fixed on the source URL. Plus, as an agency, it can be advantageous to be able to report all the links we have reclaimed.

So, let's put together as comprehensive a list of broken links pointing to our target site as possible.

Backlink data

Our first port of call is backlink data. Go to the tool of your choice and look up your site. We're using Open Site Explorer (OSE) in the examples here, but Majestic SEO and Ahrefs both provide suitable data for this.

Within the inbound links tab select links from "only external" pages and either "pages on this sub-domain" or "pages on this root domain", depending on the scope of your project. Export the results and open them in Excel.

There's a whole range of metrics we could use to investigate, but to keep things moving, delete all the columns except for URL, anchor text, page authority (PA), domain authority (DA), followable and target URL. Doing this allows us to analyse our broken links by PA or DA, and see which are no-followed, helping us decide which links to 301, and which to reach out to have fixed to the correct URL. If you are not using OSE, then Majestic SEO and Ahrefs have their own importance metrics.

Now to find our broken links. Copy the entries in the target URL column, and paste them into a new spreadsheet. Use the Remove Duplicates feature within the Data tab, and save as a .csv or .txt file. Fire up Screaming Frog, and select 'List' from the Mode menu. Choose your file of URLs, and start crawling.
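For larger exports, the copy, de-duplicate and save steps above can also be scripted. What follows is a minimal Python sketch rather than part of the original workflow; the 'Target URL' column header and the file names in the comments are assumptions, so adjust them to match your own export.

```python
import csv


def unique_target_urls(rows, column="Target URL"):
    """Return the distinct, non-empty values of `column`, in first-seen order."""
    seen = set()
    urls = []
    for row in rows:
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_crawl_list(export_path, output_path, column="Target URL"):
    """Read a backlink export (CSV) and write one unique target URL per line."""
    with open(export_path, newline="", encoding="utf-8") as src:
        urls = unique_target_urls(csv.DictReader(src), column)
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
    return len(urls)


# Hypothetical usage; the file names are placeholders:
# write_crawl_list("backlink_export.csv", "target_urls.txt")
```

The resulting text file has one URL per line, which is the layout the list-mode upload described above expects.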
Once the crawl is complete, select the Response Codes tab and filter to 'Client Error (4xx)'. You now have a list of the URLs that external sites are linking to which don't exist on your server. No URLs on the list? Congratulations! You have no broken links to fix, and can crack on with working on ways to generate fresh links. If, like most sites we've worked with, you have URLs here, export the list.

Finding 302 redirects

Still in Screaming Frog, filter to 'Redirection (3xx)', and order the results by the 'Status Code' column. Are there any 302 redirects in there? If so, export this list, open it in Excel and make the data a table (Ctrl+T is the shortcut). Filter by Status Code to find the 302s, and copy the data. Open your exported list of URLs resulting in 404 errors, and paste your 302 data into the spreadsheet. You now have a complete list of linked-to pages we want to fix.

Getting clever

It's time to prune data again. Delete or hide every column until you are left with just the Address and Status Code columns. Once ready, select all the 404/302 data and copy. Go back to your spreadsheet with the OSE data. You need to paste in the two columns, either to the right of the OSE data, or in a new sheet (however you prefer to work).

Now for the (relatively) clever bit. Add a column to the right of your OSE data, and call it 'status code', then turn all the OSE data into a table. Now we are going to use a VLOOKUP function in the new 'status code' column to have Excel tell us which of our OSE links match the 404 and 302 results we found in Screaming Frog. The formula we used is =IFERROR(VLOOKUP(F:F,I:J,2,FALSE),""), with F:F specifying the Target URL column in the OSE data, and I:J the Address and Status Code columns respectively in the Screaming Frog data. The IFERROR wrapper leaves the cell blank when a target URL has no match in the Screaming Frog data, rather than showing #N/A. (A big hat-tip to Joe and Tamsin for patiently helping me with Excel formulas!) Alternatively use the Insert Function wizard in the Formulas tab to work through the process, though you will have to add the IFERROR part afterwards.
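If you would rather do the matching outside Excel, the same lookup can be sketched in Python. This is an illustrative equivalent of the IFERROR/VLOOKUP formula, not the method used for the post; the key names ("target_url", "status_code") and all the sample URLs and scores are made up.

```python
def add_status_codes(links, crawl_results, url_key="target_url"):
    """Attach a crawl status code to each backlink row.

    Mirrors =IFERROR(VLOOKUP(...), ""): rows whose target URL does not appear
    in the crawl export get an empty string instead of an error.
    """
    annotated = []
    for link in links:
        row = dict(link)  # copy so the input rows are left untouched
        row["status_code"] = crawl_results.get(row.get(url_key), "")
        annotated.append(row)
    return annotated


# Illustrative data only.
links = [
    {"url": "http://blog.example.org/post",
     "target_url": "http://www.example.com/old-page",
     "page_authority": 45},
    {"url": "http://news.example.net/story",
     "target_url": "http://www.example.com/",
     "page_authority": 60},
]
# Address -> Status Code, as exported from the crawl of broken/302 URLs.
crawl_results = {"http://www.example.com/old-page": 404}

for row in add_status_codes(links, crawl_results):
    print(row["url"], row["status_code"])
```

One difference to be aware of: VLOOKUP ignores letter case when matching, whereas this dictionary lookup is case-sensitive, so URLs that differ only in capitalisation are treated as different addresses.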
Our 'status code' column should now contain the code from the Screaming Frog data each time one of our external links points to a URL that returns a 404 or 302 code. Simply filter the 'status code' column by 404 and 400 to give you a complete list of links to broken URLs. You can then reorder this list by PA, DA or by which links are followed. You may also wish to add a 'date fixed' column, so you can record when the redirect or edit is in place, and the link starts passing its sweet, sweet authority to your target site.

You can also filter by 302, and instantly have a list of redirects to be changed to 301s, along with all the links that will then pass their full potential authority, ready to show your client or boss. Not bad for a few minutes' work!

Two sources are better than one

So are we done? Not quite; many SEOs work on the premise that using more than one data source is prudent. Once you have done this process, it's very quick to do the same again from an alternative source; in our example I might now use Ahrefs. Once we have all our 404s/302s from Ahrefs in a new tab in our spreadsheet, we can create a third tab to combine them with the 404s from OSE, using the Remove Duplicates tool once again. Of course, the combined tab cannot share quality metrics across sources, just URL, anchor text and target URL. However, the advantage of using multiple sources to find a greater number of broken links to fix is worthwhile, and we can still filter on individual sheets.

To use every available source of external links leading to 404 errors, we need to use Google Webmaster Tools' (GWT) 'Crawl Errors' report (found under Health in the menu). Alas, this is where things become a little more frustrating. As no doubt many of you know, it is impossible to cleanly download a list of each 404 URL address and the links pointing to it, despite the information being available on screen. Plus GWT is not always as up-to-date as we would like. So, we have to use a workaround.
What you can download from GWT is a list of all the broken URLs Google has found on your site. So, our first step is to download this list as a .csv file by selecting Health, then Crawl Errors in the left-hand navigation. Select the 'Not found' links, and hit the download button.

This file can then be imported using Screaming Frog's list mode, and all the reported broken URLs checked. Any URLs that are now returning 200 or 301 status codes should be removed from your list, and marked as 'fixed' within GWT. We now have a smaller, more accurate list of the broken URLs on our site.

Create a new tab in the spreadsheet with the broken links we found in our backlink tool, and create headings for URL, target URL and status code. Unfortunately, there's now some manual work involved; how much depends on how many 404 errors GWT is reporting.
As you can see, if you have a lot of reported external links, this can quickly become quite a pain. One helpful shortcut I have found is the Link Clump extension for Chrome. This allows you to create keyboard and mouse action shortcuts for opening or copying multiple links. I set one for copying all selected URLs to the clipboard. This makes it relatively quick to grab all the URLs for each reported error and paste them into my spreadsheet. There are plenty of other great extensions/add-ons that can help with this, such as Scraper for Chrome and Multi Links for Firefox. Please suggest any favourites you have in the comments below!

After a bit of leg-work, you will now have a list of all the source links, and their target URLs. The final stage is to ensure that these external URLs still exist, and still link to our site. Doing this is a two-stage process, with both stages using the same VLOOKUP method we used earlier.

The first step is to check that the source pages still exist. Copy all the source URLs and paste them into a new spreadsheet, then save as a .csv file. Now go back to Screaming Frog, upload the list and crawl all the URLs. First go to Response Codes and filter for any redirects. If you have some, export the list. Open this list and copy the redirect destination URLs, then add these to your master list of URLs from GWT. Next use the same VLOOKUP methodology to remove any URLs that result in a 301 or 302. We don't want them in our external link list as they no longer exist, but we do want the redirect targets, in case our links are there! Now go back to Screaming Frog and filter for any client errors (400/404s). If there are any, again export the list, then use the VLOOKUP method to remove them from our list of external links from GWT.

The second step is to check they are still linking to you. Copy the edited column of URLs reported by GWT, and save to (yep, yet another) .csv file. Upload it in Screaming Frog, then go to the Configuration menu and select Custom to add a bespoke filter.
Enter your domain, with or without subdomain depending on your project, and set the filter to 'does not contain'. Crawl your URL list, then head to the Custom tab and select your bespoke filter. This shows you all the URLs that no longer point to you. Export the list, copy it into your main spreadsheet, VLOOKUP one last time and delete these links. You'll need to add some form of marker text in a second column so you can see which ones to delete, or use the Status column.

Side note: You may wish to keep a record of these to try and get your site back on them if still relevant. It may be they simply removed the link to you because it pointed to a broken page. Being able to write to the site saying, "You used to link to us and we'd love to be featured once again", gives you a great reason to contact these sites.

Your final list

So, after a lot of editing, you have a list of broken external links reported by Google Webmaster Tools, plus the page they are linking to. Add these to your master list (the URLs from OSE and Ahrefs), de-duplicate and you have your final list of links to reclaim. Using the individual sheets for each source you can check each link for importance, deciding which ones to try and have corrected, and which you will simply put a 301 redirect in place for. Of course, as we have recently learned, 301 redirects possibly pass all their authority, but many still prefer to have clean links wherever possible (as previous studies have shown some authority is lost).

So that's it. It might seem a little complex or time-consuming at first, but the process only takes a couple of hours, or less if Webmaster Tools hasn't reported too many errors. The rewards vary of course, but if you have an older domain, or one that went through a site migration without SEO assistance, there can be many broken links. We've found several hundred links for clients by doing this, which is worth having for any site.
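The merge and de-duplicate step for the final list can also be done in a few lines of Python once each source has been reduced to pairs of source URL and target URL. This is a minimal sketch under that assumption; the function name and the sample URLs are invented for illustration.

```python
def merge_link_lists(*sources):
    """Combine (source URL, target URL) pairs from several tools, dropping repeats.

    The first occurrence of each pair is kept, so the order of the arguments
    decides which source "wins" in the merged list.
    """
    seen = set()
    merged = []
    for source in sources:
        for source_url, target_url in source:
            pair = (source_url.strip(), target_url.strip())
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


# Illustrative data only.
ose_links = [
    ("http://blog.example.org/post", "http://www.example.com/old-page"),
]
ahrefs_links = [
    ("http://blog.example.org/post", "http://www.example.com/old-page"),
    ("http://forum.example.net/thread", "http://www.example.com/gone"),
]
gwt_links = [
    ("http://forum.example.net/thread", "http://www.example.com/gone"),
    ("http://wiki.example.info/page", "http://www.example.com/old-page"),
]

final_list = merge_link_lists(ose_links, ahrefs_links, gwt_links)
print(len(final_list), "unique links to reclaim")
```

Keying on the pair rather than on the source URL alone keeps a page that links to two different broken URLs as two separate entries, since each needs its own fix.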
To make things a little easier (as this is a long post to follow!), we’ve put together a basic version you can access and copy for your own projects. There’s plenty more that can be done of course. Another good use of time is finding the sites that linked to you at one point, but no longer do so, as excellently laid out here by Ethan Lloyd at Seer Interactive, and we’ll be bringing you more as well. Happy reclaiming!

© SEOptimise