Investigating Panda & Duplicate Content Issues
Posted: 02 Aug 2012 05:56 AM PDT

During a recent analysis of a website (a blog with fewer than 50k visitors a week), we came across some interesting factors that led us to take a different approach to the investigation.

The Problem:
From the above, this doesn't seem like a typical Panda target, but the dates of the traffic drops were too much of a coincidence to ignore.
Panda: A Reminder

From Google's own description of the update:

"High-quality sites algorithm improvements. [launch codenames "PPtl" and "Stitch", project codename "Panda"] In 2011, we launched the Panda algorithm change, targeted at finding more high-quality sites. We improved how Panda interacts with our indexing and ranking systems, making it more integrated into our pipelines. We also released a minor update to refresh the data for Panda."

And one piece of advice for working on a recovery:

"One other specific piece of guidance we've offered is that low-quality content on some parts of a website can impact the whole site's rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content." (bolded sections highlighted by us)

A full list of questions to ask and answer when analysing a Panda-hit site: http://googlewebmastercentral.blogspot.co.uk/2011/05/more-guidance-on-building-high-quality.html

I previously covered this here: http://www.seoptimise.com/blog/2011/10/seo-tactics-to-tame-the-panda.html and Kevin covered it here: http://www.seoptimise.com/blog/2011/05/how-to-survive-a-panda-attack.html

The Investigation

Often, QUALITY sites that were affected by Panda lost rankings on only a few key pages across a whole host of content. Keep in mind that Panda looked at a range of signals of content quality. On the sites we have worked on, the two main factors were:
NOTE: A lot of SEOs believe that the Panda algorithm is NOT page specific – but experience shows that a few major ranking losses plus a site-wide dip due to Panda are often linked to specific pieces of content that trigger the Panda filter. We have found that a cross-section of Panda-hit sites would either take major site-wide hits (losing rankings right across the board) or would lose a few key sections and content pieces AND have smaller losses across the rest of the site.

"…improve the quality of content or if there is some part of your site that has got especially low-quality content, or stuff that really (is) not all that useful, then it might make sense to not have that content on your site…." http://www.youtube.com/watch?v=gMDx8wFAYYE (bolding ours)

Keyword Referral Variation Analysis

Typically we would run a range of investigations on the site, including a keyword referral variation analysis comparing the pre- and post-Panda-hit periods. We do this for sites that never really did any SEO and, as such, kept no record of rankings and ranking fluctuations. It gives a decent snapshot of what the primary traffic drivers were.

However, one more issue reared its head – "not provided". In the same year-on-year period pre-Panda, the site had virtually no "not provided" keywords, but in the post-Panda period that value grew to such a large portion that a year-on-year keyword analysis would be largely flawed:

As you can see from the above sheet, the top 10 key phrases show a significant dip, but the massive growth in "not provided" makes that analysis close to meaningless. Over 31K visitors weren't attributed a keyword, so it's difficult to gauge where the primary ranking drops were.

Isolating Page Losses

With "not provided" hiding referrers, it isn't possible to isolate specific keyword hits, which means we had to take another route to isolating the areas of loss. An interesting way to do this is to look at the pages that were previously driving traffic, broken down to identify which pages have lost their referrals from search compared with previous periods.

Next we extracted the top 100 entry pages for each period – 2011 and 2012 – cross-referenced them, highlighted common and new pages, and compared the traffic. In 2011, the top 100 pages captured 367K visitors; in 2012, the figure was 290K – a drop of around 77K. The key was to isolate all the top content pages to understand where their primary losses were. What we did:
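As a rough sketch of that cross-referencing step (the general approach, not the exact workings we used), the snippet below assumes two hypothetical CSV exports of the top entry pages for each period, each with "page" and "visits" columns, and lists the worst per-page changes in organic entry traffic:

import csv

def load_entry_pages(path):
    # Read an analytics export with 'page' and 'visits' columns into a dict.
    with open(path, newline="", encoding="utf-8") as f:
        return {row["page"]: int(row["visits"]) for row in csv.DictReader(f)}

# Hypothetical exports of the top 100 entry pages for each period.
pages_2011 = load_entry_pages("entry_pages_2011.csv")
pages_2012 = load_entry_pages("entry_pages_2012.csv")

common = set(pages_2011) & set(pages_2012)      # pages present in both periods
new_pages = set(pages_2012) - set(pages_2011)   # pages that only appear post-Panda

# Per-page change in organic entry traffic, worst losses first.
deltas = sorted(
    ((page, pages_2012[page] - pages_2011[page]) for page in common),
    key=lambda item: item[1],
)

print(f"2011 total: {sum(pages_2011.values()):,}  2012 total: {sum(pages_2012.values()):,}")
print(f"New entry pages in 2012: {len(new_pages)}")
for page, delta in deltas[:25]:
    print(f"{delta:+8d}  {page}")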
The result: this is just a snapshot, but you can see that there were a couple of really strong pages in there, and some that were completely obliterated. The next step is to isolate all the big drops and look at those pages individually. We like to strip them out and create a new sheet for clients to look at, using conditional formatting to highlight severe losses and order of priority.
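One way to produce that kind of client sheet programmatically is with openpyxl's conditional formatting. In the sketch below the page names and the -1,000 visit threshold are arbitrary examples, not figures from this site:

from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

# Example data: (entry page, change in organic visits year on year).
losses = [("/old-guide/", -5400), ("/popular-post/", -3100), ("/stable-page/", 120)]

wb = Workbook()
ws = wb.active
ws.title = "Panda losses"
ws.append(["Entry page", "Visit change"])
for page, delta in sorted(losses, key=lambda item: item[1]):
    ws.append([page, delta])

# Flag anything that lost more than 1,000 visits (arbitrary threshold) in red.
red = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
ws.conditional_formatting.add(
    f"B2:B{ws.max_row}",
    CellIsRule(operator="lessThan", formula=["-1000"], fill=red),
)

wb.save("panda_priority_sheet.xlsx")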
Building a Strategy to Recover from Panda

Is it possible to recover? Yes. Is it easy? No. The change is algorithm-based, which means tweaking, then waiting, then tweaking a bit more. But a recovery is possible if we isolate the issues. Quoting Matt Cutts' latest video: http://www.youtube.com/watch?feature=player_embedded&v=8IzUuhTyvJk

"Remember, Panda is a hundred percent algorithmic. There's nothing manual involved in that. And we haven't made any manual exceptions. And the Panda algorithm, we tend to run it every so often. It's not like it runs every day. It's one of these things where you might run it once a month or something like that. Typically, you're gonna refresh the data for it. And at the same time, you might pull in new signals. And those signals might be able to say, 'Ah, this is a higher-quality site.'"

The steps needed in this case:

Step 1. Isolating the data

In the analysis of the top 100 pages, the 25 pages identified with losses contributed a 95K drop in visitors over the time frame analysed! These pages should be the starting point for the site in terms of trying to understand patterns.

Step 2. Common patterns

Interestingly, poor content and duplicate content can often trigger Panda hits – one of the update's main targets was scraper sites. One of the first things we tend to do is isolate content from hit pages and "fuzzy match" it against search results – which is a fancy way of saying: take a piece of content, drop it into the search bar and see what comes up! Random samples of content from a page dropped into Google:

The pink sections are matches to the content – i.e. Google bolds them. As you can see, 3 out of the first 4 results are exact or close matches. The grey block is the site in question. We took another part of the page, in quotes this time, and dropped it into Google: insanely, there were 42 exact matches – and the original site didn't even show up on the first page! As a side note, I checked the date on most of those sites – they were published at least a year AFTER the client's original content.

We did this for the top 100 hit pages. Interestingly, as we move down the scale of pages that were hit, the lower the traffic loss, the lower the duplication! Now don't take that as a firm correlation, but it is interesting nonetheless.
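This sampling can be semi-automated. The sketch below picks quotable snippets to paste into Google and scores a suspected copy with Python's difflib – a stand-in for the manual eyeballing described above, with hypothetical filenames for the plain-text dumps:

import difflib
import random
import re

def sample_snippets(text, n=3, min_words=8, max_words=15):
    # Pull a few sentence-length snippets to paste into Google as quoted searches.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    usable = [s for s in sentences if min_words <= len(s.split()) <= max_words]
    return random.sample(usable, min(n, len(usable)))

def similarity(original, candidate):
    # Rough fuzzy-match score between the original copy and a suspected scrape.
    return difflib.SequenceMatcher(None, original.lower(), candidate.lower()).ratio()

# Hypothetical plain-text dumps of a hit page and a suspected copy of it.
page_text = open("hit_page.txt", encoding="utf-8").read()
scraped_copy = open("suspected_copy.txt", encoding="utf-8").read()

for snippet in sample_snippets(page_text):
    print(f'Search Google for: "{snippet}"')   # paste in quotes to find exact matches

print(f"Similarity to suspected copy: {similarity(page_text, scraped_copy):.0%}")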
Summary

What did we learn? Essentially, although this blog owner has been publishing GOOD content for many, many years, the fact that people have been copying and pasting that content for years only came to light AFTER Panda.

The types of sites we found copying their content, or sections of it:

- Scrapers.
- Forums (such as MoneySavingExpert forum users quoting them!).
- Other blogs taking large blocks of text.
- Genuine businesses(!) lifting whole sections to promote their own services.
- Government sites using portions of text written years earlier about certain guidelines.

The losses came from a dip in rankings through:

- Massively copied pages disappearing from the index.
- Content with portions quoted in forums dropping a few places.
- Pages whose content was copied by high-authority forums and Government sites falling out of the index or being massively lowered in rankings.
- A resultant Panda filter, in our opinion, which lowered the value of the whole site.
- Some content being topical in the previous year but having low search volume now – a natural dip in traffic.

Other issues that may also be playing a role:

- Most of the losses came from older, established blog posts that had not been updated in a long time. Was Google using a freshness filter to show newer pages with that content instead?
- Some key pages suffered link loss – although this needs further investigation, there may be room to help that along.
- Some of these posts were written well before social sharing existed – so they have had low or no social shares for a very long time, nor was there any incentive on the content to share it.
- Bounce rates had gone up from an average of 70% to 85%. We investigated the content – it is still relevant and correct, but we feel that the date prominently displayed on the post may be putting visitors off and causing them to bounce away.

Related Google algorithm factors that may also be to blame:

- Caffeine (a close correlation with fresher copies of the content outranking the originals).
- Bounce-backs (the increase in bounce rate)? (not proven).
- Lack of social signals (not proven).
Potential Actions and Solutions

Get author attribution in place – since this is a blog that has been running for a long time and has established followers, they should set up authorship markup; this helps make the site look a lot more "legitimate". More reading here:

- http://www.seomoz.org/blog/authorship-google-plus-link-building
- http://yoast.com/wordpress-rel-author-rel-me/

Legal action and DMCA requests – where necessary and appropriate, get the copied content removed from Google's index. Further resources here:

- http://blog.kissmetrics.com/content-scrapers/
- https://docs.google.com/spreadsheet/viewform?formkey=dGM4TXhIOFd3c1hZR2NHUDN1NmllU0E6MQ
- https://www.google.com/webmasters/tools/dmca-notice?pli=1&

Refresh and rewrite badly hit pages – a takedown request takes time, so it is often easier to rewrite the content, which also makes it "fresh" in the algorithm's view: a double benefit. Adding more content to "light" pages, and adding resources to heavier pages, also seems to help make a piece of content more authoritative (a simple word-count audit for spotting light pages is sketched at the end of this section).

Social shares and fresh links – the content is actually of decent quality, but the lack of focus on social sharing means the site is losing out on valuable traffic while also not sending the social signals that legitimise the content. Generating a decent number of +1's, tweets, etc. can also help get the content re-crawled more often. At the same time, to speed up the re-ranking process, we suggest getting some fresh links into the content that has suffered the most link loss.
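As a minimal sketch of that word-count audit – assuming a hypothetical list of post URLs and using only the standard library – the snippet below flags unusually light pages for expansion; it is an illustration of the idea, not a polished crawler:

from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    # Collect visible text, skipping script and style blocks.
    def __init__(self):
        super().__init__()
        self.words = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.split())

def word_count(url):
    # Fetch a page and return a rough count of its visible words.
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    return len(parser.words)

# Hypothetical list of hit pages to audit, thinnest first.
urls = ["http://www.example.com/old-guide/", "http://www.example.com/popular-post/"]
counts = {url: word_count(url) for url in urls}
for url, count in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{count:6d} words  {url}")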