Investigating Panda & Duplicate Content Issues
Posted: 02 Aug 2012 05:56 AM PDT

During a recent analysis of a website (a blog with fewer than 50k visitors a week), we came across some interesting factors that led us to take a different approach to the investigation.

The Problem:
From the above, this doesn't seem like your typical target for Panda, but the dates of the traffic drops were too much of a coincidence.
Panda: A Reminder

High-quality sites algorithm improvements. [launch codenames "PPtl" and "Stitch", project codename "Panda"] In 2011, we launched the Panda algorithm change, targeted at finding more high-quality sites. We improved how Panda interacts with our indexing and ranking systems, making it more integrated into our pipelines. We also released a minor update to refresh the data for Panda.

And one piece of advice for working on a recovery:

One other specific piece of guidance we’ve offered is that low-quality content on some parts of a website can impact the whole site's rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content. (bolded sections highlighted by us)

A full list of questions to ask and answer when analysing a Panda hit site: http://googlewebmastercentral.blogspot.co.uk/2011/05/more-guidance-on-building-high-quality.html

I previously covered this here: http://www.seoptimise.com/blog/2011/10/seo-tactics-to-tame-the-panda.html and Kevin covered it here: http://www.seoptimise.com/blog/2011/05/how-to-survive-a-panda-attack.html

The Investigation

Often, QUALITY sites that were affected by Panda lost only a few key pages in rankings across a whole host of content. Keep in mind that Panda looked at a range of signals on content quality. On the sites that we have worked on, the two main factors were:
NOTE: A lot of SEOs believe that the Panda algo is NOT page-specific, but experience shows that a few major ranking losses and a site-wide dip, due to Panda, are linked to specific pieces of content that cause an activation of the Panda filter. We have found that a cross-section of Panda-hit sites would either take major site-wide hits (thus losing right across the board) or would lose a few key sections and content pieces AND have smaller losses across the site.

"…improve the quality of content or if there is some part of your site that has got especially low-quality content, or stuff that really (is) not all that useful, then it might make sense to not have that content on your site…." http://www.youtube.com/watch?v=gMDx8wFAYYE (bolding ours)

Keyword Referral Variation Analysis

Typically we would run a range of different investigations on the site, including a Keyword Referral Variation analysis on the periods before and after the Panda hit. This is done for sites that never really did any SEO and, as such, didn't keep a record of rankings and ranking fluctuations. It gives a decent snapshot of what the primary drivers were.

However, one more issue reared its head: "not provided". In the same Year-on-Year period pre-Panda, the site had virtually no "not provided" keywords, but in the post-Panda season the value grew to such a large portion that a Year-on-Year keyword analysis would be largely flawed:

As you can see from the above sheet, the top 10 key phrases show a significant dip, but the massive growth in "not provided" makes that analysis impotent. Over 31K visitors weren't attributed a keyword, so it's difficult to gauge where the primary ranking drops would be.

Isolating Page Losses

With "not provided" hiding referrers, it's not possible to isolate specific keyword hits. This means we may have to take another route to isolating areas of loss.
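To see how badly "not provided" distorts a keyword comparison, it can help to compute the per-keyword change and the share of visits with no keyword side by side. The following is only a minimal Python sketch of that idea: the function names, the dictionary layout (keyword mapped to visits) and the exact label "(not provided)" are assumptions for illustration, not taken from the analysis above.

```python
from typing import Dict, List, Tuple

# Assumed label for visits with no keyword; adjust to match your analytics export.
NOT_PROVIDED = "(not provided)"


def keyword_variation(
    before: Dict[str, int], after: Dict[str, int]
) -> List[Tuple[str, int, int, int]]:
    """Return (keyword, visits_before, visits_after, change), biggest loss first.

    The "not provided" bucket is skipped because it is not a real keyword.
    """
    rows = []
    for keyword in set(before) | set(after):
        if keyword == NOT_PROVIDED:
            continue
        visits_before = before.get(keyword, 0)
        visits_after = after.get(keyword, 0)
        rows.append((keyword, visits_before, visits_after, visits_after - visits_before))
    # Sort by change (most negative first), then by keyword for a stable order.
    rows.sort(key=lambda row: (row[3], row[0]))
    return rows


def not_provided_share(visits: Dict[str, int]) -> float:
    """Fraction of all search visits that carry no keyword (0.0 if there are no visits)."""
    total = sum(visits.values())
    return visits.get(NOT_PROVIDED, 0) / total if total else 0.0
```

If the share returned by `not_provided_share` is small in the earlier period and large in the later one, the per-keyword drops from `keyword_variation` overstate the real losses, which is the reason for switching to a page-level view.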
An interesting way to do this is to look at the pages that previously drove traffic and identify which of them have lost search referrals compared to previous periods. Next we extracted the top 100 entry pages for each period (2011 and 2012), cross-referenced them, highlighted common pages and new pages, and compared the traffic. In 2011, the top 100 pages captured 367K visitors. In 2012, this figure was 290K – a difference of 77K. The key was to isolate all the top content pages to understand what their primary losses were.

What we did:
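The cross-referencing of top entry pages described above can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the `PageChange` record and the input format (entry page URL mapped to search visits for each year) are assumptions, and the original work was done in a spreadsheet.

```python
from typing import Dict, List, NamedTuple


class PageChange(NamedTuple):
    page: str
    status: str       # "common", "new" or "dropped" relative to the top-N lists
    visits_2011: int
    visits_2012: int
    change: int       # visits_2012 - visits_2011


def top_pages(visits: Dict[str, int], n: int = 100) -> Dict[str, int]:
    """Keep the n entry pages with the most visits (ties broken by URL)."""
    ranked = sorted(visits.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:n])


def cross_reference(
    visits_2011: Dict[str, int], visits_2012: Dict[str, int], n: int = 100
) -> List[PageChange]:
    """Compare the top-n entry pages of both years, biggest losses first."""
    top_2011 = top_pages(visits_2011, n)
    top_2012 = top_pages(visits_2012, n)
    rows = []
    for page in set(top_2011) | set(top_2012):
        if page in top_2011 and page in top_2012:
            status = "common"
        elif page in top_2012:
            status = "new"
        else:
            status = "dropped"  # was in the 2011 top n, not in the 2012 top n
        # Use the full data for both years so a page outside one top list
        # still reports its real traffic rather than zero.
        before = visits_2011.get(page, 0)
        after = visits_2012.get(page, 0)
        rows.append(PageChange(page, status, before, after, after - before))
    rows.sort(key=lambda row: (row.change, row.page))
    return rows
```

Sorting by `change` puts the worst-hit pages at the top, which roughly mirrors ordering a sheet by severity before reviewing pages one by one.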
The result: This is just a snapshot, but you can see that there were a couple of really good pages in there, and some completely obliterated. The next step would be to isolate all the big drops and look at those pages individually. We like to strip them all out and create a new sheet for clients to look at, using conditional formatting to highlight severe losses and order of priority:

Building a Strategy to Recover from Panda

Is it possible to recover? Yes. Is it easy? No. The change is algorithm-based, which means tweaking, then waiting, then tweaking a bit more. But a recovery is possible if we isolate the issues.

Quoting Matt Cutts's latest video: http://www.youtube.com/watch?feature=player_embedded&v=8IzUuhTyvJk

"Remember, Panda is a hundred percent algorithmic. There's nothing manual involved in that. And we haven't made any manual exceptions. And the Panda algorithm, we tend to run it every so often. It's not like it runs every day. It's one of these things where you might run it once a month or something like that. Typically, you're gonna refresh the data for it. And at the same time, you might pull in new signals. And those signals might be able to say, 'Ah, this is a higher-quality site.'"

The steps needed in this case:
Step 1. Isolating the data

In the analysis of the top 100 pages, the 25 pages identified with losses contributed to a 95K drop in visitors in the time frame analysed! These pages should be a starting point for the site in terms of trying to understand patterns.

Step 2. Common patterns

Interestingly, poor content and duplicate content can often trigger Panda hits – one of the main targets was scraper sites. One of the first things we tend to do is isolate content from hit pages and "fuzzy match" it to search results, which is a fancy way of saying: take a piece of content, drop it into the search bar and see what comes up!

Random samples of content from a page dropped into Google:

The pink sections are matches to the content, i.e. Google bolds them. As you can see, 3 out of the first 4 are exact or close matches. The grey block is the site in question.

We took another part of the page, in quotes this time, and dropped it into Google:

Insanely, there were 42 exact matches – the original site didn't even show up on the first page! As a side note, I checked the date on most of those sites – they were published at least a year AFTER the client's original content.

We did this for the top 100 hit pages and came up with:

Interestingly, as we move down the scale of pages that were hit, the lower the hits, the lower the duplication! Now don't take that as a correlation, but it is interesting nonetheless.

Summary

What did we learn? Essentially, although this blog owner has been publishing GOOD content for many, many years, the fact that people have been copying and pasting that content for years only came to light AFTER Panda.
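The duplicate check in Step 2 was done by hand, by pasting text into Google. Once a suspected copy has been found, a local comparison can put a rough number on how much of the original it reuses. The sketch below is a swapped-in technique, not the method used above: it splits both texts into overlapping word sequences ("shingles") and reports what fraction of the original's shingles also appear in the candidate. The function names and the shingle size of 5 words are assumptions.

```python
import re
from typing import Set, Tuple


def shingles(text: str, size: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word sequences of length `size` in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Texts shorter than `size` words produce an empty set.
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original: str, candidate: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also occur in the candidate.

    1.0 means the whole original passage is reproduced somewhere in the
    candidate; 0.0 means no run of `size` consecutive words is shared.
    """
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(candidate, size)) / len(source)
```

Containment is used here instead of a symmetric similarity score because a forum post or scraper page often wraps a copied passage in a lot of unrelated text, and the question is how much of the original was taken, not how similar the two pages are overall.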
The type of sites we found copying their content or sections of their content:
- Scrapers.
- Forums (such as money savings expert forum users quoting them!).
- Other blogs taking large blocks of text.
- Genuine businesses(!) taking whole sections to promote their own businesses.
- Government sites that were using portions of their text written years prior about certain guidelines.

The losses came from a dip in rankings through:
- Massively copied pages disappearing from the index.
- Content that had portions quoted in forums dropping a few places.
- Pages with content copied by high-authority forums and Government sites falling out of the index or being massively lowered in rankings.
- A resultant Panda filter, in our opinion, which lowered the value of the whole site.
- Some content that was topical in the previous year having low search volume now. This is a natural dip in traffic.

Other issues contributing to these factors that may play a role:
- Most of the losses came from older established blog posts that had not been updated in a long time. Was Google using a filter to show newer pages with that content instead?
- Some key pages suffered link loss. This needs further investigation, but there may be further room to help that along.
- Some of these posts were written way before social shares existed, so they have had low or no social shares for a very long time, nor was there any incentive on the content to share.
- Bounce rates had gone up from a 70% average to an 85% average. We investigated the content and it is still relevant and correct, but we feel that the date, which is prominently displayed on the post, may be putting off visitors, who then bounce away.

Related Google Algorithm factors that may also be to blame:
- Caffeine (close correlation with content outranking).
- Bounce backs (increase in bounce rate)? (not proven).
- Lack of Social Signals (not proven).
Potential Actions and Solutions:

Try and get Author Attribution. Since this is a blog which has been running for a long time and has established followers, they should get their attribution in place, as this helps make the site a lot more "legitimate". More reading here:
- http://www.seomoz.org/blog/authorship-google-plus-link-building
- http://yoast.com/wordpress-rel-author-rel-me/

Legal action and a DMCA request. Where necessary and appropriate, get copied content removed from Google. Further resources here:
- http://blog.kissmetrics.com/content-scrapers/
- https://docs.google.com/spreadsheet/viewform?formkey=dGM4TXhIOFd3c1hZR2NHUDN1NmllU0E6MQ
- https://www.google.com/webmasters/tools/dmca-notice?pli=1&

Refresh and rewrite badly hit pages. A takedown request takes time, so it is often easier to rewrite content, at the same time making it "fresh" in the algorithm's view, for a double benefit. Adding more content to "light pages" and adding resources etc. to heavy pages also seems to help make a piece of content authoritative.

Social shares and fresh links. The content is actually of a decent quality, but the lack of focus on social sharing means they are losing out on valuable traffic and, at the same time, not sending social signals that legitimise the content. Generating a decent number of +1's, tweets, etc. can help get the content re-crawled more often as well. At the same time, to speed up the re-ranking process, we suggest getting some fresh links into the content that has had the most link loss.
© SEOptimise
You are subscribed to email updates from SEOptimise » blog To stop receiving these emails, you may unsubscribe now. | Email delivery powered by Google |
Google Inc., 20 West Kinzie, Chicago IL USA 60610 |
Don't follow, lead.
Don't copy, create.
Don't start, finish.
or even,
Don't sit still, move.
Don't fit in, stand out.
Don't sit quietly, speak up.
Not all the time, sure, but more often.
[You're getting this note because you subscribed to Seth Godin's blog.]
Mish's Global Economic Trend Analysis

China Buys U.S. Businesses at Record Pace; What are the Implications? Will Alarm Bells Ring?
Posted: 02 Aug 2012 10:42 PM PDT

CNN Money reports Chinese buying of U.S. business at record pace

Chinese direct investment in the United States could hit a record high in 2012, according to a new research report released Wednesday.

What are the Implications?

China buying US businesses is a necessary part of correcting global imbalances. As a direct function of trade math, China's reserves must eventually return to the US. The only way that will not happen is if the US defaults on foreign-held treasuries.

However, don't be deceived by the words "record pace". To put the $8 billion of direct investment in perspective, China has close to $1.75 trillion in US dollar reserves and $3.2 trillion worth of total reserves.

Will Alarm Bells Ring?

Some might be alarmed by China buying US businesses. Actually this is a good thing, and the faster things speed up, the better off the US and China will both be. Direct investment will provide much-needed jobs in the US and it will alleviate China's dependence on an unsustainable model of fixed investment.

Unfortunately, "record pace" is nowhere near enough to matter, but all trends start somewhere. The key point is that mathematically, dollars must return home, and the sooner it happens the better off the global economy will be.

Don't expect alarmists in Congress and union sympathizers to see it that way.

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

Mike "Mish" Shedlock is a registered investment advisor representative for Sitka Pacific Capital Management. Sitka Pacific is an asset management firm whose goal is strong performance and low volatility, regardless of market direction. Visit http://www.sitkapacific.com/account_management.html to learn more about wealth management and capital preservation strategies of Sitka Pacific.
US Factory Orders "Unexpectedly" Decline; US Car Sales "Unexpectedly" Decline; Expect the Unexpected
Posted: 02 Aug 2012 11:57 AM PDT

The words for the day once again are "unexpectedly declined". I have a couple of examples.

The New York Times reports U.S. Factory Orders Fall Unexpectedly

New orders for factory goods unexpectedly fell in the United States in June, a fresh sign that the slowdown in the country's manufacturing sector will probably stretch into the second half of the year.

Car Sales "Somewhat Softer Than Expected"

Yesterday, Yahoo!Finance reported U.S. auto sales remain soft in July

Major automakers reported U.S. auto sales for July that were somewhat softer than expected as high U.S. unemployment and weak consumer confidence kept would-be buyers on the sidelines.

Expect the Unexpected

Why economists could not see this coming is a mystery. Manufacturing new orders have collapsed virtually everywhere, including the US. GDP, a lagging indicator, is 1.5% annualized, well below the stall speed of 2%.

Based on new orders and anecdotal evidence from the world's largest auto parts manufacturer, I confidently predicted on July 9, Global Collapse In Auto Sales Coming Up.

On July 2, I noted US Manufacturing ISM Contracts for First Time in Three Years; New Orders and Prices Plunge; Perfect Miss: 0 of 70 Economists Polled By Bloomberg Expected Contraction

Yesterday I noted Dismal Manufacturing Numbers Worldwide; US ISM in Contraction Second Month.

Yet economists were surprised by today's "unexpected decline" in US Factory Orders and yesterday's decline in auto sales. The surprise ought to have been that car sales and factory orders held up as well as they did.

Growing Evidence of Recession

With each economic report, it becomes clearer that the US is already in recession, yet economists cannot see that either. If the jobs report is miserable tomorrow, and I expect it to be, then expect economists to be surprised by that too.

For Friday's job forecast ADP predicts +163,000 jobs but I'll Take the Under (Way Under). The economic consensus for Friday is about +100,000 jobs and I will take the under on that as well. Zero to 50,000 would not surprise me in the least.

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com
Posted: 02 Aug 2012 08:20 AM PDT

Today the ECB left interest rates unchanged and hinted at future bond purchases but also warned "Governments must stand ready to activate the EFSF/ESM".

The Financial Times has details in Draghi prepares for fresh bond buying

Draghi admitted to Bundesbank reservations about bond-buying and made clear that governments would first have to apply to the eurozone's rescue funds – the European Financial Stability Facility and the European Stability Mechanism – and accept "strict and effective conditionality".

Text of Draghi's Press Conference

I cannot find some of the direct quotes the Financial Times mentions, but the gist of the Financial Times' translation seems accurate. Here are some snips from ECB President Draghi Statement to Press Conference

Based on our regular economic and monetary analyses, we decided to keep the key ECB interest rates unchanged, following the decrease of 25 basis points in July. As we said a month ago, inflation should decline further in the course of 2012 and be below 2% again in 2013.

Yields Soar

Draghi's statements sent the Spain 10-year bond yield soaring back above 7%, currently 7.13%, up 40 basis points. Yield on Italy's 10-year government bond is up 30 basis points to 6.23%.

Clearly the market was expecting far more after Draghi's statements last week that the ECB would do "whatever it takes".

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com