Wednesday, 15 April 2015

Using Term Frequency Analysis to Measure Your Content Quality - Moz Blog



Posted on: Wednesday 15 April 2015 — 02:14

Posted by EricEnge

It's time to look at your content differently—time to start understanding just how good it really is. I am not simply talking about titles, keyword usage, and meta descriptions. I am talking about the entire page experience. In today's post, I am going to introduce the general concept of content quality analysis, why it should matter to you, and how to use term frequency (TF) analysis to gather ideas on how to improve your content.

TF analysis is usually combined with inverse document frequency analysis (collectively TF-IDF analysis). TF-IDF analysis has been a staple concept for information retrieval science for a long time. You can read more about TF-IDF and other search science concepts in Cyrus Shepard's excellent article here.

For purposes of today's post, I am going to show you how you can use TF analysis to get clues as to what Google is valuing in the content of sites that currently outrank you. But first, let's get oriented.

Conceptualizing page quality

Start by asking yourself if your page provides a quality experience to people who visit it. For example, if a search engine sends 100 people to your page, how many of them will be happy? Seventy percent? Thirty percent? Less? What if your competitor's page gets a higher percentage of happy users than yours does? Does that feel like an "uh-oh"?

Let's think about this with a specific example in mind. Suppose you ran a golf club site, and 100 people came to your page after searching on a phrase like "golf clubs." What kinds of things might they be looking for?

Here are some things they might want:

  1. A way to buy golf clubs on your site (they would need to see a shopping cart of some sort).
  2. The ability to select specific brands, perhaps by links to other pages about those brands of golf clubs.
  3. Information on how to pick the club that is best for them.
  4. The ability to select specific types of clubs (drivers, putters, irons, etc.). Again, this may be via links to other pages.
  5. A site search box.
  6. Pricing info.
  7. Info on shipping costs.
  8. Expert analysis comparing different golf club brands.
  9. End user reviews of your company so they can determine if they want to do business with you.
  10. How your return policy works.
  11. How they can file a complaint.
  12. Information about your company. Perhaps an "about us" page.
  13. A link to a privacy policy page.
  14. Whether or not you have been "in the news" recently.
  15. Trust symbols that show that you are a reputable organization.
  16. A way to access pages to buy different products, such as golf balls or tees.
  17. Information about specific golf courses.
  18. Tips on how to improve their golf game.

This is really only a partial list, and the specifics of your site can certainly vary for any number of reasons from what I laid out above. So how do you figure out what it is that people really want? You could pull in data from a number of sources. For example, using data from your site search box can be invaluable. You can do user testing on your site. You can conduct surveys. These are all good sources of data.

You can also look at your analytics data to see what pages get visited the most. Just be careful how you use that data. For example, if most of your traffic is from search, this data will be biased by incoming search traffic, and hence what Google chooses to rank. In addition, you may only have a small percentage of the visitors to your site going to your privacy policy, but chances are good that there are significantly more users than that who notice whether or not you have a privacy policy. Many of these will be satisfied just to see that you have one and won't actually go check it out.

Whatever you do, it's worth using many of these methods to determine what users want from the pages of your site and then using the resulting information to improve your overall site experience.

Is Google using this type of info as a ranking factor?

At some level, they clearly are. Google and Bing have evolved far beyond the initial TF-IDF concepts, but we can still use those concepts to better understand our own content.

The first major indication we had that Google was performing content quality analysis was with the release of the Panda algorithm in February of 2011. More recently, we know that on April 21 Google will release an algorithm that makes the mobile friendliness of a web site a ranking factor. Pure and simple, this algo is about the user experience with a page.

Exactly how Google is performing these measurements is not known, but what we do know is their intent. They want to make their search engine look good, largely because it helps them make more money. Sending users to pages that make them happy will do that. Google has every incentive to improve the quality of their search results in as many ways as they can.

Ultimately, we don't actually know what Google is measuring and using. It may be that the only SEO impact of providing pages that satisfy a very high percentage of users is an indirect one. That is, so many people like your site that it gets written about more, linked to more, shared more on social media, and engaged with more; Google then sees those other signals, which it uses as ranking factors, and this is why your rankings improve.

But, do I care if the impact is a direct one or an indirect one? Well, NO.

Using TF analysis to evaluate your page

TF-IDF analysis is more about relevance than content quality, but we can still use various precepts from it to help us understand our own content quality. One way to do this is to compare the results of a TF analysis of all the keywords on your page with those pages that currently outrank you in the search results. In this section, I am going to outline the basic concepts for how you can do this. In the next section I will show you a process that you can use with publicly available tools and a spreadsheet.

The simplest form of TF analysis is to count the number of uses of each keyword on a page. However, the problem with that is that a page using a keyword 10 times will be seen as 10 times more valuable than a page that uses a keyword only once. For that reason, we dampen the calculations. I have seen two methods for doing this, as follows:

[Image: term frequency calculation]

The first method relies on dividing the number of repetitions of a keyword by the count for the most popular word on the entire page. Basically, what this does is eliminate the inherent advantage that longer documents might otherwise have over shorter ones. The second method dampens the total impact in a different way, by taking the log base 10 of the actual keyword count. Both of these still value incremental uses of a keyword, but dampen that value substantially. I prefer to use method 1, but you can use either method for our purposes here.
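As a rough illustration of the two dampening methods, here is a minimal Python sketch. The function names and the sample sentence are invented for this example, and the word splitting is deliberately crude. Method 2 follows the description above literally, so a word used once scores 0; many information retrieval texts use 1 + log10(count) instead to avoid that.

```python
import math
from collections import Counter


def term_counts(text):
    """Count single words, ignoring case and surrounding punctuation."""
    words = (w.strip('.,;:!?"()').lower() for w in text.split())
    return Counter(w for w in words if w)


def tf_max_normalized(counts):
    """Method 1: divide each count by the count of the most frequent term."""
    if not counts:
        return {}
    top = max(counts.values())
    return {term: n / top for term, n in counts.items()}


def tf_log_dampened(counts):
    """Method 2: log base 10 of the raw count (a count of 1 maps to 0)."""
    return {term: math.log10(n) for term, n in counts.items()}


# Invented sample text, for illustration only.
sample = "Golf clubs for sale. Compare golf clubs, golf balls and golf tees."
counts = term_counts(sample)
method1 = tf_max_normalized(counts)
method2 = tf_log_dampened(counts)

for term in ("golf", "clubs", "tees"):
    print(term, counts[term], round(method1[term], 3), round(method2[term], 3))
```

In this sample "golf" appears four times and "clubs" twice, so method 1 scores them 1.0 and 0.5. Method 2 compresses the gap further, which is the dampening effect described above.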

Once you have the TF calculated for every different keyword found on your page, you can then start to do the same analysis for pages that outrank you for a given search term. If you were to do this for five competing pages, the result might look something like this:

[Image: term frequency spreadsheet]

I will show you how to set up the spreadsheet later, but for now, let's do the fun part, which is to figure out how to analyze the results. Here are some of the things to look for:

  1. Are there any highly related words that all or most of your competitors are using that you don't use at all?
  2. Are there any such words that you use significantly less, on average, than your competitors?
  3. Also look for words that you use significantly more than competitors.

You can then tag these words for further analysis. Once you are done, your spreadsheet may now look like this:

[Image: second stage term frequency analysis spreadsheet]

To make this fit into the screenshot above and keep it legible, I eliminated some of the columns you saw in my first spreadsheet. The sample analysis here is for the movie "Woman in Gold". You can see the full spreadsheet of calculations here. Note that we used an automated approach to mark some items as "Low Ratio," "High Ratio," or "All Competitors Have, Client Does Not."
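The flagging step could be automated along these lines. This is only a sketch: the spreadsheet's actual formulas are not reproduced here, the thresholds of 0.5 and 2.0 are assumptions chosen for illustration, and the TF values in the example are invented rather than taken from the "Woman in Gold" data.

```python
def flag_terms(client_tf, competitor_tfs, low=0.5, high=2.0):
    """Compare the client's TF for each term with the competitor average.

    client_tf: dict mapping term -> TF on the client page.
    competitor_tfs: list of dicts, one per competing page.
    low, high: hypothetical ratio thresholds (assumptions, not from the post).
    """
    if not competitor_tfs:
        raise ValueError("need at least one competitor page")
    terms = set(client_tf)
    for tf in competitor_tfs:
        terms.update(tf)

    flags = {}
    for term in sorted(terms):
        mine = client_tf.get(term, 0.0)
        theirs = [tf.get(term, 0.0) for tf in competitor_tfs]
        average = sum(theirs) / len(theirs)
        if mine == 0 and all(value > 0 for value in theirs):
            flags[term] = "All Competitors Have, Client Does Not"
        elif average == 0:
            # No competitor uses the term; flag it only if the client does.
            if mine > 0:
                flags[term] = "High Ratio"
        elif mine / average < low:
            flags[term] = "Low Ratio"
        elif mine / average > high:
            flags[term] = "High Ratio"
    return flags


# Invented TF values, for illustration only.
client = {"woman in gold": 1.0, "billing": 0.6, "story": 0.1}
competitors = [
    {"woman in gold": 0.4, "helen mirren": 0.5, "story": 0.5, "billing": 0.1},
    {"woman in gold": 0.5, "helen mirren": 0.3, "story": 0.6},
]
flags = flag_terms(client, competitors)
for term, flag in flags.items():
    print(f"{term}: {flag}")
```

Terms whose ratio falls between the two thresholds get no flag at all, which keeps the list of items to review short.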

None of these flags by themselves have meaning, so you now need to put all of this into context. In our example, the following words probably have no significance at all: "get", "you", "top", "see", "we", "all", "but", and other words of this type. These are just very basic English language words.

But, we can see other things of note relating to the target page (a.k.a. the client page):

  1. It's missing any mention of actor Ryan Reynolds
  2. It's missing any mention of actor Helen Mirren
  3. The page has no reviews
  4. Words like "family" and "story" are not mentioned
  5. "Austrian" and "Maria Altmann" are not used at all
  6. The phrase "woman in gold" and words "billing" and "info" are used proportionally more than they are with the other pages

Note that the last item is only visible if you open the spreadsheet. The issues above could well be significant, as mentions of the lead actors, reviews, and similar details are indications that a page has in-depth content. We see that competing pages that rank have details of the story, so that's an indication that this is what Google (and users) are looking for. The fact that the main key phrase, and the word "billing", are used to a proportionally high degree also makes it seem a bit spammy.

If you look at the information closely, you can see that the target page is quite thin in overall content, so much so that it almost looks like a doorway page. In fact, it looks like it was put together by the movie studio itself, just not very well, as it presents little in the way of a home page experience that would cause it to rank for the name of the movie!

In the many different times I have done an analysis using these methods, I've been able to make many different types of observations about pages. A few of the more interesting ones include:

  1. A page that had no privacy policy, yet was taking personally identifiable info from users.
  2. A major lack of important synonyms that would indicate a real depth of available content.
  3. Comparatively low Domain Authority competitors ranking with in-depth content.

These types of observations are interesting and valuable, but it's important to stress that you shouldn't be overly mechanical about this. The value in this type of analysis is that it gives you a technical way to compare the content on your page with that of your competitors. This type of analysis should be used in combination with other methods that you use for evaluating that same page. I'll address this some more in the summary section below.

How do you execute this for yourself?

The full spreadsheet contains all the formulas so all you need to do is link in the keyword count data. I have tried this with two different keyword density tools, the one from Searchmetrics, and this one from motoricerca.info.

I am not endorsing these tools, and I have no financial interest in either one—they just seemed to work fairly well for the process I outlined above. To provide the data in the right format, please do the following:

  1. Run all the URLs you are testing through the keyword density tool.
  2. Copy and paste all the one word, two word, and three word results into a tab on the spreadsheet.
  3. Sort them all so you get total word counts aligned by position as I have shown in the linked spreadsheet.
  4. Set up the formulas as I did in the demo spreadsheet (you can just use the demo spreadsheet).
  5. Then do your analysis!
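If you would rather script steps 1 through 3 than copy and paste from a keyword density tool, something like the sketch below could produce the aligned counts. It assumes you have already extracted the visible text of each page; the function names and sample texts are invented, and the tokenizer is a simple regular expression rather than whatever the tools mentioned above use.

```python
import re
from collections import Counter


def phrase_counts(text, max_words=3):
    """Count every one, two, and three word phrase in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            # Note: phrases are allowed to span sentence boundaries here.
            counts[" ".join(words[i:i + n])] += 1
    return counts


def aligned_rows(pages):
    """Build one row per phrase, with a count column for each page.

    pages: dict mapping a page label to that page's extracted text.
    """
    labels = list(pages)
    per_page = {label: phrase_counts(text) for label, text in pages.items()}
    phrases = sorted(set().union(*per_page.values()))
    header = ["phrase"] + labels
    rows = [[p] + [per_page[label][p] for label in labels] for p in phrases]
    return header, rows


# Invented page texts, for illustration only.
pages = {
    "client": "Woman in Gold tickets",
    "competitor": "Woman in Gold story. The story of Maria Altmann.",
}
header, rows = aligned_rows(pages)
print("\t".join(header))
for row in rows:
    print("\t".join(str(cell) for cell in row))
```

The tab-separated output can be pasted into a spreadsheet tab, with each phrase on its own row and one count column per URL.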

This may sound a bit tedious (and it is), but it has worked very well for us at STC.

Summary

You can also use usability groups and a number of other methods to figure out what users are really looking for on your site. However, what this does is give us a look at what Google has chosen to rank the highest in its search results. Don't treat this as some sort of magic formula where you mechanically tweak the content to get better metrics in this analysis.

Instead, use this as a method for slicing into your content to better see it the way a machine might see it. It can yield some surprising (and wonderful) insights!



Seth's Blog : Are you feeling lucky?


Expected value is a powerful concept, easy to understand, often difficult to use in daily life.

It's the value of an outcome multiplied by the chances it will happen.

If there's a one in ten chance you'll get a $50 ticket for parking here, the expected value (the cost) of parking here is $5. Park here enough times, and that's what it's going to cost you.

If there's a one in five chance you'll win that lawsuit for a million dollars, the expected value of the suit is $200,000.

That's not a guess or a vague hunch, it's actually true. If the odds are described properly (and setting those odds is an entirely different discussion) then the value of the opportunity (or the cost of it) is clear.
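The arithmetic is short enough to put in a few lines of Python. This is only a sketch using the two examples above; the function name is made up for illustration.

```python
def expected_value(probability, value):
    """Value of an outcome multiplied by the chance it will happen."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    return probability * value


# One in ten chance of a $50 parking ticket.
ticket_cost = expected_value(1 / 10, 50)

# One in five chance of winning a $1,000,000 lawsuit.
lawsuit_value = expected_value(1 / 5, 1_000_000)

print(f"Expected cost of parking: ${ticket_cost:,.2f}")
print(f"Expected value of the suit: ${lawsuit_value:,.2f}")
```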

And yet...

And yet we anchor our risks, often overestimating just how much it's going to cost us to get a ticket.

And we anchor our possible gains, usually overestimating how much that opportunity is worth (which is why so few lawsuits that should settle, do).

Humans are quite bad at dealing with ambiguity, and even worse when there's money on the table. Ellsberg's paradox helps us understand some of the bugs in the system, and perhaps we can take better risks by using a pencil, not our gut, to decide what a chance is worth.

       


Tuesday, 14 April 2015

Mish's Global Economic Trend Analysis



"Inconceivable" Negative Interest Rates on Mortgages in Portugal and Spain, with Italy On Deck

Posted: 14 Apr 2015 11:22 PM PDT

The vast majority of mortgages in Portugal, and a huge number in Italy and Spain are tied to Euribor, the rate it costs European banks to borrow from each other.

If Euribor drops low enough, banks will have to pay borrowers. It has already happened in Spain.

The Wall Street Journal reports Tumbling Interest Rates in Europe Leaves Some Banks Owing Money on Loans to Borrowers.
Tumbling interest rates in Europe have put some banks in an inconceivable position: owing money on loans to borrowers.

At least one Spanish bank, Bankinter SA, the country's seventh-largest lender by market value, has been paying some customers interest on mortgages by deducting that amount from the principal the borrower owes.

Interest rates have been falling sharply, in some cases into negative territory, since the European Central Bank last year introduced measures meant to spur the economy in the eurozone, including cutting its own deposit rate. The ECB in March also launched a bond-buying program, driving down yields on eurozone debt in hopes of fostering lending.

In countries such as Spain, Portugal and Italy, the base interest rate used for many loans, especially mortgages, is the euro interbank offered rate, or Euribor. The rate is based on how much it costs European banks to borrow from each other.

Portugal's central bank recently ruled that banks would have to pay interest on existing loans if Euribor plus any additional spread falls below zero. The central bank, however, said lenders are free to take "precautionary measures" in future contracts. More than 90% of the 2.3 million mortgages outstanding in Portugal have variable rates linked to Euribor.

In Spain, a spokesman for the central bank said it is studying the issue. Bankers in Italy said they are awaiting guidance from their local banking association, because loan contracts don't include any clause on what happens if benchmark rates go below zero.

In Spain, Bankinter has been forced to deduct some clients' mortgage principal payments because an interest-rate benchmark tied to Switzerland's currency has dipped into negative territory.

An executive at another Spanish bank said the lender in recent months has started to put in place an interest-rate floor on thousands of short-term business loans that are tied to short-term variations of Euribor. Two-month Euribor is at minus 0.004%. For new loans, the bank is increasing the cushion it charges customers above Euribor.

Hundreds of thousands of additional loans would be affected if medium-term Euribor rates enter negative territory, the executive said. The six-month rate is currently at 0.078%.

In Portugal, interest rates on most mortgages are linked to a monthly average of three- and six-month Euribor. Both have been steadily sinking and are hovering just above zero.
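To see how a bank can end up owing a borrower, here is a small Python sketch of the mechanics described in the article: the loan rate is the benchmark plus a spread, and when the sum is negative the amount is deducted from the principal. The loan size, the 0.5% spread, and the minus 0.8% benchmark are invented for illustration; only the 0.078% six-month Euribor figure comes from the article, and real contracts compute interest in more detailed ways.

```python
def monthly_interest(principal, benchmark_pct, spread_pct):
    """Simple monthly interest at an annual rate of benchmark plus spread."""
    annual_rate_pct = benchmark_pct + spread_pct
    return principal * annual_rate_pct / 100 / 12


def settle_month(principal, benchmark_pct, spread_pct):
    """Return (interest the borrower pays, principal after the month).

    When the rate is negative, the borrower pays no interest and the bank's
    payment is applied as a reduction of the principal owed.
    """
    interest = monthly_interest(principal, benchmark_pct, spread_pct)
    if interest >= 0:
        return interest, principal
    return 0.0, principal + interest


# Hypothetical loan: 120,000 euros at benchmark plus 0.5%.
loan = 120_000

# Six-month Euribor at 0.078% (from the article): the borrower still pays.
paid, remaining = settle_month(loan, 0.078, 0.5)
print(f"Borrower pays {paid:.2f}, principal stays at {remaining:.2f}")

# Invented benchmark of minus 0.8%: the rate is negative, so the bank pays.
paid_neg, remaining_neg = settle_month(loan, -0.8, 0.5)
print(f"Borrower pays {paid_neg:.2f}, principal falls to {remaining_neg:.2f}")
```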
Inconceivable Payback



Reflections on the Inconceivable



[Video: Princess Bride]

Mike "Mish" Shedlock
http://globaleconomicanalysis.blogspot.com

France Considers Forcing Google to Disclose Search Algorithm; Too Much Satisfaction!

Posted: 14 Apr 2015 01:18 PM PDT

Too Much Satisfaction!

Heaven forbid consumers actually like something too much.

If they do, they buy it or use it more than they buy or use competing products. And when that happens, well, it must be "unfair" competition.

Lord knows we cannot possibly tolerate too much consumer satisfaction.

So, with that line of thinking, we get: Europe to Accuse Google of Illegally Abusing its Dominance.
Google will on Wednesday be accused by Brussels of illegally abusing its dominance of search in Europe, a step that ultimately could force it to change its business model fundamentally and pay hefty fines.

Margrethe Vestager, the EU's competition commissioner, is to say that the US group will soon be served with a formal charge sheet alleging that it breached antitrust rules by diverting traffic from rivals to favour its own services, according to two people familiar with the case.

A decision on charges is to be taken by the college of 28 EU commissioners on Wednesday. Some commissioners are concerned that Ms Vestager has, according to one source, restructured and narrowed the case she inherited from her predecessor Joaquín Almunia. As well as search issues, the investigation has looked at allegations that Google illegally scrapes content from rivals, locks some publishers into using Google search ads, and makes it hard for advertisers to move campaigns to rival search engines.

Almost 20 complainants against Google want the search engine to abide by strict rules that ensure its formula treats its own services — providing results for travel, shopping and maps — no differently from rivals. Google and the commission declined to comment.

Google supporters feel the commission's volte-face on a settlement reflected politics rather than an independent assessment. No EU antitrust case has ever been extended to three settlement offers, or been revived after complainants were formally warned that their case was about to be rejected.

On top of the pressure from Brussels, this week Google is also under scrutiny in France where lawmakers are considering an initiative that would force it to hand over its secret formula for ranking websites.

The French senate is likely to adopt a bill this week which would allow the country's national telecoms regulator to monitor search engines' algorithms, with sweeping powers to ensure its results are fair and non-discriminatory.

If approved, the proposal would give Arcep, France's telecoms regulator, powers to scrutinise any search engine that had sufficient power to "structure the functioning of the digital economy". Google would be required to provide links to at least three rival search engines on its homepage, and disclose to users the "general principles of ranking".
Requiring Google to disclose its algorithms is tantamount to requiring Google to give away its trade secrets and patents for free.

Requiring Google to list other search engines is like requiring Ford dealerships to sell GM autos.

Search Engine Choices 

People can choose from any number of search engines. Here are the Top 15 Search Engines.

I show a selection below. 

  • Bing
  • DuckDuckGo
  • Dogpile
  • Ixquick
  • Yahoo
  • ASK
  • AOL
  • WOW

No Tracking

DuckDuckGo bills itself as the "Search Engine That Does Not Track You".

No tracking is an important issue to some people, not others. If the issue becomes important enough, Google will have to change its model or it will lose traffic to DuckDuckGo.

That is how change should happen, not by EU witch hunts.

By the way, I do not believe Google locks any publishers into using Google search ads. If someone wants to use non-Google ads, they are free to do so.

If the results are not as good, well, maybe the higher price of Google ads is worth it.

Why Do People Use Google Search?

Search users use Google for a simple reason: They like it. It does not matter why.

In the eyes of the EU, Google provides too much satisfaction. And the EU will not allow that!


Experts Confounded: Retail Sales Rise First Time in Four Months, But Weaker Than Expected

Posted: 14 Apr 2015 10:45 AM PDT

The Bloomberg retail sales consensus estimate was for a 1.1% gain. Sales did rise for the first time in four months, but not as much as expected.
Retail sales in March rebounded 0.9 percent after dropping 0.5 percent in February. The market consensus for March was for a 1.1 percent boost. Excluding autos, sales gained 0.4 percent, following no change in February. Expectations were for a 0.6 percent increase. Gasoline sales dipped 0.6 percent after a 2.3 percent increase in February. Excluding both autos and gasoline, sales rebounded 0.5 percent after declining 0.3 percent in February. Expectations were for a 0.4 percent increase.
Experts Confounded

Please consider U.S. Retail Sales Rise for First Time in Four Months.
U.S. retail sales rose for the first time in four months in March, but the gain wasn't enough to offset weaker spending during the winter months as consumers continued to largely pocket savings from cheaper gasoline prices.

"This outcome confounds all the standard consumer-spending models," J.P. Morgan chief U.S. economist Michael Feroli said in a note to clients. "Job gains, wealth gains, low gas prices and very high consumer sentiment would all point to solid consumer spending increases."

"The rebound we had been waiting for was rather soft and disappointing," Laura Rosner, an economist at BNP Paribas, said in a note to clients.

Paying less at the pump should free up money for U.S. consumers to spend elsewhere. But many are socking that money away, or using it to pay down debt.
Possible Explanations

  1. Consumers did not realize how much the weather has changed for the better.
  2. The standard models are bogus.
  3. Reports of strong jobs are more myth than reality.

I opt for a combination of two and three.


Spotlight on China: Margin Debt, Trading Accounts, Construction Equipment

Posted: 14 Apr 2015 01:25 AM PDT

In response to my April 1 post, China Margin Debt Soars to Record 1 Trillion Yuan; Another Central Bank Sponsored Bubble, I received an email from reader Nicolas.

He writes ...
Hello Mish

Happy Monday. I find your output excellent and I hope that you are flattered that you are followed by private banks in Switzerland.

Quick question on your last note; please can you tell me what (Bloomberg/Reuters) code you use for Chinese Margin debt? i.e. where can I cross-reference the Trillion Yuan figure you quote?

Best regards and many thanks,

Nicolas
I certainly was unaware I was followed by banks in Switzerland. Thanks!

The Bloomberg data is from SSE Margin, in Chinese. I asked my friend Chris Puplava at Financial Sense if he was aware of a Bloomberg tracking symbol. We do not believe there is such a symbol for margin.

However, Chris did locate this interesting chart of the Shanghai stock market vs. new accounts that is available on Bloomberg.

[Chart: Shanghai Stock Index vs. New Accounts]

I get lots of data from readers, and I appreciate it! In regards to China, reader Norman writes ...
Hello Mish,

Thanks for your "straight talk" on important issues impacting our financial lives. You recently sent information concerning China's GDP. A good indicator of growth is found in sales of construction equipment. Construction equipment manufacturer Komatsu lists its equipment orders by location. The numbers speak for themselves concerning growth and China. Keep up the good work!

Norman
[Chart: Komatsu Orders]

Komatsu is just a single manufacturer. It may not be representative of all such activity and orders. But given the collapse in commodity prices such as iron ore, I suspect it is. If so, this segment of the Chinese economy looks like a disaster.

Those expecting a rebound in Chinese housing or construction are likely mistaken. The new game in town is clearly stock market speculation.

Chinese Growth

My post Reality Check: How Fast is China Growing? Global Recession at Hand is also consistent with the China rapid slowdown thesis.


Damn Cool Pics



25 Things You Loved As A Kid But Can't Stand As An Adult

Posted: 14 Apr 2015 11:56 AM PDT

Growing up just sucks the fun out of everything.

15 Years Ago This Is What The MTV Movie Awards Looked Like

Posted: 14 Apr 2015 11:42 AM PDT

The red carpet was a very different place 15 years ago.