Posted by adamf
Last week I was fortunate to have the opportunity to attend the Big Boulder conference put on by GNIP in Boulder, Colorado. Around 200 attendees joined together in the St. Julien Hotel to attend what was billed as the world's first social media data conference. GNIP, a company focused on aggregating and providing social data, assembled a great lineup of speakers, the majority of whom provide or consume social data. I enjoyed the conference and took the opportunity to learn more about social data that is now available to consumers, hear about the creative ways people are using such data, meet some great minds in the social media space, and enjoy the great food, beautiful scenery, sunny weather, and friendly vibes that are hallmarks of Boulder.
The Big Boulder presentations were mostly in a Q&A format with a few panels mixed in. I've pulled together some of the key themes that I observed from the sessions to share in this post. GNIP also posted some detailed summaries of the presentations on their blog. If anyone who attended or presented at the conference sees anything that needs to be added, please let me know!
Major social networks continue to invest in sharing their rich data with third parties, though in a measured way
Folks from Twitter, LinkedIn, and Facebook all presented at the conference in some form. All noted that they are continuing to develop their APIs to provide better access to their huge data sets, though there weren't any big announcements of specific new capabilities.
Twitter confirmed its intention to continue sharing data from their firehose through a limited set of partners in all but a small number of cases where customers have unique needs. Their rationale for limiting access is that the vast quantity of data that comes through the firehose is too expensive for most companies to consume, store, and parse. It makes more sense for them to work through third parties who can help provide only the most relevant data to businesses looking for that data. They were asked whether they would be providing a location-based API, but they have no short-term plans to do so. They say that currently only 4% of Tweets are geocoded.
Facebook provides API access to a lot of valuable data, and states that they want to provide realtime insights to businesses. However, they do not yet offer a firehose like Twitter. When asked about making more data from their social graph available, they noted that the biggest challenges they face are with managing privacy and in balancing syndication with standardization. If the language and methods for measurement are all different between sources, it gets hard for everyone to understand what the data means. Facebook said that they are looking for ideas from the community for new data to offer and some great use cases to show how it would be valuable.
LinkedIn didn't talk much about their API specifically, but presented interesting insights and use cases about how they are using that data to drive their business forward. I'll share a few of these insights later on.
Beyond traditional social media, we are now seeing other sources of social data become available
A good percentage of the presentations at Big Boulder were from companies that are providing data that is social, but perhaps not from the traditional companies that come to mind when you think of social media data. Some were blog platforms, like WordPress.com and Tumblr. Others were blog commenting platforms like Disqus, or forum-based like Vanilla Forums and Get Satisfaction. Formspring provides a forum for general discussions around specific questions, and GetGlue is a rich community and check-in service centered on TV, movies, and books. StockTwits was another, which curates stock data from Twitter and provides a layer of social information to inform investors beyond just the traditional data.
I was seriously impressed by the volume and variety of interesting data that is being collected, curated, and shared via APIs.
One of the biggest challenges with social data tools and platforms is in providing actionable insights
A fascinating panel discussion brought together Zach Hofer-Shall (Forrester Research), Susan Etlinger (Altimeter Group), Nathan Gilliatt (Social Target), and Shawn Rogers (Enterprise Management Associates) to discuss emerging trends in analyzing social data. They discussed the challenges of integrating social data into enterprise organizations.
The four panelists were in agreement on the majority of topics discussed. The first is that social media data is most frequently brought into organizations by PR folks, but hardly used to its full extent, as PR is usually not focused on detailed quantitative analysis. It's used more at surface level to catch and engage with positive and negative press and to do "damage control".
Another barrier discussed is that social data has not yet been integrated with BI teams at enterprises where the focus is mostly web analytics. At the enterprise level, there are challenges not just with siloed organizations, but siloed data. There are lots of different roles that benefit from social data insights and each has a different context. Integrating that data and offering it in ways that provide the most useful insights to those who need it is a challenge that plenty of companies have yet to surmount.
The panelists also agreed that one of the biggest challenges with getting value out of social media software is not necessarily with analytics to understand how the data relates to a business, but rather in existing software's abilities to provide actionable insights. Though there are some pieces of this out there, the panel saw a lot of opportunity for social analytics software to really step up in this space.
You can learn all sorts of interesting stuff from the enormous (and growing) public data set out there
There were a few insights that were shared throughout presentations, including:
From Martin Remy at Automattic (which runs WordPress.com)
- Fashion bloggers are the most "chatty." Baseball, religion, and politics are also high on the chatty list. At the bottom of the list, interestingly, are advertising and social media. Tech falls in the middle.
- Turkish speakers are most conversational, followed closely by Hebrew. Japanese and Korean speakers are some of the least chatty.
From Yael Garten, Data Scientist at LinkedIn
- Yael shared a chart of growing vs. shrinking industries. Unsurprisingly, the newspaper industry is shrinking most. The renewables industry was the fastest growing (by a bigger margin than I would have guessed).
- She shared a test of a hypothesis that people would be leaving the banking industry after the financial crisis. This turned out not to be the case. Lots of people left the banks at that time, but mostly ended up moving to different banks rather than finding new fields.
- We also learned that male CEO names tended to be short (or shortened), approachable, one-syllable names, while female CEOs tended to have more classic, multi-syllable names. Short names were even more popular in sales, with names like Chip or Trey.
Spam in social data is a big focus for social media data sources
Spam in social data has become even more of an issue with people paying for data feeds. For example, if you are paying by the tweet for your Twitter data, you don't want to pay for the spammy tweets.
Given that notion, Twitter is working hard on combatting spam and noted that their spam team has grown to be one of the biggest teams at the company. Twitter combats spam through initial filtering, which would keep it out of the firehose feed, though they can't catch everything that way without risking pulling legitimate content, too. Therefore, some spam gets through the system initially and is pulled after the fact.
Ken Little, Director of Engineering for Tumblr, also talked about spam being a big priority. They try to shut down spammy-looking registrations right out of the gate. Tumblr uses a simple third-party content analysis to identify some spam and has been developing an in-house system that identifies spammy accounts based on behavioral analysis. Some people just don't use their accounts in ways consistent with your average human.
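To make the idea of behavioral analysis a bit more concrete, here is a minimal sketch of one possible signal: how fast and how regularly an account posts. This is purely my own illustration and not Tumblr's system, which wasn't described in any detail. The function name `looks_automated` and all of the thresholds are hypothetical.

```python
from statistics import mean, pstdev


def looks_automated(post_times, min_posts=5, max_rate_per_min=2.0, min_jitter=1.0):
    """Return True if a list of post timestamps (in seconds) looks bot-like.

    Hypothetical heuristic with made-up thresholds: an account is flagged
    when it posts faster than max_rate_per_min on average, or when the gaps
    between posts are almost perfectly regular (standard deviation of the
    gaps below min_jitter seconds).
    """
    if len(post_times) < min_posts:
        return False  # not enough activity to judge

    times = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    avg_gap = mean(gaps)
    rate_per_min = 60.0 / avg_gap if avg_gap > 0 else float("inf")

    too_fast = rate_per_min > max_rate_per_min
    too_regular = pstdev(gaps) < min_jitter
    return too_fast or too_regular


# Illustrative timestamps only, not real data.
human = [0, 340, 1900, 2500, 9000, 9100]
bot = [0, 10, 20, 30, 40, 50]
print(looks_automated(human), looks_automated(bot))
```

A real system would presumably combine many more signals than posting cadence, and would need tuning to avoid flagging legitimate scheduled posts.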
Verification of the sources of social content is a big problem yet to be solved, and critical for social media data use in public service
For most businesses, verification of the content and source of social data is important, but not critical. In one panel discussion, however, we learned about some fascinating uses of social data for the public interest that are providing insights, but struggle with the verification of social data sources.
Moeed Ahmad, Head of New Media at Al Jazeera Media Network, talked a fair bit about the challenges and need for data verification, especially for his news organization. Al Jazeera's charter is to provide voices to the voiceless. This proves a challenge in the regions they traditionally report on most, where media is generally state-run. Social media has proven to be a great way to hear directly from the people and surface amazing stories and viewpoints that might never have been heard otherwise. The challenge with this, as with any reporting, is to ensure that the information shared is correct and can be verified. To try to manage this, Al Jazeera Media added procedures to the usage of social data in reporting. This change was largely an issue of verifying as much as possible and setting context in the report itself. Conversely, Moeed also talked about using social channels for verification of information. He spoke of a particularly gruesome video that had been sent to Al Jazeera showing what was reported to be a recent atrocity. Unable to verify the video and story, he posted it on Twitter and quickly found out that the video was 3 years old, and was actually from a totally different country than suggested by the person who had submitted it.
Katie Baucom, Geospatial Analyst at the Geospatial Intelligence Agency, has been working to use social data to aid in disaster response and in assessing the damage done by natural disasters. Her organization is traditionally focused only on satellite imagery, but can receive social data and photos far faster than the satellite images can be processed. Their process seems like an incredibly powerful use of social data, but verification of authenticity and location is critical in their context. They need to ensure that they are providing accurate data to ensure that disaster response is applied first where it is needed most.
Rumi Chunara, instructor at Harvard Medical School, works on a project called HealthMap, which seeks to discover and track the spread of infectious diseases in real-time using as many data sources as possible. Her challenge in this is determining fact from rumor in Tweets. To help solve this issue, they've been comparing information from social sources against trusted data from doctors on the ground. Her team is hoping to compare where the differences lie and model how they might be more accurately predictive using the social data. She also noted that Google Trends has been a great tool in finding outbreaks. For instance, when the flu hits an area, the searches for flu symptoms in certain geos increase noticeably.
Light engagements may be the key to building further engagement amongst lurkers
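One simple way to think about comparing a social signal against trusted data is a plain correlation between the two series. The sketch below is my own illustration, not the project's actual method. The function name `pearson` and the weekly numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented weekly counts, for illustration only.
flu_mentions_in_tweets = [120, 150, 310, 620, 580, 300]
cases_reported_by_doctors = [8, 11, 25, 49, 51, 27]

r = pearson(flu_mentions_in_tweets, cases_reported_by_doctors)
print(f"correlation between social and clinical signals: {r:.2f}")
```

A value near 1 would suggest the social signal moves with the trusted data. Weeks where the two diverge are the interesting ones, since that is where rumor and fact are likely to part ways.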
We heard this topic discussed from a good number of people in the blog, forum, and commenting space. One panel focused specifically on how people engage online. A rule of thumb that was generally substantiated was the 90-9-1 rule. 90% of blog or forum readers are passive lurkers, 9% engage lightly, and 1% of the readers create most of the content. Everyone in the panel seemed to agree that making light social engagements, such as likes, or thumbs up, or even smiley faces easy to engage with is the path toward starting to draw in more of that 90%. One of the reasons this is so important, beyond further engaging the readership, is that it further engages the bloggers, who in turn write more content.
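If you want to check how your own community stacks up against the 90-9-1 rule, a rough sketch like the one below could be a starting point. The tier definitions here are my own simplifying assumptions (anyone with at least one post counts as a creator, anyone with only likes or similar light actions counts as light), and the names `engagement_tier` and `tier_shares` are hypothetical.

```python
from collections import Counter


def engagement_tier(posts, light_actions):
    """Bucket one reader by activity. The cutoffs are assumptions."""
    if posts > 0:
        return "creator"
    if light_actions > 0:
        return "light"
    return "lurker"


def tier_shares(users):
    """Return the share of readers in each tier.

    `users` is a list of (posts, light_actions) tuples, one per reader.
    """
    if not users:
        raise ValueError("need at least one reader")
    counts = Counter(engagement_tier(posts, light) for posts, light in users)
    total = len(users)
    return {tier: counts.get(tier, 0) / total
            for tier in ("lurker", "light", "creator")}


# Synthetic audience built to match the rule of thumb exactly.
audience = [(0, 0)] * 90 + [(0, 3)] * 9 + [(5, 1)]
shares = tier_shares(audience)
print(shares)
```

Tracking how the lurker share changes after adding a like button or similar light engagement would be one way to see whether it is actually drawing in more of that 90%.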
If you are looking to build an entrepreneurial community, there are four principles to keep in mind
Brad Feld of Foundry Group led off day 2 of the conference with a quick talk about why Boulder has a thriving startup scene and shared his thoughts about principles to build an entrepreneurial community. His four points for a successful community are:
- The community must be led by entrepreneurs.
- The community must take a long-term view. There will be good and bad times for business and entrepreneurship, so you need to take a look at the last 20 years to see the bigger future trends.
- Your community must be inclusive of anyone that wants to engage in the startup community in any way.
- There must be institutions that engage the entire stack of the startup community. TechStars and Startup Weekend are two great examples of this as they help bring mentorship, investors, and teachers together with entrepreneurs.
Boulder, Colorado is a pretty cool town
Beyond putting on a great conference and pulling together some fascinating people, GNIP sought to show off their hometown of Boulder, Colorado. They definitely succeeded, starting with the conference hotel. We stayed at the St. Julien Hotel and Spa right off of Pearl Street where we enjoyed the great food and drink in the walkable neighborhood, and were amazed at the throngs of students, profs, techies, families, and aging hippies all hanging out late on a school night.
To show how outdoorsy Boulder is, GNIP lined up a bunch of 'healthy' events for us to enjoy while there. Unfortunately I missed the morning yoga sessions at 6AM and a hike planned out at 6:30AM as sleep won out. However, Jamie and I made it out for the biking pub crawl after the close of the conference. GNIP sprung for a bunch of bikes from a local rental service that allows you to pick up bikes from standing bike racks and return them to any of their racks around the city when you are done. We all converged on a row of bikes just outside the hotel and cruised around the area, looking cool with our bright red bikes with big baskets on front. After a few pints and biking in 90+ degree heat, we were all feeling fine. A fun and unique way to close out a fun and unique conference.