Thursday 26 April 2012

Scaling Data to Make Better Decisions

This week we have been attending a series of events held during Big Data Week. At the London Community Event, we heard a panel discussion on big data that included (amongst others) Hilary Mason, Chief Scientist of Bit.ly, Doug Cutting, co-founder of the Apache Hadoop project, and Nick Halstead, CTO and founder of Datasift. At the same time as Big Data Week, the gaming industry has been attending GiGse 2012 in San Francisco. During the CEO panel at GiGse, the importance of data was highlighted by Jim Ryan, co-CEO of bwin.party, who said that bwin.party now has around 70 people in its business information team analysing data and feeding back into its marketing operation. This is a big team of data analysts - bwin.party is taking data very seriously. However, whilst large organisations have the resources to deploy very large analytics teams to solve big data problems, we believe that as data volumes continue to increase exponentially, solely adding staff to process and mine increasingly large data volumes will not be a scalable solution for any organisation in any industry. This view was confirmed by the expert panel at the Big Data Community Event. Here is a summary of some of the key discussion themes.

How important is the ‘Big’ in Big Data?
One philosophical explanation of big data centred on being able to look at data with no pre-conceived ideas. As well as an open mind, big data is about having the ability to join multiple data sets and run analytics across them, rather than taking a silo approach. Whilst the panel disagreed on the relative importance of the word ‘big’ in big data, a recurrent message was that today it’s much easier and cheaper to store and analyse very large data sets. For example, large-scale data processing (i.e. MapReduce) on platforms such as Amazon Web Services (AWS) has become commoditised, and these are now considered established technologies and platforms. And with the proliferation of eCommerce, social media and APIs, there is a much greater volume and richness of data available to analyse today. If you are interested in reading an example of how the combination of cloud computing (e.g. AWS) and Hadoop (MapReduce) enables big data processing at scale and at a significantly reduced cost, then we suggest you read this article - Big Pain or Big Profits?

What are some of the Challenges?
Arguably the biggest challenge is how organisations find the important nuggets of information. Taking a 'boil the ocean' approach to big data is fraught with challenges, a theme we examined in our blog on Data Inflation last month. It was stated that one of the major benefits of big data is the ability to get answers back quickly to questions that in the past could take weeks or months. But this needs access to an increasingly important resource, the data scientist: a combination of maths, computing, and domain expertise, coupled with an open and inquisitive mind. Finding these people is a major headache for most organisations. There was also discussion about academic access to and use of data. It was argued that commercial organisations remain very reluctant to share information via open research projects, as there is a lack of trust as to how these data sets will be used and by whom (an example of an open research project within the gaming industry is The Transparency Project). A key infrastructure challenge is that the internet was not designed to process large data sets at low latencies. Ensuring compliance with data privacy requirements, unsurprisingly, remains a major focal point.

Should You Care?
If you want to make better decisions then the answer is yes! The end product of any data or big data project has to be focused on better decision making. In gaming this equates to supporting decision making across aspects of the business: game design, game performance, 1-2-1 marketing and consumer protection, and finance and risk management. So whilst we can argue about the importance of the word 'Big' in big data, we cannot argue about the increasing relevance of data to managing our businesses today. As the panel concluded, "we are just scratching the surface of what is possible".