Big Data's Growing Pains
What is ‘Big Data’?
The words ‘big data’ seem to be everywhere these days. Detailed and valuable personal information is generated from just about everything in our daily lives, and is used by social media sites and marketers to target us with the products, information and services that our behaviour suggests we’ll ‘like’. While many people think of big data as purely digital inputs, like online and social network behaviour, in fact most companies also include traditional transactions and records, such as point-of-sale interactions, in their analytics and persona development. The use of big data is credited as one of the main drivers of intelligent business projects, and of business intelligence, today.
‘Predictive analytics’ is another term that gets thrown around a lot, and it is closely tied to big data. In the business world, companies collect massive amounts of real-time data from consumers and then use this data, combined with customer feedback, to model and anticipate future events and actions. This allows businesses to keep the consumer as the main focus when developing products.
Impact of Big Data on Predictions
Big data, when analysed properly, can be instrumental in making predictions in a wide variety of fields. Companies and individuals alike can benefit from using it to predict trends, better understand consumer behaviour, optimize processes and improve performance. Every time you use your Fitbit, your 22seven app, or even Google Maps, you’re using the power of big data to optimize your training, spending and trips.
Perhaps the most prominent public use of big data analytics today, however, is in predicting the outcomes of elections and referendums, most notably this year’s Brexit referendum and US presidential election. Technology and big data analytics have become an essential part of the election process. Parties use data-driven observations and social media trends to stay ahead of, and respond to, the opinions of the masses.
In 2012 Obama’s campaign team embraced this ‘new’ trend completely, using social media, television, phone calls and the internet to home in on targeted voters and feed them unique and compelling content. The 2012 campaign’s statistical analysis team was nearly five times larger than the group of statisticians and data analysts on the 2008 campaign, indicating the massive increase in the role that big data plays in politics. Obama may have been an early adopter, but since then the use of big data in campaigns has only grown in popularity and capacity.
So What Went Wrong in 2016?
If data analytics has come so far, why is it that predictions this year were so horribly wrong? Almost all of the major forecasters predicted a Clinton victory, with their estimates of her chance of winning ranging from 70% to a staggering 99%!
These forecasters relied heavily on opinion polls, drawing data from social media posts and public surveys through textual analytics and the like. So how could they have been so wrong about what the American people would actually do on voting day? Because so much of the vigorous debate between the two divided parties, and their supporters, took place online in blogs, tweets and status updates, there should have been a vast pool of information from which to draw inferences regarding people’s expected voting behaviour. Why was this not the case?
Some speculate that many Trump voters were reluctant to publicly admit to supporting such a controversial candidate, and so their opinions were simply not out there to be captured in polling data.
Shame aside, many people may simply not have wanted to participate in telephone or in-person opinion polls and surveys. In fact, the response rates of some of the most ‘reputable’ polls indicate this to be the case. Would you open up about your political leanings to the ‘telemarketer’ calling you for the third time this month? Now consider the controversy surrounding each candidate…
Furthermore, studies, and the election outcomes themselves, suggest that many voters have felt sidelined and disenfranchised by the establishment, by the previous Democratic and Tory governments, and by voting processes in their entirety. They also feel disconnected from the media frenzy surrounding these votes. When you’re relying on voluntary participation in a society where many people don’t support the system, or at least doubt its integrity, it’s not all that surprising that you’re unable to accurately capture their sentiments.
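To see how much reluctance to respond can distort a poll, here is a minimal sketch in Python. The function name `observed_support` and all of the numbers are hypothetical, chosen purely for illustration; they are not drawn from any real 2016 poll.

```python
def observed_support(true_support, response_rate_supporters, response_rate_others):
    """Share of poll respondents who back a candidate when supporters and
    non-supporters answer the poll at different rates (inputs are fractions)."""
    responding_supporters = true_support * response_rate_supporters
    responding_others = (1 - true_support) * response_rate_others
    return responding_supporters / (responding_supporters + responding_others)


# Hypothetical numbers, for illustration only.
true_support = 0.50   # half of the electorate actually backs candidate A
rate_a = 0.06         # 6% of A's supporters agree to answer the pollster
rate_b = 0.09         # 9% of everyone else agrees to answer

polled = observed_support(true_support, rate_a, rate_b)
print(f"true support: {true_support:.0%}, polled support: {polled:.0%}")
```

With these made-up rates, a race that is actually tied shows up in the poll as a gap of roughly ten points, even though nobody lied to the pollster. A modest difference in willingness to respond is enough.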
Another factor to consider is that the data collected online (through blog articles, social media, textual analytics and the like) is notably skewed towards the sentiments of the populations that are most represented, and most actively engaged, on social media. This likely means that certain voting populations, such as older and lower-income Britons and Americans, are not accurately represented simply because they aren’t on social media, or don’t regularly share their opinions on these platforms.
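One common way analysts try to correct for this kind of skew is to reweight each group by its share of the actual electorate, often called post-stratification. The Python sketch below is a simplified illustration, not a description of any particular forecaster’s method; the group names, sample sizes, support rates and population shares are all hypothetical.

```python
def raw_estimate(sample):
    """Unweighted support: total supporters divided by total respondents."""
    supporters = sum(g["respondents"] * g["support"] for g in sample.values())
    respondents = sum(g["respondents"] for g in sample.values())
    return supporters / respondents


def weighted_estimate(sample, population_share):
    """Post-stratified support: each group's support rate is weighted by its
    share of the electorate rather than its share of the sample."""
    return sum(population_share[name] * g["support"] for name, g in sample.items())


# Hypothetical online sample in which younger users are over-represented.
sample = {
    "under_45": {"respondents": 800, "support": 0.60},
    "45_and_over": {"respondents": 200, "support": 0.40},
}
# Hypothetical make-up of the actual electorate (shares sum to 1).
population_share = {"under_45": 0.45, "45_and_over": 0.55}

print(f"raw estimate: {raw_estimate(sample):.1%}")
print(f"weighted estimate: {weighted_estimate(sample, population_share):.1%}")
```

In this made-up example the raw sample suggests a comfortable lead of about 56%, while the reweighted figure falls just under 50%. Weighting only helps, of course, if the few older respondents who are online think like the many who are not, which is exactly the assumption that can fail.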
There is also an element of uncertainty at play. In July, right after Clinton’s very public email scandal, polls predicted that if voting had taken place that day, it would have gone Trump’s way. Two weeks later, after yet another provocative statement by Trump, polls indicated a Clinton victory. This fluctuation continued with each new controversy, making accurate predictions difficult to establish and publish. Polling became almost as volatile as the markets: a gamble.
Another possibility is that many of those who would have voted for Clinton, or who at least preferred her to Trump, were tired of the political drama of the entire election and weren’t that enamored with either candidate. As a result they weren’t rallying their friends, and some opted not to vote at all. While they may have supported her in opinion polls, they didn’t show up on voting day. The 2016 US presidential election was reported as having the lowest voter turnout of any US presidential election since 1996. This low turnout may also have significantly skewed the results.
An important point to consider is how data predictions are represented. Nate Silver, of FiveThirtyEight, famously illustrated how easily data can be twisted, manipulated and misinterpreted in a tweet he posted in mid-November.
FiveThirtyEight is a website that focuses on opinion poll analysis, using the data it collects on social media and other public platforms to generate predictions for the outcomes of certain votes and referendums. It is one of the now numerous sites looking over the 2016 opinion polls and trying to figure out how things went so wrong in the world of predictive analytics.
This post by Silver demonstrates how easily statistics can be manipulated and misinterpreted and how, no matter what you say, there will always be people who question your predictions, their value and their method. With this in mind, it is important to evaluate the future of analytics in political predictions, and how much weight we give to the opinion-poll-based predictions that set our expectations in the build-up to major political events. It will be interesting to see this field develop and improve as we move towards the 2020 elections, and the other votes and referendums that lie between now and then.
These upsets to predictive models have sizeable implications for the field of big data at large, which has often been taken, until now, as the ‘holy grail’ and the undeniable ‘truth’. Because of the seismic role that big data plays in business, and therefore in our day-to-day lives, it is vital that all models, methods and analyses are revisited as we continue the iterative development of what is still a relatively new field. Like it or not, and flawed though it can be, big data and the role that it plays in our everyday experiences are only set to grow.
This is one of the fundamental reasons why we, at Gravity, always couch our analysis of the quantitative in the qualitative. While data is invaluable in terms of signaling larger trends, in creating (clearly occasionally flawed) predictive models, and in reflecting actual rather than reported behavior (far fewer people would admit to shoplifting, for example, than the numbers suggest), it needs further insight, substantiation and evidence. This is why analyzing academic research and case studies, and conducting user insight sessions, for example, are such a fundamental and interconnected part of all of the work that we do.
Considering its rapid evolution, it’s hard to anticipate what the world of big data will look like in 10, 15, or even 5 years’ time. The methods, sources and uses of this wealth of information are evolving daily, so keep an eye out for how companies use your own data to catch your attention.