Cristina Ferrero Castaño
Debunking myths: truths and misconceptions about Big Data
Big data: its analysis can help us predict who will score the winning goal, when the next pandemic will strike or when the next snowfall of the century will occur... or can it?
September 28, 2021
We've all heard of big data, those huge datasets (on everything from weather to shopping trends, traffic flows to video viewing, among many, many other topics) floating in hyperspace that can only be handled by computer applications and ideally by qualified data professionals. Their analysis can help us predict who will score the winning goal, when the next pandemic will appear or when the next snowfall of the century will occur... or can it?
We often read that big data is going to change the world, and that is true, but sometimes wishful thinking is mistaken for reality. Despite the enormous versatility of data-driven solutions, some beliefs or myths about this discipline are widely repeated yet unfounded. Javier García Algarra, academic director of the Engineering and Science area at the U-tad University Centre, identifies (and debunks) the seven most widespread big data myths of recent times:
1. Lots of data = big data.
Not always. We may be collecting a huge volume of information, but it may not be useful for the problem we want to solve because it is not representative. For example, if we have access to millions of medical records in Spain, that data will not allow us to make correct predictions about a patient's chances of developing diabetes or melanoma in the Philippines or Guatemala, even if the system is very accurate for the Spanish population. Nor can we deduce the musical tastes of the over-50s by studying the playlists of high school students.
2. With big data we can predict any phenomenon.
For this to be true, it is essential that what we are studying is not completely random. We cannot predict which number will be the winning number in the Christmas lottery draw, even if we know the results of the previous 100 years, because it is related to pure chance. On the other hand, we can estimate that football team A will beat football team B seven out of ten times by analysing recent results. Sport is not completely random.
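The contrast between the two cases can be sketched in a few lines of Python. This is only an illustration: the list of match results is invented for the example, the function names are hypothetical, and the lottery is assumed to be a draw in which every one of 100,000 numbers is equally likely.

```python
from fractions import Fraction

def estimate_win_rate(results, team="A"):
    """Share of past matches won by `team`, used as a rough estimate
    of its chance of winning the next one."""
    wins = sum(1 for winner in results if winner == team)
    return Fraction(wins, len(results))

def lottery_win_chance(numbers_in_draw, past_draws):
    """Chance that one given number wins a fair draw.
    `past_draws` is deliberately ignored: in a purely random draw,
    history carries no information about the next result."""
    return Fraction(1, numbers_in_draw)

# Hypothetical winners of the last ten matches between teams A and B.
recent_results = ["A", "A", "B", "A", "A", "B", "A", "A", "B", "A"]
win_rate = estimate_win_rate(recent_results)
print(f"Estimated chance that team A wins: {float(win_rate):.0%}")

# Assumption: 100,000 equally likely numbers in the draw.
NUMBERS_IN_DRAW = 100_000
with_history = lottery_win_chance(NUMBERS_IN_DRAW, past_draws=[4536, 72897, 5490])
without_history = lottery_win_chance(NUMBERS_IN_DRAW, past_draws=[])
print(f"Same chance with or without history: {with_history == without_history}")
```

The football estimate changes whenever the recent results change, which is what makes the data useful. The lottery figure stays the same no matter how many past draws are supplied.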
3. Big data can solve 'all' problems.
According to many, big data was going to stop COVID-19 early in the pandemic, so why wasn't it able to stop it from spreading? In the first weeks we saw many predictive models estimating the evolution of the number of infected or dead people, and almost all of them failed miserably. This pandemic is unprecedented: fortunately, we have neither centuries of experience with it nor the data, so it was very risky to make predictions. By contrast, every year the health authorities predict with great accuracy how many flu infections there will be and in which week the peak will occur. The key is that influenza is recurrent and we have a representative historical data series to rely on.
4. Elections without surprises with big data.
Experience shows that no, we cannot accurately predict the outcome of the next election by studying what happens on social media. This is a legend based on the successful prediction of the outcome of the US presidential election in 2012, when Barack Obama was re-elected. However, those same predictions failed in the 2016 election, when Donald Trump was the winner against all odds. The reality is that electoral behaviour is extremely difficult to model, and social media are not an accurate representation of society, but only of its most boisterous part.
5. Your future career depends on big data.
It seems that, in the future, staff selection processes or dismissals will be decided by algorithms using big data. But while it is true that decision-making using data is a very powerful tool in the hands of companies, the last word will always be left to a human being. Blaming the algorithm is the 2021 version of that excuse from the 1990s, "the lines are too busy" or "the computer is too slow".
6. My data does not matter to anyone, there is no danger in sharing it with apps or platforms.
Any data we generate when we use the internet stays online forever and we don't know who will use it or how it will be commercialised, now or in 20 years' time. It is very important that we educate young people to take care of the information they disseminate.
7. Big Data and algorithms are magic.
This is the most pernicious myth of all. Behind this technology there is only mathematics and computation, nothing irrational, and it is developed by human beings who know what they are doing and do not use incantations or perform occult ceremonies. No superhuman power is needed to understand it, only study and dedication.