Everyday Implications of the Big Data Flood, Part 1
As you are probably aware, the total amount of data in the world is exploding.
At Lingaro we help multinational companies capitalize on this phenomenon, and we are watching it play out from front-row seats.
But what implications does this “Big Data flood” have at the individual level, for the average person who probably isn’t in the “data industry”?
I don’t think anyone can say for sure. There are some approximate historical parallels, but for all practical purposes this is the first time we’ve been here as a society.
What I can say is that everyone should be paying close attention to three key issues in the “Big Data floodplain” – artificial intelligence, automation, and privacy – that are already making everyday life look significantly different from how it looked a few years ago.
I’ll describe how and why that is for each issue in more detail in subsequent posts. Here, I’ll start with some background.
The comparison between data and oil is strong – on several levels.
The phrase “data is the new oil” is most often attributed to Clive Humby, who is believed to have used it for the first time back in 2006. The catchy phrase became quite popular, partly because the metaphor behind it leaves a lot of room for interpretation.
Consider, for example, how an entire industry has emerged around the collection, storage, and processing of data – just as a similar business ecosystem exists around oil. The “data industry” includes the world’s five biggest companies by market capitalization: Apple, Alphabet (Google’s parent company), Microsoft, Facebook, and Amazon.
It might be a stretch to say that “Apple is worth more than Poland,” but the reality is that no country would realistically be able to buy any of these five companies even if it were technically feasible to do so.
That fact alone should indicate how significantly data is shaping the way the world works, in terms of not only the global economy but also the daily lives of millions of people.
The rise of the data industry and its effects therefore provoke a lot of questions, such as how to define a monopoly, how to measure the power and influence of global companies, how to ensure equilibrium between them and national authorities, and how to protect consumers against abuse. Data-related products and services are often technically complex to the point that nobody can predict what risks and problems they may bring.
Data is “platform fuel” after being refined by data science.
The analogy to oil doesn’t end with the fact that data is a resource that shapes the world and creates many potential problems.
Similar to oil, data has to be processed and refined in order to be valuable. The majority of the data that is collected and used today is poorly structured and full of garbage and errors.
Long story short, data by itself is basically useless in raw form. To “distill” the information from it we need to apply some “chemistry” and “materials engineering.” Enter the mysterious term data science.
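To make the “refining” idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the comma-separated “name, amount” format, the sample rows, and the function name refine are all hypothetical, and real data cleaning is usually far messier. The point is only that garbage and errors get filtered out before any analysis starts.

```python
def refine(raw_rows):
    """Keep rows that parse as (name, non-negative amount); drop the rest."""
    clean = []
    for row in raw_rows:
        parts = [p.strip() for p in row.split(",")]
        # A usable row has exactly two fields and a non-empty name.
        if len(parts) != 2 or not parts[0]:
            continue
        try:
            amount = float(parts[1])
        except ValueError:
            # The amount is not a number (e.g. "n/a"): treat it as garbage.
            continue
        if amount < 0:
            # Negative amounts are assumed to be errors in this toy example.
            continue
        clean.append((parts[0].lower(), amount))
    return clean


# Hypothetical raw input: inconsistent spacing, missing values, junk lines.
raw_rows = [
    "Alice, 12.50",
    "bob,7",
    ",3.00",
    "carol, n/a",
    "dave, -4",
    "garbage line",
]

refined = refine(raw_rows)
print(refined)
```

Only two of the six raw rows survive in this example, which is a fair picture of why raw data on its own is of little use.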
Just as fuel powers an engine, data powers a big data platform, a cluster of multiple computers working together.
The terms data science and big data platform are somewhat blurry, but it doesn’t make much sense to argue about definitions. The thing to know is that an enormous amount of low-hassle processing power is available even to companies of moderate size, mostly thanks to large public clouds like Amazon Web Services, Microsoft Azure, and Google Cloud Platform. The amount of data available for analysis is unprecedented as well.
This combination of data and processing power delivers almost endless possibilities, some of the most impactful of which involve artificial intelligence.
Artificial intelligence is a rough form of intelligence inspired by the human version.
Today’s systems falling under the umbrella term artificial intelligence (AI) are often loosely based on how the human brain functions and can solve some surprisingly complex problems. Even so, they are pretty far from actually being intelligent in the human sense.
They learn in a narrow, particular way that essentially involves an algorithm whose behavior is not strictly predetermined by the programmer. The programmer specifies a function called a model, which has some adjustable moving parts, i.e. parameters. During training on historical data, those parameters are adjusted step by step so that the model’s output gets closer to the expected result, which is measured by minimizing a predefined error. One system solves one problem, adjusting to the statistical relationships that can be found in the training data.
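For readers who like to see things in code, here is a minimal sketch of that idea in plain Python. It assumes the simplest possible model, a straight line with two parameters (w and b), mean squared error as the predefined error, and gradient descent as the adjustment procedure. The function names, the learning rate, the step count, and the tiny data set are all illustrative choices, not a description of any particular production system.

```python
def predict(w, b, x):
    """The model: a straight line with two adjustable parameters."""
    return w * x + b


def mean_squared_error(w, b, xs, ys):
    """The predefined error: average squared gap between output and target."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Adjust w and b by gradient descent so that the error shrinks."""
    w, b = 0.0, 0.0  # start with no knowledge at all
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical "historical data" that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
print(w, b, mean_squared_error(w, b, xs, ys))
```

Nobody told the program that the answer is “multiply by two and add one.” It only adjusted its two parameters until the error stopped shrinking, and it knows nothing beyond the relationship hidden in these five data points.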
Such “primitive” methods – which get pretty complex, especially when assembling multiple models for one task – are enough for machines to outperform humans and step into their shoes to complete a growing range of tasks, even in fields we previously thought were reserved for people, such as art, music, and object recognition from photographs. These advances are both fascinating and terrifying, as they suggest a future with mass unemployment, especially given how quickly AI is developing.
To be fair, we are still far from creating what is known as Strong AI or Artificial General Intelligence (AGI), a system that has the intellectual and cognitive abilities of a human in all respects. Such a system would be able to correct and enhance itself to create even better AI, which would repeat the process to start a technical development snowball impossible to control or predict.
This phenomenon is known as the technological singularity and is popular in science fiction. Achieving singularity – or something that would lead to it – has been predicted a couple of times already. In 1965, Herbert A. Simon, an AI pioneer, supposedly wrote that “Machines will be capable, within twenty years, of doing any work a man can do.” Today’s specialists in the field do not agree on whether AGI is actually possible, when it could happen, or even which criteria should define Strong AI.
It is common to encounter anxiety or even alarm about AI getting out of control – even without being self-aware or completely self-reliant – as a result of attacks by hackers, design flaws, or just simple human error.
I’ll cover artificial intelligence in more depth in the next post. Stay tuned!
Dominik is a versatile IT consultant and developer working in Lingaro's Big Data and IoT section. He has experience with various data-processing systems, including RDBMSs, Hadoop platforms and the ELK stack. An avid news reader, he enjoys identifying technology trends and sharing his findings.