Big Data and Machine Learning are two exciting applications of technology that are often mentioned in the same breath. In reality, there are important distinctions between them that need to be understood when we are making decisions about our business data strategy.
Both terms refer to fields of academic study as well as practical business applications, and both are rooted in data science. Data science is the branch of science concerned with information and how we can use it to achieve goals.
Today, data is often described as the fuel (or oil) of the information age. It’s what powers our ability to build tools and platforms that can change the world through analytics and increasingly accurate modeling and forecasting.
For an easy example, look at the speed at which pharmaceutical companies were able to create entirely new vaccines against coronavirus. At the start of the pandemic, we regularly heard that developing a new vaccine usually takes up to 10 years. The rapidly accelerated pace at which it was done during 2020 was due in large part to the way our ability to collect and process data has advanced in the last decade. If that particular pandemic had broken out in 2010, when techniques such as deep learning (an advanced application of machine learning) were just nascent ideas locked away in research labs, it would have taken far longer to crack the problem!
Was it Big Data that made it possible, or Machine Learning? In truth, it was a bit of both. Although they are distinct ideas, neither would be as effective as it is without the other.
Let’s start by defining what each term refers to, then move on to looking at how you can make a decision about which one will work best for you.
What are Big Data and Machine Learning?
Big Data is something of a catch-all term that refers to the vast increase in information that's being created and pumped into the world, as well as the tools, techniques, and methodologies that have been developed to make use of it (which includes machine learning). Big Data was first identified as a powerful force for change around the time the internet started to become a tool for everyday life, rather than a niche project largely confined to government, academia, and the military.
A key concept to understand in order to “get” what is meant when we talk about Big Data is that it’s about far more than the size of the data. An early attempt at defining it suggested that there were three “V”s that have to be present for a data project to be considered Big Data – volume (size), variety (the data will be of different types), and velocity (the dataset is quickly growing or changing).
Other important concepts to understand include the difference between structured data (information such as numbers that fits nicely into database tables and structures) and unstructured data (information like pictures, video, and speech, that doesn’t).
Machine Learning, on the other hand, refers to a type of computer algorithm that can be thought of as a subset of another loosely defined term – artificial intelligence (AI). The ability to learn is something that we consider to be a fundamental aspect of "intelligence." There are other aspects to intelligence, of course, such as creative intelligence and emotional intelligence, but machine learning is specifically concerned with creating programs that can get better at performing a task as they are fed increasing amounts of information.
Here, an important concept to understand is the difference between supervised and unsupervised learning. Supervised learning involves training algorithms with labeled data, so they can immediately “know” whether they carried out a particular operation correctly. Unsupervised learning involves data that is not labeled, so the algorithm is never told whether its output is correct or incorrect. Instead, all decisions are based on what the algorithm can determine from the data itself, and from the relationship of each item to the other pieces of data it has been fed.
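To make the distinction concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production method: the numbers, the labels "small" and "large", and the function names are all invented for this example. The supervised half learns one average per label from labeled examples, while the unsupervised half is given the same numbers without labels and has to find two groups on its own.

```python
from statistics import mean

# Supervised: every training example comes with a label (made-up data).
labeled = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
           (5.0, "large"), (5.3, "large"), (4.7, "large")]

def train_centroids(examples):
    """Learn one average value (centroid) per label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Return the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Unsupervised: the same numbers with the labels thrown away.
unlabeled = [value for value, _ in labeled]

def two_means(values, rounds=10):
    """Split values into two groups with a basic 2-means loop.

    Assumes the list contains at least two distinct values.
    """
    low, high = min(values), max(values)  # initial guesses for the group centres
    for _ in range(rounds):
        group_a = [v for v in values if abs(v - low) <= abs(v - high)]
        group_b = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_a), mean(group_b)
    return sorted(group_a), sorted(group_b)

centroids = train_centroids(labeled)
prediction = predict(centroids, 4.5)   # uses what was learned from the labels
groups = two_means(unlabeled)          # finds structure without any labels
print(prediction)
print(groups)
```

Notice that the unsupervised half can tell you there are two groups, but it cannot tell you what to call them. Naming the groups is exactly the information that labels provide.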
So, which one is right for me?
The truth is probably that you will get the best results by understanding and choosing the most relevant processes and practices from both disciplines. It’s perfectly possible to use Big Data techniques and tools to extract insights and meaning from information and then use it to drive business growth and improve decision-making without using anything that would correctly be classified as Machine Learning or AI. On the other hand, if you’re using machine learning methods, it’s likely that your work will tick many of the boxes that qualify it as Big Data: you will probably be working with datasets that have volume, velocity, and variety. This is because Machine Learning algorithms need to be trained on data, and in order to become effective, they need access to a lot of it!
Another way to think of it is that if you’re not working with Big Data, it’s unlikely that you’ll need to use Machine Learning. The main benefit of Machine Learning is that it helps to extract value from datasets that are too complicated for “traditional” computer and statistical analysis. If your dataset is static, structured, and of a manageable size (such as fitting comfortably into an Excel sheet), then Machine Learning – which often requires a large amount of compute power – might be overkill.
Machine Learning is most appropriate when your data is unstructured: unlabeled text, image, or sound data that you’re never going to make sense of using tools like spreadsheets or relational database systems. This is because Machine Learning algorithms can be used to label unstructured data by applying what they “know” from other, similar data objects that they have been trained on. Essentially, this transforms unstructured data into structured data, allowing it to be operated on by standard computational methods.
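As a rough illustration of turning unstructured text into structured records, the Python sketch below labels each new message by copying the label of the most similar message it has already seen (a one-nearest-neighbour approach using word overlap). The training sentences, the categories "billing" and "technical", and the function names are all hypothetical, and real systems use far richer models than this.

```python
def words(text):
    """Turn free text into a set of lowercase words."""
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical training examples: text the model has already seen, with labels.
training = [
    ("my invoice shows the wrong amount", "billing"),
    ("please refund the payment on my card", "billing"),
    ("the app crashes when i log in", "technical"),
    ("cannot reset my password in the app", "technical"),
]

def label_text(text):
    """Copy the label of the most similar training example."""
    target = words(text)
    _, best_label = max(training, key=lambda ex: similarity(target, words(ex[0])))
    return best_label

# Unstructured input: free text with no labels attached.
incoming = [
    "i was charged the wrong amount on my invoice",
    "the app crashes after i reset my password",
]

# Structured output: one record per message, ready for a table or spreadsheet.
structured = [{"text": t, "category": label_text(t)} for t in incoming]
for row in structured:
    print(row)
```

Once each message carries a category, it can be counted, filtered, and charted with the same ordinary tools that could not make sense of the raw text.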
Ultimately, Big Data and Machine Learning are two highly interdependent fields, but it's important to remember that, by default, Big Data doesn't necessarily mean "smart" – unlike Machine Learning, it doesn’t necessarily “learn” anything, and the same algorithm will give you the same result again and again, no matter how many times you run it.
However, Machine Learning needs Big Data in order to work. Without a large supply of data, an algorithm is never going to “learn” anything!
A final concept that can help you decide where to focus your efforts is automation. This means creating processes that carry out tasks automatically, with no (or minimal) need for human input. Setting up an out-of-office auto-reply email is an example of automation that doesn't need any form of Machine Learning or AI: you simply tell the computer that any incoming email should trigger a response.
However, if you want to set up more complex automations, such as varying the reply depending on the content of the email, you might want to look into Machine Learning. With it, it would be quite possible to create a program that scans the contents of the email (unstructured data) and then sends an appropriate response depending on the urgency (or other factors) of the communication.
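The difference between the two kinds of auto-reply can be sketched in a few lines of Python. This is a toy example under stated assumptions: the example emails, the reply wording, and the function names are invented, and the urgency score is a simplified word-counting approach in the style of naive Bayes (with no class priors), not a recommendation for a real mail system.

```python
import math
from collections import Counter

def fixed_auto_reply(email_text):
    """Plain automation: every incoming email triggers the same reply."""
    return "I am out of the office and will reply when I return."

# Hypothetical labeled examples used to learn what "urgent" looks like.
examples = [
    ("server is down please respond immediately", True),
    ("urgent payment failed for customer order", True),
    ("newsletter for this month is attached", False),
    ("lunch menu for next week", False),
]

def train(examples):
    """Count how often each word appears in urgent and non-urgent emails."""
    counts = {True: Counter(), False: Counter()}
    for text, urgent in examples:
        counts[urgent].update(text.lower().split())
    return counts

def urgency_score(counts, email_text):
    """Sum smoothed log-odds per known word; positive means 'looks urgent'."""
    vocab = set(counts[True]) | set(counts[False])
    totals = {k: sum(c.values()) + len(vocab) for k, c in counts.items()}
    score = 0.0
    for word in email_text.lower().split():
        if word in vocab:  # words never seen in training are ignored
            p_urgent = (counts[True][word] + 1) / totals[True]
            p_normal = (counts[False][word] + 1) / totals[False]
            score += math.log(p_urgent / p_normal)
    return score

def smart_auto_reply(counts, email_text):
    """Vary the reply depending on the learned urgency of the content."""
    if urgency_score(counts, email_text) > 0:
        return "I am away, but your message has been forwarded to a colleague."
    return fixed_auto_reply(email_text)

model = train(examples)
urgent_reply = smart_auto_reply(model, "payment server is down")
normal_reply = smart_auto_reply(model, "menu for the newsletter")
print(urgent_reply)
print(normal_reply)
```

The fixed reply never changes, no matter what arrives. The second version changes its behaviour based on examples it was given, and giving it more (and better) examples is how it would improve.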
You don’t always need both – but Big Data together with Machine Learning makes a very powerful combination.