The emergence of data science as a field of study and practical application over the last century has led to the development of technologies such as deep learning, natural language processing, and computer vision. Broadly speaking, it has enabled the emergence of machine learning (ML) as a way of working towards what we refer to as artificial intelligence (AI), a field of technology that’s rapidly transforming the way we work and live.
Data science encompasses the theoretical and practical application of ideas, including Big Data, predictive analytics, and artificial intelligence. If data is the oil of the information age and ML is the engine, then data science is the digital domain’s equivalent of the laws of physics that cause combustion to occur and pistons to move.
A key point to remember is that as the importance of understanding how to work with data grows, the science behind it is becoming more accessible. Ten years ago, it was considered a niche crossover subject straddling statistics, mathematics and computing, taught at a handful of universities. Today, its importance to the world of business and commerce is well established, and there are many routes, including online courses and on-the-job training, that can equip us to apply these principles. This has led to the much-discussed "democratization" of data science, which we will undoubtedly see impact many of the trends mentioned below, in 2022 and beyond.
Small Data and TinyML
The rapid growth in the amount of digital data that we are generating, collecting, and analyzing is often referred to as Big Data. It isn’t just the data that’s big, though – the ML algorithms we use to process it can be quite big, too. GPT-3, one of the largest language models at the time of its release in 2020, is made up of around 175 billion parameters.
This is fine if you’re working on cloud-based systems with unlimited bandwidth, but that by no means covers all of the use cases where ML can add value. This is why the concept of “small data” has emerged as a paradigm for fast analysis of the most vital data in situations where time, bandwidth, or energy expenditure are of the essence. It’s closely linked to the concept of edge computing. Self-driving cars, for example, cannot rely on being able to send data to and receive data from a centralized cloud server when trying to avoid a collision in an emergency. TinyML refers to machine learning algorithms designed to take up as little space as possible so they can run on low-powered hardware, close to where the action is. In 2022 we will see it appearing in an increasing number of embedded systems – everything from wearables to home appliances, cars, industrial equipment, and agricultural machinery – making them all smarter and more useful.
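One common way to make a model "take up as little space as possible" is to store its weights at lower precision. The sketch below is a minimal illustration of symmetric 8-bit quantization using only the Python standard library. The function names and the example weights are made up for illustration and do not come from any particular TinyML toolkit.

```python
from array import array


def quantize_int8(weights):
    """Map float weights to signed 8-bit integers plus one shared scale factor."""
    # Largest magnitude sets the scale; fall back to 1.0 if all weights are zero.
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    # Round each weight to the nearest step and clamp to the int8 range used here.
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit representation."""
    return [q * scale for q in quantized]


# Hypothetical weights standing in for one layer of a small model.
weights = [0.5, -1.0, 0.25, 0.0, 0.75]

as_float32 = array("f", weights)          # typical 32-bit storage
quantized, scale = quantize_int8(weights)  # 8-bit storage plus one scale value
restored = dequantize(quantized, scale)

print("float32 bytes:", len(as_float32) * as_float32.itemsize)
print("int8 bytes:   ", len(quantized) * quantized.itemsize)
print("largest rounding error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

The trade-off is a small rounding error in each weight in exchange for roughly a fourfold reduction in storage, which is often what makes a model fit on a microcontroller-class device.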
Data-driven Customer Experience
This is about how businesses take our data and use it to provide us with increasingly worthwhile, valuable, or enjoyable experiences. This could mean cutting down friction and hassle in e-commerce, more user-friendly interfaces and front-ends in the software we use, or spending less time on hold and being transferred between departments when we contact customer service.
Our interactions with businesses are becoming increasingly digital – from AI chatbots to Amazon’s cashier-less convenience stores – meaning that often every aspect of our engagement can be measured and analyzed for insights into how processes can be smoothed out or made more enjoyable. This has also led to a drive to create greater levels of personalization in the goods and services that businesses offer us. The pandemic sparked a wave of investment and innovation in online retail technology, for example, as businesses looked to replace the hands-on, tactile experiences of bricks-and-mortar shopping trips. Finding new methods and strategies for turning this customer data into better customer service and new customer experiences will be a focus for many people working in the field of data science during 2022.
Deepfakes, Generative AI, and Synthetic Data
This year many of us were tricked into believing Tom Cruise had started posting on TikTok when scarily realistic “deepfake” videos went viral. The technology behind this is known as generative AI, as it aims to generate or create something – in this case, Tom Cruise regaling us with tales of meeting Mikhail Gorbachev – that doesn’t exist in reality. Generative AI has quickly become embedded in the arts and entertainment industry, where we have seen Martin Scorsese de-age Robert De Niro in The Irishman and (spoiler alert) a young Mark Hamill appear in The Mandalorian.
In 2022 I expect we will see it bursting into many other industries and use cases. For example, it’s considered to have huge potential when it comes to creating synthetic data for the training of other machine learning algorithms. Synthetic faces of people who have never existed can be created to train facial recognition algorithms while avoiding the privacy concerns involved with using real people’s faces. Synthetic medical images can be created to train image recognition systems to spot signs of very rare and infrequently photographed cancers. Generative AI can also be used to create language-to-image capabilities, allowing, for example, an architect to produce concept images of a building simply by describing in words how it will look.
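The core idea of synthetic data, learning the shape of real data and then sampling new records that never belonged to anyone, can be sketched in a few lines. The example below is a deliberately simple statistical stand-in, not a generative AI model: it fits a normal distribution to a handful of measurements and draws new values from it. The measurements and function names are invented for illustration.

```python
import random
from statistics import mean, stdev


def fit_gaussian(samples):
    """Summarize the real data by its mean and sample standard deviation."""
    return mean(samples), stdev(samples)


def sample_synthetic(mu, sigma, n, seed=0):
    """Draw n new values from the fitted distribution (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up "real" measurements that we would rather not share directly.
real_heights_cm = [158.0, 162.5, 171.0, 175.5, 168.0, 180.0, 165.5, 173.0]

mu, sigma = fit_gaussian(real_heights_cm)
synthetic_heights_cm = sample_synthetic(mu, sigma, n=1000)

print("real mean:     ", round(mu, 1))
print("synthetic mean:", round(mean(synthetic_heights_cm), 1))
```

Production systems replace the single fitted distribution with far richer models, such as generative neural networks, but the workflow is the same: fit to real data, then train downstream algorithms on the generated samples.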
Convergence
AI, the internet of things (IoT), cloud computing, and superfast networks like 5G are the cornerstones of digital transformation, and data is the fuel they all burn to create results. All of these technologies exist separately, but combined, they enable each other to do much more. Artificial intelligence enables IoT devices to act smart, interacting with each other with as little need for human intervention as possible – driving a wave of automation and the creation of smart homes and smart factories, all the way up to smart cities. 5G and other ultra-fast networks don't just allow data to be transmitted at higher speeds; they will enable new types of data transfer to become commonplace (just as superfast broadband and 3G made mobile video streaming an everyday reality). AI algorithms created by data scientists play a key role in this, from routing traffic to ensure optimal transfer speeds to automating environmental controls in cloud data centers. In 2022, an increasing amount of exciting data science work will take place at the intersection of these transformative technologies, ensuring they augment each other and play nicely together.
AutoML
Short for "automated machine learning," AutoML is an exciting trend that's driving the "democratization" of data science mentioned in the introduction to this piece. Developers of AutoML solutions aim to create tools and platforms that can be used by anyone to create their own ML apps. In particular, it’s aimed at subject matter experts whose specialized expertise and insights make them ideally placed to develop solutions to the most pressing problems in their particular fields but who often lack the coding knowledge needed to apply AI to those problems.
Quite often, a large portion of a data scientist's time will be taken up with data cleansing and preparation – tasks that require data skills and are often repetitive and mundane. AutoML at its most basic involves automating those tasks, but it increasingly also means building models and creating algorithms and neural networks. The aim is that very soon, anyone with a problem they need to solve, or an idea they want to test, will be able to apply machine learning through simple, user-friendly interfaces that keep the inner workings of ML out of sight, leaving them free to concentrate on their solutions. In 2022 we are likely to take a big step closer to this becoming an everyday reality.
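To make the model-building side of this concrete, here is a minimal sketch of the loop at the heart of many AutoML tools: fit several candidate models automatically, score each on held-out data, and keep the best. It uses only the Python standard library. The three candidate models, the toy data, and all function names are assumptions chosen for illustration, not the interface of any real AutoML product.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the average training target."""
    average = mean(ys)
    return lambda x: average


def fit_linear(xs, ys):
    """Least-squares straight line through the training points."""
    mx, my = mean(xs), mean(ys)
    variance = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / variance if variance else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """Predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest_neighbor": fit_nearest}


def mean_squared_error(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def auto_select(train, valid):
    """Fit every candidate on the training split and return the one
    with the lowest error on the validation split."""
    results = []
    for name, fit in CANDIDATES.items():
        model = fit(*train)
        results.append((mean_squared_error(model, *valid), name, model))
    score, name, model = min(results, key=lambda r: r[0])
    return name, model, score


# Toy data following y = 2x + 1, split into training and validation sets.
train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
valid = ([6, 7, 8], [13, 15, 17])

best_name, best_model, best_score = auto_select(train, valid)
print("selected model:", best_name)
print("prediction for x=10:", best_model(10))
```

The user supplies only data; the search over models happens behind the interface. Real AutoML platforms extend the same loop to data cleaning steps, hyperparameters, and neural network architectures.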