In the information age, data is one of a company’s most valuable assets. Businesses that distinguish themselves in how they work with data are leading the field when it comes to growth and innovation. Data fuels artificial intelligence (AI) and machine learning, robotic automation, the internet of things (IoT), and every other cornerstone of the fourth industrial revolution – a wave of digital transformation that the World Economic Forum predicts will create $3.7 trillion in value by 2025.
Of course, data on its own is not that useful. To make it work, organizations need a data strategy, data skills, and a governance process. Even with all of that in place, though, you won’t get far without infrastructure. Data infrastructure covers the software and hardware tools used to collect, store, and process data, as well as the crucial last step of communicating your insights.
Data infrastructure isn’t the first thing you should think about when starting out working with technologies like AI or IoT – you need to fit the tools you use to your strategies, problems, and business questions, rather than the other way around! But sooner or later, you’ll need to start splashing the cash on the devices, applications, platforms, and services that make the magic happen.
As businesses have rushed to embrace the value offered by data and data-driven discovery, a busy marketplace of platform and solution providers, as well as third-party data vendors, has emerged. This has lowered the barriers to entry for working with cutting-edge technologies and advanced analytics solutions. Some of these offerings are even referred to as “infrastructure-as-a-service”, with their providers offering to take care of your end-to-end data requirements. Navigating the maze of different products and services on offer can take a great deal of research and preparation. It’s very important to stay firmly focused on your needs – finding answers to your most important business questions – and not to get sidetracked by lofty promises and flashy buzzwords!
There are four key functions your data infrastructure needs to provide. Many tools and platforms on the market offer all of them under one roof. But many businesses have found they need to mix and match solutions to fill their specific requirements, sometimes combining proprietary and open-source technologies to create bespoke solutions. On the other hand, smaller organizations or those embarking on “quick win” data pilots of limited scope can often find one app, platform, or service that does do it all.
The first is data collection. This is about taking data into your infrastructure stack – whether it's internal data that's simply collated from your sales transactions, customer feedback or HR records, or external data collected from social media, public data sources, or bought-in third-party data. It could be very simple, structured data, or it could be very messy, unstructured – but potentially very valuable – data such as video recordings or conversation logs. One particularly valuable form data can take is real-time streaming data. This is the type of data used, for example, by banks and credit card companies to monitor transactions as they happen, using AI algorithms to spot attempted fraudulent activity and stop it in its tracks. It’s also used to identify “micro-moments” – selling opportunities that may last just seconds. These types of data initiatives require robust data collection infrastructure.
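To make the idea of monitoring transactions as they happen a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any bank actually does it: a simple rule (flag any amount well above a card's recent average) stands in for the AI algorithms mentioned above, and the names `monitor`, `WINDOW`, and `MULTIPLIER`, along with the sample data, are all hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean

WINDOW = 5        # hypothetical: how many recent amounts to remember per card
MULTIPLIER = 3.0  # hypothetical: flag amounts above 3x the recent average


def monitor(transactions):
    """Yield (transaction, is_suspicious) for each transaction as it arrives.

    A simple threshold rule stands in for a trained fraud model here.
    """
    history = defaultdict(lambda: deque(maxlen=WINDOW))
    for tx in transactions:
        recent = history[tx["card"]]
        # Only judge a transaction once a full window of past amounts exists.
        suspicious = (
            len(recent) == WINDOW and tx["amount"] > MULTIPLIER * mean(recent)
        )
        yield tx, suspicious
        # Keep flagged amounts out of the baseline so they do not skew it.
        if not suspicious:
            recent.append(tx["amount"])


# Made-up sample stream for a single card.
stream = [{"card": "A", "amount": a} for a in (20, 25, 18, 22, 30, 400, 24)]
flags = [suspicious for _, suspicious in monitor(stream)]
```

Because `monitor` is a generator, it handles each record as it arrives rather than waiting for a complete batch, which is the essential difference between streaming and traditional batch collection.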
Next, there is data storage. Depending on the type and sensitivity of the data you're working with, you might want to keep it on-premises in your own data warehouse, or you might want to put it in the cloud. Cloud storage providers make your data accessible to you from anywhere in the world, without you needing to worry about the large up-front expense of setting up your own servers in a physical location, along with all of the logistical, energy, and security efforts that involves. That said, for smaller companies starting out with less ambitious projects, the requirements may be small enough that this expense isn’t an issue. Increasingly, as businesses begin to work with more types of data and initiate multiple data projects, they might look to newer models such as private cloud or hybrid cloud. One important consideration here is to avoid making your data too “siloed” – the aim is to make it available across the business, so new uses can be found for it that may not even have been thought of when the data was collected.
The next key consideration is how you will process and analyze your data. This is the glamorous and exciting stage where you might get to work with technologies like machine learning, computer vision, language processing, or neural nets if you're operating on the cutting edge. Here we have to find solutions for preparing and cleansing our data, building analytics models, and extracting insights from the raw information. As with storage, this is a service offered by the cloud providers (Google, Amazon, and Microsoft, for example), all of which include access to analytical tools as part of their package. Platforms such as Amazon QuickSight, Infobright, IBM Cognos Analytics, Hortonworks Data Platform, Cloudera Data Warehouse, Pivotal Analytics, Sisense, Alteryx, Splunk, and SAP Analytics Cloud also offer analytics and AI capabilities as a service.
Finally, there’s the critical last step of taking the insights and reporting them to the people (or sometimes machines!) that can use them to generate growth and positive change. This means visualizing the data or creating reports. This communication might be between your data team and your wider workforce when the aim of working with the data is to streamline and create efficiencies in your own internal processes. On the other hand, if you're using data to create smarter products and services, it might be between the business and its customers. More advanced use cases require a hybrid approach to this – for example, IBM partners with the Wimbledon Tennis Championship to create a comprehensive suite of data services. Some are aimed at media and advertisers to help them make marketing decisions, some are used to help players train and improve their game, and some are used to create enjoyable audience experiences for the fans. All of this is derived from the same data collection, storage, and analytics infrastructure, but it’s during this final step that the real value is created for each specific group of data users. Insights are reported through applications and dashboards tailored to specific audiences, and putting these together is the final step of building a comprehensive data infrastructure.
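For readers who like to see things in code, the four functions can be sketched as a tiny end-to-end pipeline. This is a toy illustration using only Python's standard library, not a recommendation of any particular tool: the function names (`collect`, `store`, `analyze`, `report`), the `sales` table, and the sample figures are all made up for the example, and an in-memory SQLite database stands in for a real data warehouse or cloud store.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for an export of sales transactions.
RAW_SALES = """region,amount
North,120.0
South,80.5
North,99.5
"""


def collect(raw_text):
    """Collection: parse incoming records (here, a CSV export)."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(row["region"], float(row["amount"])) for row in reader]


def store(rows):
    """Storage: load the records into a database (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def analyze(conn):
    """Processing and analysis: total sales per region."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


def report(totals):
    """Communication: turn the results into something a person can read."""
    return "\n".join(f"{region}: {total:.2f}" for region, total in totals)


summary = report(analyze(store(collect(RAW_SALES))))
print(summary)
```

Each stage only depends on the output of the one before it, so any single step could be swapped for a different product or service without rewriting the rest, which is the mix-and-match approach described earlier.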
Building data infrastructure can be as simple or as complex as your specific needs entail – if you simply want to run data-driven marketing initiatives to identify potential new customers, for example, then many everything-in-one-place services can do this for you. Just remember that when tools are easily accessible to anyone, then everyone – including your competitors – can use them, pretty easily. If you’re looking to use data as a way to differentiate yourself in your market (which you certainly should be doing!) then you may need to go a little further and look for more innovative, ground-breaking solutions.
Building a data and analytics infrastructure is one of the topics covered in depth in the second edition of my book ‘Data Strategy: How To Profit From A World Of Big Data, Analytics And Artificial Intelligence’.