Issues such as data quality and the effectiveness of algorithms add to the complexity of the problem, but one factor that's easily overlooked is storage. Far from simply being "somewhere to dump the data," modern storage infrastructure is a critical element of the analytics stack.
Making the right storage decisions is essential to the speedy delivery of insights and to aiding decision-making. In a recent discussion I had with Shawn Rosemarin, VP, R&D - Customer Engineering at Pure Storage, we delved into some of the challenges businesses face in this field, taking into account, of course, the implications it has for two of the hottest topics in technology - artificial intelligence (AI) and analytics.
The Role of Data Storage
Today, businesses store data for very different reasons than they did just a few decades ago. Back when the digital revolution was getting started, information would usually be kept for compliance or governance reasons or simply for the purpose of tracking past performance.
Rosemarin tells me, "It's only in the last couple of decades that we started to say … what could we glean from our historical data … what could we glean from what happened in the past to actually help us try to predict what might happen in the future?”
In recent years, this has led to huge advances in the ability to leverage data for business decision-making. This has evolved simultaneously with the explosion in the volume of data generated by organizations, as well as the technologies available to us to capture and analyze that data.
Often, though, the question of how and where to store information has been treated as an afterthought. I find that people are often surprised when they learn that a huge amount of the world’s data is stored on creaky mechanical disks or even creakier tape storage. Data stored in this way is often difficult to access both in terms of financial cost and, importantly, energy usage.
“When we look at an all-flash data center, and we look at the benefits of flash versus disk, and we look at the current environment we’re in – the more I can free up energy … electricity consumption … human overheads and management – the more I can focus that energy savings, efficiency and humans on what I’m actually trying to do – which is to solve AI and analytics challenges,” says Rosemarin.
In the domain of drug discovery, for example, time-to-insight is critical, not just for business reasons but because it can make a difference to human health and the fight against pandemics.
One of Pure’s customers is McArthur Lab, which processes millions of data points every day as it searches for solutions to the growing threat of antimicrobial resistance. This involves tracking genes and mutations that can lead to infections becoming drug-resistant. Moving its storage infrastructure onto Pure Storage technology has led to a 300x speed-up in some analytical processes, enabling researchers to accelerate the identification of "superbugs" and the assessment of potential cures.
During our talk, Rosemarin also highlighted work his organization has done with Chungbuk Technopark, a South Korean innovation center specializing in incubating deep and machine learning solutions with local companies.
When the center realized that it needed AI-optimized solutions to reduce energy consumption in its own data storage infrastructure, it migrated its operations to Pure Storage infrastructure. It directly attributes a two-fold increase in the speed of processing its stored data for AI workflows to this move.
We also discussed the challenge of data quality – the biggest issue facing businesses that are trying to make the transition to being data-driven. It is often one of the major obstacles to reducing time to insight.
“Is that a date? Is that a time? Is that a real English word? – We've got past that,” he contends, with the real difficulty being determining “Did that actually happen?”
As an example, he thinks of a doctor taking notes during a patient consultation.
“The doctor's very focused on the patient – they're focused on delivering healthcare, and so the … notes might not exactly be what happened … the patient might not be telling the doctor exactly what happened … how many drinks did you have this week? How many times did you go to the gym?”
The result of this, he believes, is that when companies raced to put all of their information into data lakes, they often ended up with "data swamps."
If the data doesn’t match reality, the effectiveness of models becomes compromised. Developing ways to assess and mitigate gaps in data quality should be high on the list of priorities for companies making this transformation.
Storage infrastructure impacts data quality in several ways. There’s accessibility: when data is easier to reach, it can be more easily validated and corrected. There’s also governance, where built-in, AI-driven tools can ensure information is stored in accordance with legislation and that all mandated security checks and measures are in place.
Further problems emerge when data storage infrastructure is not scalable. This too easily leads to silos and barriers between access to different pools of data within organizations.
I asked Rosemarin for tips for enterprises when it comes to ensuring storage infrastructure is up to the task of running today’s high-powered, AI-driven analytics initiatives. One piece of advice was, “Embrace simplicity and eliminate complexity.”
“Whether you’re looking at energy consumption or whether you’re looking at human overheads, the fact is you are going to need more energy and more humans to deliver these projects … and infrastructure – specifically storage – is a huge opportunity … of eliminating not just the human overhead but a lot of the energy inefficiency of traditional, legacy storage systems.”
When it comes to data management (with storage being no exception), simplification is almost always the right direction to be heading in. The ability to benefit from data-driven insights is quickly being democratized thanks to the emergence of AI platforms, large language models and generative AI.
In order to fully benefit from this, organizations need to ensure that the accessibility of data is democratized, too.
“Quick win” analytics initiatives are faster to spin up and assess when storage infrastructure is simplified. And time and money saved by moving away from complex storage solutions can be redirected into analytics and AI.
Overall, adopting a strategy of simplification around data storage infrastructure can be an effective method of reducing delays and bottlenecks that can slow time-to-insight.
Facing The Future
Flash storage is now the norm everywhere, it seems, apart from the data center. It’s what’s used in our phones, computers, appliances and even cars.
Wrapping up our chat, Rosemarin told me, "The only place where spinning disks still exist is a data center because of the amount of work and effort it is to move those to flash.”
It’s clear that he sees this as a challenge that Pure can help its customers to overcome. And it needs to be done, not just for their own future but for the future of the planet.
“The planet will run out of power on the trajectory we’re on,” he tells me.
"Nuclear fission might help us if we get there. But we're already seeing countries tell public cloud players that they can’t enter – there’s just no available power for them to build a data center. Parts of London, Ireland, even the US – Virginia – have actually said ‘no more data centers can be built.’”
New, faster and more efficient forms of storage, including all-flash, will continue to play a part in easing businesses along their journey of digital transformation.
At the same time, they might help reduce energy use and, therefore, minimize an organization’s environmental footprint. Whichever way you look at it, this makes thinking more seriously about storage infrastructure a worthwhile endeavor.