The Amazing Ways Snowflake Uses Generative AI For Synthetic Data And Natural Language Queries
24 September 2023
You probably know that the new generation of generative AI tools that have exploded onto the scene can generate words, pictures and even videos that closely resemble those created by humans. But did you know that they can also be used to generate data itself?
Modern artificial intelligence (AI) works by recognizing patterns in data and using those patterns to answer questions or predict what comes next. Generative AI such as OpenAI's ChatGPT goes a step further: it uses those patterns to create new data that follows the rules of the data it was trained on.
But real data comes with complications – it can be difficult and expensive to collect and brings security and privacy obligations.
Think about a dataset comprising thousands of human faces, for example – as used to train facial recognition algorithms. You have to find and photograph thousands of people and then get their permission to store and use their data. Then, myriad checks and balances must be followed to ensure your data isn't harmfully biased.
One solution is synthetic data: data that is created by machines but closely resembles real-world data, so it can be used for many of the same purposes.
Snowflake is one of the world's biggest "data-as-a-service" companies. In addition to its analytics services, it offers a data marketplace covering thousands of topics, including healthcare, finance and retail.
Now, it’s augmenting these offerings with synthetic, AI-generated datasets and putting generative AI to use in several other interesting applications. Let's take a look!
First, What Is Synthetic Data?
Synthetic data is information that has been artificially generated in order to have the same characteristics as a real-world dataset but without including any real-world data.
Generative AI is particularly well suited to this task because it can analyze a dataset and then create synthetic data that closely matches its patterns. This means businesses can train AI algorithms and run tests and simulations without exposing private or sensitive information contained in real-world data.
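To make the idea concrete, here is a deliberately simple sketch in Python. It is not generative AI: it swaps in a basic statistical model as a stand-in, learning the mean and standard deviation of each numeric column in a small, made-up dataset and then sampling brand-new rows from normal distributions with those parameters. The column names, values and function names are hypothetical, chosen only for illustration.

```python
import random
import statistics

# Hypothetical "real" dataset: each row is a customer record with two numeric fields.
real_rows = [
    {"age": 34, "monthly_spend": 220.0},
    {"age": 45, "monthly_spend": 310.5},
    {"age": 29, "monthly_spend": 180.25},
    {"age": 52, "monthly_spend": 400.0},
    {"age": 41, "monthly_spend": 275.75},
]

def fit_columns(rows):
    """Learn a (mean, standard deviation) pair for every numeric column."""
    columns = rows[0].keys()
    return {
        col: (statistics.mean(r[col] for r in rows),
              statistics.stdev(r[col] for r in rows))
        for col in columns
    }

def synthesize(params, n, seed=None):
    """Sample n new rows from the fitted distributions instead of copying real rows."""
    rng = random.Random(seed)
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in params.items()}
        for _ in range(n)
    ]

params = fit_columns(real_rows)
synthetic_rows = synthesize(params, n=1000, seed=42)
print(params)
print(synthetic_rows[0])
```

The synthetic rows share the overall shape of the original columns without containing any of the original records. Because each column is modelled independently, this sketch loses relationships between columns (for example, older customers spending more); capturing that kind of structure is where more sophisticated generative models earn their keep.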
It’s used in finance to train fraud detection algorithms to spot deliberately falsified transactions, in healthcare to avoid using sensitive patient data, and in retail and marketing to create synthetic customers and analyze their buying behavior.
According to Gartner research, business leaders are most likely to turn to synthetic data because of difficulties with accessibility, complexity and availability of real-world data. It also found that partially synthetic datasets – where real-world data is augmented with synthetic data – are more commonly used than fully synthetic datasets.
By generating synthetic data, companies can create the information they need to plug gaps in existing records or to build entirely new datasets. It doesn't remove the need for real-world data, which is needed to create synthetic data in the first place. But when used effectively, it can reduce costs, speed up the training of machine learning models, and help businesses automate processes and make better decisions.
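Partially synthetic data can be sketched in the same spirit. The hypothetical Python example below keeps every real value. Only where a record has a gap (represented here as None) does it draw a synthetic replacement from a normal distribution fitted to the values that are present, and it tags each filled field so real and generated values stay distinguishable. The field names, figures and the fill_gaps function are illustrative assumptions, not a description of how any particular vendor works.

```python
import random
import statistics

# Hypothetical records with gaps (None) in some fields.
records = [
    {"income": 52000.0, "loan_amount": 180000.0},
    {"income": None, "loan_amount": 240000.0},
    {"income": 61000.0, "loan_amount": None},
    {"income": 48000.0, "loan_amount": 150000.0},
    {"income": 75000.0, "loan_amount": 300000.0},
]

def fill_gaps(rows, seed=None):
    """Return copies of rows with missing numeric values replaced by synthetic draws."""
    rng = random.Random(seed)
    columns = rows[0].keys()
    # Fit a normal distribution to the observed (real) values of each column.
    params = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        params[col] = (statistics.mean(observed), statistics.stdev(observed))
    filled = []
    for row in rows:
        new_row, synthetic_fields = dict(row), []
        for col, (mu, sigma) in params.items():
            if new_row[col] is None:
                new_row[col] = rng.gauss(mu, sigma)
                synthetic_fields.append(col)
        # Record which fields are synthetic so they can be audited later.
        new_row["synthetic_fields"] = synthetic_fields
        filled.append(new_row)
    return filled

for row in fill_gaps(records, seed=7):
    print(row)
```

The original records are left untouched; the function returns copies, which keeps the real-world source data intact for auditing.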
Generative Synthetic Data At Snowflake
Snowflake sells data to businesses via its Snowflake marketplace, which is one of the largest B2B data brokerages in the world.
Alongside its thousands of real-world datasets, Snowflake now offers access to synthetic datasets created by generative AI algorithms. One example is San Francisco-based Synthesis AI’s synthetic human face dataset, comprising 5,000 individual images of diverse human faces.
In the past, facial recognition algorithms have been criticized and even banned due to concerns over biases in the datasets used to train them. This has led to differences in their ability to identify people of different ethnic backgrounds and accusations that they could be unfair or prejudiced.
Using synthetic data in this way can help to tackle those problems (note – I will not say it solves them entirely) as datasets can be created in line with whatever level of representation or inclusiveness is needed.
While synthetic data existed before the emergence of generative AI, the new class of generative algorithms means that datasets can quickly be scaled to any size that's needed. Datasets created in this way can also be easily customized to fit the needs of different customers around the world.
Snowflake also offers synthetic financial data from Clearbox AI, consisting of simulated mortgage applications designed to mimic both legitimate and fraudulent applications. These datasets have been augmented with data created by generative AI.
Snowflake has made it clear that it expects synthetic data generated by AI to play an important role in its business going forward. As generative models such as large language models (LLMs) become more sophisticated, we will see them becoming capable of creating synthetic data that more and more accurately reflects the real world, leading to cheaper and more efficient insights for businesses.
How Else is Generative AI Used at Snowflake?
As well as offering access to AI-generated synthetic data, Snowflake has created a number of tools based on generative AI for its customers to use.
Thanks to its acquisition this year of Neeva, a search startup founded by former Google employees, Snowflake is implementing natural language querying of its datasets. Effectively, this will let users talk to their data, getting insights by asking straightforward questions rather than running traditional data science analysis. CEO Frank Slootman told VentureBeat, “Engaging with data through natural language is becoming popular … this will increase our opportunity to allow non-technical users to extract value from their data.”
It has also launched a partnership with Nvidia, using the chip maker’s NeMo LLM to create a platform that lets Snowflake users build generative AI applications such as chatbots and search engines that can access Snowflake data.
Another LLM initiative is Document AI, a tool that lets users query documents, such as legal contracts or invoices, and extract meaning from them. It was developed with technology Snowflake acquired when it bought the Polish natural language platform Applica in 2022.
Altogether, it's clear that Snowflake has big hopes for generative AI to create synthetic data and build tools to help us analyze and extract value from it.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.