Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has over 2 million social media followers, 1 million newsletter subscribers and was ranked by LinkedIn as one of the top 5 business influencers in the world and the No 1 influencer in the UK.

Bernard’s latest book is ‘Business Trends in Practice: The 25+ Trends That Are Redefining Organisations’

Does Synthetic Data Hold The Secret To Artificial Intelligence?

2 July 2021

Could synthetic data be the solution for training artificial intelligence (AI) algorithms rapidly? Synthetic data has both advantages and disadvantages; however, many technology experts believe it is the key to democratising machine learning and to accelerating the testing and adoption of AI algorithms in our daily lives.

What is synthetic data?

Synthetic data is data that a computer manufactures artificially rather than measuring and collecting it from real-world situations. It is anonymised and generated from user-specified parameters so that its properties are as close as possible to those of real-world data.

One way to create synthetic data is to take real-world data and strip out identifying fields such as names, email addresses, social security numbers and postal addresses so that the data set is anonymised. Alternatively, a generative model, which learns from real data, can create a new data set that closely resembles the properties of authentic data. As the technology improves, the gap between synthetic and real data narrows.
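A minimal Python sketch of both approaches follows. The field names, the sample records and the choice of an independent Gaussian per numeric column are illustrative assumptions only; production generative models are far more sophisticated, and this toy version ignores correlations between columns.

```python
import random
import statistics

# Hypothetical names of directly identifying fields; a real data set needs its own list.
IDENTIFYING_FIELDS = {"name", "email", "ssn", "address"}


def strip_identifiers(records):
    """Approach 1: copy each record without its directly identifying fields."""
    return [{k: v for k, v in r.items() if k not in IDENTIFYING_FIELDS} for r in records]


def fit_model(records, columns):
    """Approach 2 (toy generative model): learn a mean and standard deviation per numeric column."""
    return {
        c: (statistics.mean(r[c] for r in records), statistics.stdev(r[c] for r in records))
        for c in columns
    }


def generate(model, n, seed=0):
    """Draw n brand-new records from the fitted model (each column sampled independently)."""
    rng = random.Random(seed)
    return [{c: rng.gauss(mu, sigma) for c, (mu, sigma) in model.items()} for _ in range(n)]


real = [
    {"name": "Ann", "email": "ann@example.com", "age": 34, "income": 41000},
    {"name": "Bob", "email": "bob@example.com", "age": 51, "income": 58000},
    {"name": "Cem", "email": "cem@example.com", "age": 29, "income": 36000},
    {"name": "Dee", "email": "dee@example.com", "age": 45, "income": 52000},
]

anonymised = strip_identifiers(real)
synthetic = generate(fit_model(anonymised, ["age", "income"]), n=1000)
print(anonymised[0])   # {'age': 34, 'income': 41000}
print(len(synthetic))  # 1000
```

Note that the first approach keeps every real value that is not an identifier, while the second produces records that never existed; the trade-off is that the toy model only preserves whatever statistics it was told to learn.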

Synthetic data is useful in many situations. Just as a research scientist might use synthetic material to run experiments at low risk, data scientists can use synthetic data to minimise time, cost and risk. In some cases, there isn't a large enough data set available to train a machine learning algorithm effectively for every possible scenario, so creating one can help ensure comprehensive training. In other cases, real-world data cannot be used for testing, training or quality assurance because of privacy concerns, because the data is sensitive, or because it comes from a highly regulated industry.

Advantages of synthetic data

Huge data sets power the deep learning systems and artificial intelligence algorithms that are expected to help solve very challenging problems. Companies such as Google, Facebook and Amazon have gained a competitive advantage from the amount of data they generate every day as part of their business. Synthetic data gives organisations of every size and resource level the opportunity to capitalise on learning powered by deep data sets too, which could ultimately democratise machine learning.

In many cases, creating synthetic data is more efficient and cost-effective than collecting real-world data. It can also be generated on demand to a specification, rather than waiting for the data to occur in reality and then collecting it. Synthetic data can also complement real-world data so that testing can cover every imaginable variable, even when the real data set contains no good example of it. This allows organisations to accelerate both the testing of system performance and the training of new systems.
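To make the idea of generating data to a specification concrete, here is a minimal Python sketch that expands a hypothetical test specification into every combination of variables and reports which combinations the real data never covers, so that synthetic examples can be produced for exactly those gaps. The variable names and values are illustrative assumptions for a driving scenario.

```python
import itertools

# Hypothetical specification: every variable the tests should cover, with its possible values.
SPEC = {
    "weather": ["clear", "rain", "fog", "snow"],
    "time_of_day": ["day", "night"],
    "pedestrian_crossing": [True, False],
}


def all_scenarios(spec):
    """Expand the specification into every combination of variable values."""
    keys = list(spec)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(spec[k] for k in keys))]


def missing_scenarios(spec, real_records):
    """Return the combinations that never occur in the real records."""
    keys = list(spec)
    seen = {tuple(r[k] for k in keys) for r in real_records}
    return [s for s in all_scenarios(spec) if tuple(s[k] for k in keys) not in seen]


# A tiny stand-in for real-world logs, which rarely contain unusual combinations.
real_logs = [
    {"weather": "clear", "time_of_day": "day", "pedestrian_crossing": False},
    {"weather": "rain", "time_of_day": "day", "pedestrian_crossing": True},
]

gaps = missing_scenarios(SPEC, real_logs)
print(len(all_scenarios(SPEC)), len(gaps))  # 16 14
```

Each entry in `gaps` is a specification from which a simulator or generator could produce test data, instead of waiting for, say, a foggy night with a pedestrian to turn up in real recordings.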

Fabricated data sets also ease the limitations that come with using real data for training and testing, such as the privacy and regulatory restrictions described above. Recent research suggests that it is possible to get the same results with synthetic data as with authentic data sets.
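One common way to check that claim on a particular problem is to train one model on real data and another on synthetic data, then evaluate both on held-out real data (often described as "train on synthetic, test on real"). The Python sketch below is a toy version under stated assumptions: the "real" data is itself simulated as a stand-in, the generator fits one Gaussian per class, and the model is a simple nearest-mean classifier.

```python
import random
import statistics


def make_labelled(n, seed):
    """Stand-in for a real labelled data set: class 0 centred at 0, class 1 centred at 5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(5.0 * label, 1.0), label))
    return data


def fit_class_gaussians(data):
    """Toy generative model: a mean and standard deviation for each class."""
    params = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        params[label] = (statistics.mean(xs), statistics.stdev(xs))
    return params


def synthesise(params, n_per_class, seed):
    """Sample a fresh labelled data set from the fitted per-class Gaussians."""
    rng = random.Random(seed)
    return [(rng.gauss(mu, sigma), label)
            for label, (mu, sigma) in params.items()
            for _ in range(n_per_class)]


def train_nearest_mean(data):
    """A deliberately simple classifier: predict the class whose mean is closest."""
    means = {label: statistics.mean([x for x, y in data if y == label]) for label in (0, 1)}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


real_train = make_labelled(500, seed=1)
real_test = make_labelled(500, seed=2)
synthetic_train = synthesise(fit_class_gaussians(real_train), n_per_class=250, seed=3)

acc_real = accuracy(train_nearest_mean(real_train), real_test)        # trained on real
acc_synth = accuracy(train_nearest_mean(synthetic_train), real_test)  # trained on synthetic
print(f"real-trained: {acc_real:.3f}, synthetic-trained: {acc_synth:.3f}")
```

On this easy, well-separated toy problem the two scores end up very close; on real problems, a large gap between them is a warning sign that the synthetic data is missing something the model needs.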

Disadvantages of synthetic data

It can be challenging to create high-quality synthetic data, especially if the system being modelled is complex. The generative model that creates the synthetic data has to be excellent; otherwise, the quality of the data it generates will suffer. If synthetic data doesn't closely match the real-world data set it represents, it can compromise the quality of any decisions based on it.
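One simple way to quantify how closely a synthetic column matches its real counterpart is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical cumulative distribution functions, where 0 means the samples have identical empirical distributions and 1 means they do not overlap at all. The pure-Python sketch below computes it; the 0.1 acceptance threshold in `close_enough` is an arbitrary assumption for illustration, and real quality checks would typically use a library implementation such as `scipy.stats.ks_2samp` alongside many other metrics.

```python
import bisect


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # Both empirical CDFs only jump at observed values, so checking those points is enough.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def close_enough(real, synthetic, threshold=0.1):
    """Hypothetical acceptance rule: reject a synthetic column whose KS statistic exceeds the threshold."""
    return ks_statistic(real, synthetic) <= threshold


print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))      # 0.0
print(ks_statistic([1, 2, 3, 4], [10, 11, 12, 13]))  # 1.0
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))      # 0.5
```

A per-column check like this only compares one property at a time; synthetic data can pass it and still get relationships between columns wrong.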

Even very good synthetic data is still only a replica of specific properties of a real data set. A generative model looks for trends to replicate, so it may miss some of the random behaviours in the real data.
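The toy Python example below illustrates this under simple, invented assumptions: a "real" column in which 1% of values are rare spikes, and a generative model that learns only the column's mean and standard deviation. The synthetic data reproduces the overall trend but essentially never reproduces the spikes.

```python
import random
import statistics

# Invented "real" data: mostly routine values, plus rare spikes (1% of records).
real = [10.0] * 990 + [1000.0] * 10

# Toy generative model: capture only the overall trend (mean and spread).
mu, sigma = statistics.mean(real), statistics.stdev(real)
rng = random.Random(42)
synthetic = [rng.gauss(mu, sigma) for _ in range(10_000)]


def spike_rate(values, level=900.0):
    """Fraction of values at or above the spike level."""
    return sum(v >= level for v in values) / len(values)


print(f"real spike rate:      {spike_rate(real):.4f}")  # 0.0100
print(f"synthetic spike rate: {spike_rate(synthetic):.4f}")
```

A richer generative model would do better than this one, but the underlying risk is the same: anything the model does not learn to represent is absent from the data it produces.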

Applications of synthetic data

Whenever privacy is a concern, as in the financial and healthcare industries, or an enormous data set is required to train machine learning algorithms, synthetic data sets can propel progress. Here are just a few applications of synthetic data:

  • Healthcare organisations can use record-level synthetic data to inform care protocols while protecting patient confidentiality. Simulated X-rays are also combined with actual X-rays to train AI algorithms to identify conditions.
  • Fraudulent activity detection systems can be tested and trained without exposing personal financial records.
  • DevOps teams use synthetic data to test software and ensure quality.
  • Machine learning algorithms are often trained with synthetic data.
  • Waymo tested its autonomous vehicles by driving 8 million miles on real roads plus another 5 billion miles on simulated roadways. Other automakers are using video games such as Grand Theft Auto to aid their self-driving technology.
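As a minimal Python sketch of the fraud-detection case, the code below generates synthetic transactions (no real customer or account data involved) with a known fraction labelled as fraudulent, then uses them to measure a detection rule. The transaction fields, the amount ranges and the `flag_large_amount` rule are hypothetical; real fraud patterns and detectors are far more subtle.

```python
import random


def synthetic_transactions(n, fraud_rate, seed=0):
    """Generate labelled fake transactions; fraud amounts come from a separate, hypothetical range."""
    rng = random.Random(seed)
    transactions = []
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        amount = rng.uniform(5_000, 20_000) if is_fraud else rng.uniform(1, 500)
        transactions.append({"id": f"TX{i:05d}", "amount": round(amount, 2), "is_fraud": is_fraud})
    return transactions


def flag_large_amount(tx, limit=1_000):
    """Hypothetical detection rule under test."""
    return tx["amount"] > limit


def evaluate(rule, transactions):
    """Precision and recall of the rule against the known synthetic labels."""
    flagged = [tx for tx in transactions if rule(tx)]
    true_pos = sum(tx["is_fraud"] for tx in flagged)
    actual = sum(tx["is_fraud"] for tx in transactions)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


data = synthetic_transactions(5_000, fraud_rate=0.02)
print(evaluate(flag_large_amount, data))  # (1.0, 1.0) for this deliberately easy data
```

Because every label is known in advance, the same generator could be adjusted to produce harder cases, such as fraudulent transactions with ordinary amounts, to find out where a rule breaks down.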

While synthetic data isn't foolproof, it is an important tool for augmenting machine learning when real data is too expensive to collect, inaccessible due to privacy concerns, or incomplete.
