Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.

Bernard’s latest books are ‘Future Skills’, ‘The Future Internet’, ‘Business Trends in Practice’ and ‘Generative AI in Practice’.

Data Preparation – What Is It And Why Is It So Important?

2 July 2021

When you’re cooking, preparation is an essential step. Ingredients need to be collected, peeled, marinated and put where you will be able to reach them when the oil is hot or the oven reaches the right temperature.

The same is true of any business analytics or data-driven process. Today, data comes from increasingly disparate sources and in an ever-growing variety of forms. The insights you are looking for could lie in images, communications, machine-to-machine interactions or real-time sensor data. Most probably, they will lie in a combination of several of these sources. To get those sources to work together, they need the same care and attention as garden-fresh vegetables and prime cuts of meat before you throw them in the pot.

As every project is likely to be different and involve different data, there is no hard and fast checklist of the steps you need to take to ensure your data is sufficiently prepped. As a rule of thumb, though, any operation performed on data between its ingestion into your system and its processing by a particular analytical system can be considered part of the data prep stage for that process.

All of these operations share a common aim: to ensure your analytics processes receive error-free data in a consistent format that they can read, understand and work with. Frequently, data prep will include the following steps:

Data prep strategy

As all projects are different, the first step is always to start with strategy. In terms of data preparation, this means formulating a workflow that covers all of the steps your project needs and how they will be applied to each type, or source, of data. To follow my cooking analogy, this would be the equivalent of making sure you have all of the ingredients listed in the recipe, and know what you need to do to them before they go in the pot.
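
To make this a little more concrete, here is a minimal Python sketch of a prep workflow expressed as an ordered list of steps per data source. The source names (`email`, `product_code`) and the individual steps are purely hypothetical; a real project would substitute its own.

```python
# Each step is a small function that takes a value and returns the prepared value.
def strip_whitespace(value):
    return value.strip()

def lowercase(value):
    return value.lower()

def collapse_spaces(value):
    return " ".join(value.split())

# Hypothetical strategy: which steps apply to which source, and in what order.
PIPELINES = {
    "email": [strip_whitespace, collapse_spaces],
    "product_code": [strip_whitespace, lowercase],
}

def prepare(source, value):
    """Run every step defined for this source, in order."""
    for step in PIPELINES[source]:
        value = step(value)
    return value

prepared_email = prepare("email", "  Hello   world ")
prepared_code = prepare("product_code", " AB-12 ")
```

Writing the strategy down as data like this means a new source can be added by listing its steps, without touching the code that runs them.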

Data cleansing

This means removing data which is inaccurate, damaged, corrupt or otherwise erroneous, so that it is not taken into account during analytics. This process should pick up errors ranging from mistakes made during manual data input to corrupt data caused by faulty sensors, data transfer systems or storage.
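
As a rough illustration, the Python sketch below filters out records that are incomplete or fall outside a plausible range. The field names (`sensor_id`, `temperature_c`) and the range of -50 to 60 degrees are assumptions made for this example only.

```python
import math

# Hypothetical raw readings, including the kinds of errors described above.
raw_readings = [
    {"sensor_id": "A1", "temperature_c": 21.5},
    {"sensor_id": "A2", "temperature_c": None},          # missing value
    {"sensor_id": "A3", "temperature_c": 999.0},         # faulty sensor
    {"sensor_id": "", "temperature_c": 19.0},            # manual input mistake
    {"sensor_id": "A4", "temperature_c": float("nan")},  # corrupted in transfer
    {"sensor_id": "A5", "temperature_c": 18.2},
]

def is_clean(record, low=-50.0, high=60.0):
    """Return True if the record has an id and a plausible temperature."""
    if not record.get("sensor_id"):
        return False
    value = record.get("temperature_c")
    if not isinstance(value, (int, float)) or math.isnan(value):
        return False
    return low <= value <= high

clean_readings = [r for r in raw_readings if is_clean(r)]
rejected = [r for r in raw_readings if not is_clean(r)]
```

Keeping the rejected records, rather than silently discarding them, makes it easier to spot a sensor or input process that needs fixing at the source.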

Metadata creation

This means labelling data so that your analytics systems know what to do with it when they receive it at the end of your data prep process. Metadata tags data with “information about information”: for example, when and where a picture or video was taken, or the age and rough geographical location of the sender of a customer complaint email.
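
A simple way to picture this is a record that carries its metadata alongside the raw content. The Python sketch below is one possible shape; the `TaggedRecord` class, the `tag_record` helper and the tag names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TaggedRecord:
    payload: str                                   # the raw content itself
    metadata: dict = field(default_factory=dict)   # "information about information"

def tag_record(payload, source, received_at, region):
    """Attach descriptive tags so downstream systems know how to treat the payload."""
    return TaggedRecord(
        payload=payload,
        metadata={
            "source": source,
            "received_at": received_at.isoformat(),
            "region": region,
            "length_chars": len(payload),
        },
    )

record = tag_record(
    "My order arrived damaged.",
    source="customer_email",
    received_at=datetime(2021, 7, 2, 9, 30, tzinfo=timezone.utc),
    region="North West England",
)
```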

Data transformation

This step involves putting data into the correct format for your analytics systems to work with. This means taking the data in whatever format it was ingested (by scanners, sensors, cameras or manual data input) and putting it into whichever database format your analytics engines will understand. Data can be compacted to save space and improve speeds at this point, and any elements which will not be read by your analytics processes can be discarded.
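
The Python sketch below shows one way this might look, assuming the data arrives as CSV text and the analytics engine reads from a SQL database. The column names and the `transactions` table are invented for the example, and an in-memory SQLite database stands in for the real target.

```python
import csv
import io
import sqlite3

# Hypothetical ingested data: plain text, with columns the analysis will not use.
raw_csv = """customer_id,name,amount,internal_note
101,Alice,19.99,checked by hand
102,Bob,5.00,
"""

# Columns to keep, with the type each should be converted to.
KEEP = {"customer_id": int, "amount": float}

def transform(csv_text):
    """Parse CSV text, keep only the needed columns and convert their types."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: cast(row[col]) for col, cast in KEEP.items()} for row in reader]

rows = transform(raw_csv)

# Load the transformed rows into the format the analytics engine expects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (:customer_id, :amount)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```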

Data standardisation

Analytics algorithms and software often expect dates, names, geographical locations and a myriad of other features to be presented in a uniform way. Checking that all dates use an eight-digit format (with a four-digit year) rather than a six-digit one, so as to avoid confusion during analysis, comes under this heading. Data can also be checked at this point to make sure it falls into appropriate ranges. For example, if you are only looking at customers in a certain area, do all of the zip codes meet your requirements?
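
Both checks can be sketched in a few lines of Python. The list of accepted input formats and the zip code prefixes below are assumptions for illustration; your own data will dictate the real ones.

```python
from datetime import datetime

# Assumed input formats: four-digit year, two-digit year, and ISO style.
DATE_FORMATS = ("%d/%m/%Y", "%d/%m/%y", "%Y-%m-%d")

def standardise_date(text):
    """Return the date as YYYY-MM-DD, whichever accepted format it arrived in."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognised date format: {text!r}")

def in_target_area(zip_code, allowed_prefixes=("90", "91")):
    """Check that a zip code starts with one of the (hypothetical) target prefixes."""
    return zip_code.strip().startswith(allowed_prefixes)

dates = [standardise_date(d) for d in ["02/07/2021", "02/07/21", "2021-07-02"]]
zip_checks = [in_target_area(z) for z in ["90210", "10001"]]
```

Raising an error on an unrecognised date, instead of guessing, keeps ambiguous values out of the analysis until someone has looked at them.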

Data augmentation

Is there anything else which can be added to your data, perhaps from publicly available datasets, to make it more likely to reveal insights during analysis? It may also be possible to extrapolate additional facts from what is already known. Carrying this out ahead of the target analytics will save processing time and ensure you have the highest quality data before your algorithms go to work.
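
Here is a small Python sketch of both ideas: enriching records from a lookup table and deriving an extra fact from existing fields. The customer records are made up, and the two-entry `ZIP_TO_STATE` dictionary merely stands in for a real public dataset.

```python
# Hypothetical customer records.
customers = [
    {"customer_id": 1, "zip_code": "90210", "birth_year": 1985},
    {"customer_id": 2, "zip_code": "10001", "birth_year": 1992},
    {"customer_id": 3, "zip_code": "99999", "birth_year": 1970},
]

# Stand-in for a publicly available zip code dataset.
ZIP_TO_STATE = {"90210": "CA", "10001": "NY"}

def augment(customer, reference_year=2021):
    """Return a copy of the record with a looked-up state and a derived age."""
    enriched = dict(customer)
    enriched["state"] = ZIP_TO_STATE.get(customer["zip_code"])  # None if unknown
    enriched["age"] = reference_year - customer["birth_year"]   # approximate age
    return enriched

augmented = [augment(c) for c in customers]
```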

This sounds a bit long-winded. Do I have to do it all myself?

Fortunately not! Firstly, if your data initiative only involves one type or source of data (just video, or just the names and addresses of customers, or just transactional records), chances are that there are tools already out there which will handle your data just fine in its raw state.

However, with most Big Data projects, the volume, variety and velocity of the data involved are too great for it to be practical to carry out these tasks manually. In these cases, thankfully, a large and growing “self-service” market for data preparation tools has emerged.

Because the operations are uniform and the tasks repetitive, data preparation is an ideal candidate for process automation. “One-stop-shop” solutions, often delivered through simple web interfaces and requiring a minimum of data science training, are becoming increasingly common.

Just as with cooking, when it comes to business intelligence and data, good solid preparation can often be the difference between success and failure. Whichever of these processes you decide are necessary in your situation, a consistent data prep strategy should be a priority for anyone involved in digital transformation and data-driven discovery.

Where to go from here

If you would like to know more about data analysis and analytics, check out my articles on:

Or browse the Big Data and Analytics section of this site to find more articles and many practical examples.

