Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.

Bernard’s latest books are ‘Future Skills’, ‘The Future Internet’, ‘Business Trends in Practice’ and ‘Generative AI in Practice’.

What Is A Data Lake? A Super-Simple Explanation For Anyone

2 July 2021

If you’re even tangentially involved with big data, you know that finding storage solutions for the volumes of data generated every second is of utmost importance. When it comes to managing data, data professionals can use either a data warehouse or a data lake as a data repository. To determine which is best for your organisation, let’s first define each of them and then compare them.

What is a data lake?

Some mistakenly believe that a data lake is just the 2.0 version of a data warehouse. While the two are similar, they are different tools that should be used for different purposes. James Dixon, the CTO of Pentaho, is credited with naming the concept of a data lake. He uses the following analogy:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

A data lake holds data without imposing any structure on it: there is no hierarchy or organisation among the individual pieces of data. It holds data in its rawest form, neither processed nor analysed. A data lake accepts and retains all data from all data sources and supports all data types. Schemas (the definitions of how data is structured in a database) are applied only when the data is ready to be used, an approach often called “schema on read”.
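To make the idea of applying a schema only at read time more concrete, here is a minimal Python sketch. It is a toy model, not how any particular data lake product works: the raw records, the `raw_lake` list and the `Sale` schema are all hypothetical. Records are stored exactly as they arrive, and a reader imposes its own structure when it pulls data out, skipping anything that doesn’t fit.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical toy "data lake": raw records are kept exactly as they arrived,
# with no validation and no fixed structure.
raw_lake: list[str] = [
    '{"id": 1, "customer": "Ada", "amount": "19.99"}',
    '{"id": 2, "customer": "Grace", "amount": 5, "channel": "web"}',
    'sensor,42,2021-07-02T10:00:00',  # not JSON at all
    '{"id": 3, "customer": "Linus"}',  # no amount field
]

@dataclass
class Sale:
    """Hypothetical schema one analyst chooses to apply when reading."""
    id: int
    customer: str
    amount: float

def read_sales(lines: Iterable[str]) -> Iterator[Sale]:
    """Apply the Sale schema at read time, skipping records that don't fit it."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data this particular reader cannot interpret
        if not isinstance(rec, dict):
            continue
        try:
            yield Sale(int(rec["id"]), str(rec["customer"]), float(rec["amount"]))
        except (KeyError, TypeError, ValueError):
            continue  # record lacks or garbles a field this schema needs

sales = list(read_sales(raw_lake))
```

Nothing is discarded from `raw_lake`: a different reader with a different schema could later make use of the sensor line or the record without an amount.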

What is a data warehouse?

A data warehouse stores data in an organised manner, with everything archived and ordered in a defined way. When a data warehouse is developed, a significant amount of effort goes into the initial stages to analyse data sources and understand business processes. Decisions are made about which data to include in the warehouse and which to exclude. Data is loaded into the warehouse only once a use for it has been identified.
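By contrast with a data lake, a warehouse checks data against its structure before loading it, an approach often called “schema on write”. The Python sketch below illustrates that idea under simple assumptions: the `SaleRow` table, the `load_sale` function and the sample records are hypothetical, and a real warehouse would enforce its schema through its database engine rather than application code like this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SaleRow:
    """Hypothetical warehouse table schema, agreed before any data is loaded."""
    id: int
    customer: str
    amount: float

# Hypothetical warehouse table: only rows that match the schema get in.
warehouse_sales: list[SaleRow] = []

def load_sale(record: dict) -> bool:
    """Validate and convert a record before loading it (schema on write).

    Returns True if the row was loaded, False if it was rejected.
    """
    required = {"id", "customer", "amount"}
    if not required.issubset(record):
        return False  # missing a column the table requires
    try:
        row = SaleRow(int(record["id"]), str(record["customer"]), float(record["amount"]))
    except (TypeError, ValueError):
        return False  # value can't be converted to the declared type
    warehouse_sales.append(row)  # extra fields such as "channel" are dropped
    return True

incoming = [
    {"id": 1, "customer": "Ada", "amount": "19.99"},
    {"id": 2, "customer": "Grace", "amount": 5, "channel": "web"},
    {"id": 3, "customer": "Linus"},
    {"id": 4, "customer": "Margaret", "amount": "n/a"},
]
results = [load_sale(r) for r in incoming]
```

Only the two records that fit the agreed structure end up in `warehouse_sales`, and the extra `channel` field is not kept. That is the trade-off the article describes: clean, ready-to-report data, at the cost of deciding up front what to include.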

How do data lakes and data warehouses compare?

Data

Data lakes retain all data: structured, semi-structured and unstructured/raw. It’s possible that some of the data in a data lake will never be used. A data warehouse includes only processed (structured) data, and only the data needed for reporting or to answer specific business questions.

Agility

Because a data lake lacks a fixed structure, it’s relatively easy to change models and queries. Data lakes are more flexible and can be configured and reconfigured as needed for the job at hand. Changing the structure of a data warehouse is much more cumbersome and time-consuming because of the number of business processes tied to it.

Users

Data scientists are typically the ones who access the data in data lakes, because they have the skills to do deep analysis, although technically a data lake can support, and be available to, all users. Data warehouses are used by specific business users to produce reports and extract the particular meaning that was defined when the warehouse was set up. They are usually too restrictive for data scientists, who need to go beyond the boundaries of the warehouse to draw new analysis from the data.

Security

Since data warehouses are more mature than data lakes, their security is also more mature. There is also concern that storing all data in a single repository, as a data lake does, makes that data more vulnerable. On the other hand, having just one store to manage certainly makes auditing and compliance easier.

Data lakes and data warehouses are different tools for different purposes. If you already have an established data warehouse, you might choose to implement a data lake alongside it to address some of the constraints you experience with the warehouse. To decide whether a data lake or a data warehouse is best for your needs, start with the goal you are trying to achieve and choose the data repository that will help you meet it.
