Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has over 2 million social media followers, 1 million newsletter subscribers and was ranked by LinkedIn as one of the top 5 business influencers in the world and the No 1 influencer in the UK.

Bernard’s latest book is ‘Business Trends in Practice: The 25+ Trends That Are Redefining Organisations’

What is Hadoop?

2 July 2021

When you learn about Big Data, you will sooner or later come across this odd-sounding word: Hadoop. But what exactly is it?

Put simply, Hadoop can be thought of as a set of open source programs and procedures (meaning they are essentially free for anyone to use or modify, with a few exceptions) which can serve as the “backbone” of big data operations.

I’ll try to keep things simple as I know a lot of people reading this aren’t software engineers, so I hope I don’t over-simplify anything – think of this as a brief guide for someone who wants to know a bit more about the nuts and bolts that make big data analysis possible.

The 4 Modules of Hadoop

Hadoop is made up of “modules”, each of which carries out a particular task essential for a computer system designed for big data analytics.

1. Distributed File System

The two most important modules are the Distributed File System, which allows data to be stored in an easily accessible format across a large number of linked storage devices, and MapReduce, which provides the basic tools for poking around in the data.

(A “file system” is the method used by a computer to store data so it can be found and used. Normally this is determined by the computer’s operating system; however, a Hadoop system uses its own file system, which sits “above” the file system of the host computer, meaning it can be accessed using any computer running any supported OS.)
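To make the idea of spreading data across linked storage devices a little more concrete, here is a minimal sketch in Python. It is not how Hadoop itself is implemented: the function names, the node names and the tiny 8-byte block size are all invented for illustration, and a real Hadoop system uses far larger blocks and also keeps copies of each block on several machines, which this sketch leaves out.

```python
def distribute(data, block_size, nodes):
    """Split data into fixed-size blocks and spread them round-robin across nodes."""
    storage = {node: {} for node in nodes}
    for block_no, start in enumerate(range(0, len(data), block_size)):
        node = nodes[block_no % len(nodes)]  # hypothetical placement rule
        storage[node][block_no] = data[start:start + block_size]
    return storage


def reassemble(storage):
    """Collect the blocks from every node and join them back in order."""
    blocks = {}
    for node_blocks in storage.values():
        blocks.update(node_blocks)
    return b"".join(blocks[i] for i in sorted(blocks))


data = b"customer records go here, many of them"
storage = distribute(data, block_size=8, nodes=["node-a", "node-b", "node-c"])

# The file can be read back whole even though no single node holds all of it.
assert reassemble(storage) == data
```

The point of the sketch is simply that the person reading the file sees one file, while the pieces actually live on several machines.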

2. MapReduce

MapReduce is named after the two basic operations this module carries out: reading data from the database and putting it into a format suitable for analysis (map), and performing mathematical operations on it, e.g. counting the number of males aged 30+ in a customer database (reduce).
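The customer-counting example can be sketched in a few lines of Python. This is only an illustration of the map and reduce idea running on a single computer, not the Hadoop MapReduce API (real jobs are typically written in Java and run across many machines); the customer records, the field names and the function names are made up for the example.

```python
from collections import defaultdict

# Made-up customer records for illustration only.
customers = [
    {"name": "Alan", "sex": "M", "age": 42},
    {"name": "Beth", "sex": "F", "age": 35},
    {"name": "Carl", "sex": "M", "age": 29},
    {"name": "Dev", "sex": "M", "age": 30},
    {"name": "Eve", "sex": "F", "age": 51},
]


def map_customer(record):
    """Map step: turn each matching record into a (key, 1) pair."""
    if record["sex"] == "M" and record["age"] >= 30:
        yield ("male_30_plus", 1)


def reduce_counts(pairs):
    """Reduce step: add up the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = [pair for record in customers for pair in map_customer(record)]
result = reduce_counts(pairs)
print(result)  # {'male_30_plus': 2}
```

Because each record is mapped independently of the others, the map step can be shared out between many machines, with only the small (key, value) pairs being brought together for the reduce step.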

3. Hadoop Common

The third module is Hadoop Common, which provides the tools (in Java) needed for the user’s computer systems (Windows, Unix or whatever) to read data stored under the Hadoop file system.

4. YARN

The final module is YARN, which manages the resources of the systems storing the data and running the analysis.

Various other procedures, libraries or features have come to be considered part of the Hadoop “framework” over recent years, but Hadoop Distributed File System, Hadoop MapReduce, Hadoop Common and Hadoop YARN are the principal four.

How Hadoop Came About

Development of Hadoop began when forward-thinking software engineers realised that it was quickly becoming useful for anybody to be able to store and analyze datasets far larger than can practically be stored and accessed on one physical storage device (such as a hard disk).

This is partly because as physical storage devices become bigger it takes longer for the component that reads the data from the disk (which in a hard disk, would be the “head”) to move to a specified segment. Instead, many smaller devices working in parallel are more efficient than one large one.
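A rough back-of-envelope calculation shows why working in parallel helps. The figures below are assumptions chosen as round numbers, not measurements: a 1,000 GB dataset and disks that each read at a steady 100 MB per second. The sketch also ignores seek time and the overhead of coordinating many devices, so treat it as an illustration of the principle only.

```python
def read_time_seconds(total_gb, disks, throughput_mb_per_s=100):
    """Time to read a dataset split evenly across disks that all read at once."""
    per_disk_mb = total_gb * 1000 / disks  # each disk reads only its own share
    return per_disk_mb / throughput_mb_per_s


one_disk = read_time_seconds(1000, disks=1)
hundred_disks = read_time_seconds(1000, disks=100)

print(one_disk)       # 10000.0 seconds, a little under three hours
print(hundred_disks)  # 100.0 seconds
```

Under these assumptions, a hundred small disks reading side by side finish in a hundredth of the time that one large disk would need.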

Hadoop was released in 2005 by the Apache Software Foundation, a non-profit organization that produces open source software powering much of the Internet behind the scenes. And if you’re wondering where the odd name came from, it was the name given to a toy elephant belonging to the son of one of the original creators!

The Usage of Hadoop

The flexible nature of a Hadoop system means companies can add to or modify their data system as their needs change, using cheap and readily-available parts from any IT vendor.

Today, it is the most widely used system for providing data storage and processing across “commodity” hardware – relatively inexpensive, off-the-shelf systems linked together, as opposed to expensive, bespoke systems custom-made for the job in hand. In fact it is claimed that more than half of the companies in the Fortune 500 make use of it.

Just about all of the big online names use it, and as anyone is free to alter it for their own purposes, modifications made to the software by expert engineers at, for example, Amazon and Google, are fed back to the development community, where they are often used to improve the “official” product. This form of collaborative development between volunteer and commercial users is a key feature of open source software.

In its “raw” state, using the basic modules supplied by Apache at https://hadoop.apache.org/, it can be very complex, even for IT professionals. That is why various commercial versions, such as Cloudera, have been developed: they simplify the task of installing and running a Hadoop system, as well as offering training and support services.

So that, in a (fairly large) nutshell, is Hadoop. Thanks to the flexible nature of the system, companies can expand and adjust their data analysis operations as their business expands. And the support and enthusiasm of the open source community behind it has led to great strides towards making big data analysis more accessible for everyone.
