Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.

Bernard’s latest books are ‘Future Skills’, ‘The Future Internet’, ‘Business Trends in Practice’ and ‘Generative AI in Practice’.

Generative AI Book Launch
View My Latest Books

Follow Me

Bernard Marr ist ein weltbekannter Futurist, Influencer und Vordenker in den Bereichen Wirtschaft und Technologie mit einer Leidenschaft für den Einsatz von Technologie zum Wohle der Menschheit. Er ist Bestsellerautor von 20 Büchern, schreibt eine regelmäßige Kolumne für Forbes und berät und coacht viele der weltweit bekanntesten Organisationen. Er hat über 2 Millionen Social-Media-Follower, 1 Million Newsletter-Abonnenten und wurde von LinkedIn als einer der Top-5-Business-Influencer der Welt und von Xing als Top Mind 2021 ausgezeichnet.

Bernards neueste Bücher sind ‘Künstliche Intelligenz im Unternehmen: Innovative Anwendungen in 50 Erfolgreichen Unternehmen’

View Latest Book

Follow Me

What is Hadoop?

2 July 2021

When you learn about Big Data you will sooner or later come across this odd sounding word: Hadoop – but what exactly is it?

Put simply, Hadoop can be thought of as a set of open source programs and procedures (meaning essentially they are free for anyone to use or modify, with a few exceptions) which anyone can use as the “backbone” of their big data operations.

I’ll try to keep things simple as I know a lot of people reading this aren’t software engineers, so I hope I don’t over-simplify anything – think of this as a brief guide for someone who wants to know a bit more about the nuts and bolts that make big data analysis possible.






The 4 Modules of Hadoop

Hadoop is made up of “modules”, each of which carries out a particular task essential for a computer system designed for big data analytics.


1. Distributed File-System

The most important two are the Distributed File System, which allows data to be stored in an easily accessible format, across a large number of linked storage devices, and the MapReduce – which provides the basic tools for poking around in the data.

(A “file system” is the method used by a computer to store data, so it can be found and used. Normally this is determined by the computer’s operating system, however a Hadoop system uses its own file system which sits “above” the file system of the host computer – meaning it can be accessed using any computer running any supported OS).


2. MapReduce

MapReduce is named after the two basic operations this module carries out – reading data from the database, putting it into a format suitable for analysis (map), and performing mathematical operations i.e counting the number of males aged 30+ in a customer database (reduce).


3. Hadoop Common

The other module is Hadoop Common, which provides the tools (in Java) needed for the user’s computer systems (Windows, Unix or whatever) to read data stored under the Hadoop file system.


4. YARN

The final module is YARN, which manages resources of the systems storing the data and running the analysis.

Various other procedures, libraries or features have come to be considered part of the Hadoop “framework” over recent years, but Hadoop Distributed File System, Hadoop MapReduce, Hadoop Common and Hadoop YARN are the principle four.


How Hadoop Came About

Development of Hadoop began when forward-thinking software engineers realised that it was quickly becoming useful for anybody to be able to store and analyze datasets far larger than can practically be stored and accessed on one physical storage device (such as a hard disk).

This is partly because as physical storage devices become bigger it takes longer for the component that reads the data from the disk (which in a hard disk, would be the “head”) to move to a specified segment. Instead, many smaller devices working in parallel are more efficient than one large one.

It was released in 2005 by the Apache Software Foundation, a non-profit organization which produces open source software which powers much of the Internet behind the scenes. And if you’re wondering where the odd name came from, it was the name given to a toy elephant belonging to the son of one of the original creators!


The Usage of Hadoop

The flexible nature of a Hadoop system means companies can add to or modify their data system as their needs change, using cheap and readily-available parts from any IT vendor.

Today, it is the most widely used system for providing data storage and processing across “commodity” hardware – relatively inexpensive, off-the-shelf systems linked together, as opposed to expensive, bespoke systems custom-made for the job in hand. In fact it is claimed that more than half of the companies in the Fortune 500 make use of it.

Just about all of the big online names use it, and as anyone is free to alter it for their own purposes, modifications made to the software by expert engineers at, for example, Amazon and Google, are fed back to the development community, where they are often used to improve the “official” product. This form of collaborative development between volunteer and commercial users is a key feature of open source software.

In its “raw” state – using the basic modules supplied here https://hadoop.apache.org/ by Apache, it can be very complex, even for IT professionals – which is why various commercial versions have been developed such as Cloudera which simplify the task of installing and running a Hadoop system, as well as offering training and support services.

So that, in a (fairly large) nutshell, is Hadoop. Thanks to the flexible nature of the system, companies can expand and adjust their data analysis operations as their business expands. And the support and enthusiasm of the open source community behind it has led to great strides towards making big data analysis more accessible for everyone.

Where to go from here

If you would like to know more about Hadoop or other big data tools, check out my articles on:

Or browse the Big Data section of this site to find more articles and many practical examples.

Business Trends In Practice | Bernard Marr
Business Trends In Practice | Bernard Marr

Related Articles

The Future Of Medicine: How AI is Shaping Patient Care And Drug Discovery

One of the most exciting aspects of AI is its implications for healthcare. Today, doctors and other medical professionals routinely augment their human skills and experience with the help of intelligent machines.[...]

Navigating The Future: 10 Global Trends That Will Define 2024

We’re approaching the mid-point of a decade in which we’ve already seen significant global transformation.[...]

Unlocking The Future Of Learning: How XR Tech Transforms Education

In the metaverse era, education as we know it will change. And I’m not just talking about formal education in schools, colleges, and universities – but also workplace learning and lifelong learning.[...]

2024 IoT And Smart Device Trends: What You Need to Know For The Future

By the end of 2024, there are projected to be more than 207 billion devices connected to the worldwide network of tools, toys, devices and appliances that make up the Internet of Things (IoT).[...]

The Evolving Internet: Navigating Risks Amidst Immersion, Decentralization, And Generative AI

The future internet is on the horizon, promising unprecedented engagement and innovation. Yet, as we incorporate immersive tech, decentralized systems, and generative AI, we also invite new complexities.[...]

The 8 Biggest Future Of Work Trends In 2024 Everyone Needs To Be Ready For Now

The world of work is constantly changing. Concepts that our parents or grandparents grew up with, such as the nine-to-five office and the job-for-life, are being consigned to the past.[...]

Sign up to Stay in Touch!

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.

He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.

He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.

Bernard’s latest book is ‘Generative AI in Practice’.

Sign Up Today

Social Media

0
Followers
0
Followers
0
Followers
0
Subscribers
0
Followers
0
Subscribers
0
Yearly Views
0
Readers

Podcasts

View Podcasts