Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.

What is Spark in Big Data?

2 July 2021

Basically, Spark is a framework – in the same way that Hadoop is – which provides a number of interconnected platforms, systems and standards for Big Data projects.




Like Hadoop, Spark is open-source and under the wing of the Apache Software Foundation. Essentially, open-source means the code can be freely used by anyone. Beyond that, it can also be altered by anyone to produce custom versions aimed at particular problems or industries. Volunteer developers, as well as those working at companies which produce custom versions, constantly refine and update the core software, adding more features and efficiencies. In fact, Spark was the most active project at Apache last year. It was also the most active of all the open-source Big Data applications, with over 500 contributors from more than 200 organizations.

Spark is seen by techies in the industry as a more advanced product than Hadoop – it is newer, and designed to work by processing data in chunks “in memory”. This means it loads data from physical hard discs into far faster electronic memory (RAM) and keeps it there between processing steps, so each step can be carried out far more quickly – up to 100 times faster in some operations.
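To make the “in memory” idea concrete, here is a minimal sketch in plain Python. It does not use Spark or any Spark API; the function names and the temporary file are assumptions made up for the illustration. One version goes back to disk for every step, the other loads the data once and reuses the copy held in memory. Both give the same answers; the in-memory version simply avoids the repeated disc reads.

```python
import os
import tempfile


def write_dataset(path, n):
    """Write the integers 0..n-1 to a text file, one per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")


def read_dataset(path):
    """Read the whole file from disk and parse every line."""
    with open(path) as f:
        return [int(line) for line in f]


def passes_from_disk(path):
    """Every step re-reads the file, as a disk-based pipeline would."""
    total = sum(read_dataset(path))
    largest = max(read_dataset(path))
    evens = sum(1 for x in read_dataset(path) if x % 2 == 0)
    return total, largest, evens


def passes_in_memory(path):
    """Load the data once, then run every step on the in-memory copy."""
    data = read_dataset(path)
    return sum(data), max(data), sum(1 for x in data if x % 2 == 0)


def demo(n=1000):
    """Run both strategies on a temporary file and return their results."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_dataset(path, n)
        return passes_from_disk(path), passes_in_memory(path)
    finally:
        os.remove(path)
```

On a real cluster the saving matters far more than it does here, because the data being re-read would be many gigabytes or terabytes rather than a small text file.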

Spark has proven very popular and is used by many large companies for huge, multi-petabyte data storage and analysis. This has partly been because of its speed. In 2014, Spark set a world record by completing a benchmark test involving sorting 100 terabytes of data in 23 minutes – the previous world record of 72 minutes being held by Hadoop.

Additionally, Spark has proven itself to be highly suited to Machine Learning applications. Machine Learning is one of the fastest growing and most exciting areas of computer science, where computers are being taught to spot patterns in data, and adapt their behaviour based on automated modelling and analysis of whatever task they are trying to perform.

Spark is designed from the ground up to be easy to install and use – if you have a background in computer science! To make it available to more businesses, many vendors provide their own versions (as with Hadoop), which are geared towards particular industries or custom-configured for individual clients’ projects, along with consultancy services to get it up and running.

Spark uses cluster computing for its computational (analytics) power and works with distributed storage. This means it can use resources from many computer processors linked together for its analytics. It’s a scalable solution, meaning that if more oomph is needed, you can simply introduce more processors into the system. With distributed storage, the huge datasets gathered for Big Data analysis can be stored across many smaller individual physical hard discs. This speeds up read/write operations, because many discs can be read from and written to at the same time. As with processing power, more storage can be added when needed, and the fact that it uses commonly available commodity hardware (any standard computer hard discs) keeps down infrastructure costs.
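The split-the-work idea behind cluster computing can be sketched in a few lines of plain Python. This is an illustration only, not Spark code: the threads stand in for the separate machines of a cluster, and the function names are made up for the example. The dataset is cut into partitions, each worker computes a partial result on its own partition, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Divide data into contiguous partitions of roughly equal size."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional item each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def partial_sum_of_squares(partition):
    """The work a single worker does on its own partition."""
    return sum(x * x for x in partition)


def distributed_sum_of_squares(data, num_workers):
    """Fan the partitions out to workers, then combine the partial results."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, partitions))
    return sum(partials)
```

Calling `distributed_sum_of_squares(list(range(10)), 3)` gives the same answer as doing the sum on one worker; adding workers changes how the work is shared out, not the result, which is what makes this kind of design scalable.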

Unlike Hadoop, Spark does not come with its own file system – instead it can be integrated with many storage systems, including Hadoop’s HDFS file system, the MongoDB database and Amazon’s S3 storage service.

Another element of the framework is Spark Streaming, which allows applications to be developed that perform analytics on streaming data, such as automatically analyzing video or social media data, on the fly as it arrives.
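Spark Streaming handles a live stream by chopping it into small batches and analyzing each batch in turn. The sketch below shows that micro-batch idea in plain Python; it is not the Spark Streaming API, and the event format (a timestamp in seconds plus a hashtag) and function names are assumptions made up for the illustration.

```python
from collections import Counter


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-length windows.

    Events are assumed to arrive in timestamp order. A window in which
    nothing arrived is yielded as an empty batch.
    """
    batch, window_end = [], None
    for timestamp, value in events:
        if window_end is None:
            # The first window starts at the first event's timestamp.
            window_end = timestamp + batch_seconds
        while timestamp >= window_end:
            yield batch
            batch, window_end = [], window_end + batch_seconds
        batch.append(value)
    if batch:
        yield batch


def hashtag_counts_per_batch(events, batch_seconds):
    """Count how often each value appears within each micro-batch."""
    return [Counter(batch) for batch in micro_batches(events, batch_seconds)]
```

Because `micro_batches` is a generator, each batch can be analyzed as soon as its window closes, without waiting for the stream to end.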

In fast-changing industries such as marketing, real-time analytics has huge advantages. For example, ads can be served based on a user’s behavior at a particular time, rather than on historical behavior, increasing the chance of prompting an impulse purchase.

So that’s a brief introduction to Apache Spark – what it is, how it works, and why a lot of people think that it’s the future. I hope you found it useful.
