Written by

Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has over 2 million social media followers, 1 million newsletter subscribers and was ranked by LinkedIn as one of the top 5 business influencers in the world and the No 1 influencer in the UK.

Bernard’s latest book is ‘Business Trends in Practice: The 25+ Trends That Are Redefining Organisations’



What is Kafka? A super-simple explanation of this important data analytics tool

2 July 2021

Hadoop, Spark, TensorFlow, Python – the number of platforms, frameworks and technologies that have emerged to help us handle and learn from the ever-growing amount of data available to businesses can be overwhelming. In this post I am going to take a look at Kafka – a data processing engine specifically designed for the high-speed, real-time information processing that makes AI and Big Data possible.

What is Kafka?

Kafka is open source software that provides a framework for storing, reading and analysing streaming data.

Being open source means that it is essentially free to use and has a large network of users and developers who contribute updates and new features and offer support to new users.

Kafka is designed to be run in a “distributed” environment, which means that rather than sitting on one user’s computer, it runs across several (or many) servers, leveraging the additional processing power and storage capacity that this brings.
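To give a feel for what "running across several servers" can mean in practice, here is a minimal sketch of one common way to spread records over a cluster: hash a key and use the result to pick a server. The server names, the store keys and the `pick_server` function are all made up for illustration, and CRC32 is simply a convenient stand-in hash. This is a generic illustration of the idea, not a description of Kafka's internals.

```python
import zlib

# Hypothetical cluster of three machines (names invented for this sketch).
SERVERS = ["server-0", "server-1", "server-2"]


def pick_server(key: str) -> str:
    """Map a record key to one server; the same key always maps to the same server."""
    # CRC32 is a stand-in chosen for illustration, not the hash Kafka itself uses.
    return SERVERS[zlib.crc32(key.encode("utf-8")) % len(SERVERS)]


for store in ["store-london", "store-leeds", "store-york"]:
    print(store, "->", pick_server(store))
```

Because the mapping depends only on the key, all records for one store end up on the same machine, while different stores can be handled by different machines in parallel.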

Kafka was originally created at LinkedIn, where it played a part in analysing the connections between its millions of professional users in order to build networks between people. It was given open source status and passed to the Apache Software Foundation – which coordinates and oversees development of open source software – in 2011.

What is Kafka used for?

In order to stay competitive, businesses today rely increasingly on real-time data analysis, which gives them faster insights and quicker response times. Real-time insights allow businesses or organisations to make predictions about what they should stock, promote, or pull from the shelves, based on the most up-to-date information possible.

Traditionally, data has been processed and transmitted across networks in “batches”. This is down to limitations in the pipeline – the speed at which CPUs can handle the calculations involved in reading and transferring information, or at which sensors can detect data. As this interview points out, these “bottlenecks” in our ability to process data have existed since humans first began to record and exchange information in written records.

Due to its distributed nature and the streamlined way it manages incoming data, Kafka is capable of operating very quickly – large clusters can monitor and react to millions of changes to a dataset every second. This means it becomes possible to start working with – and reacting to – streaming data in real time.
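The difference between batch and streaming processing can be sketched in a few lines of plain Python. This is only an illustration of the two styles, with made-up function names and made-up sale amounts. It does not use Kafka at all.

```python
from typing import Iterable, Iterator


def batch_total(events: Iterable[int]) -> int:
    """Batch style: wait until every event has been collected, then compute once."""
    collected = list(events)  # nothing is known until the whole batch has arrived
    return sum(collected)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming style: update the result as each event arrives."""
    running = 0
    for value in events:
        running += value
        yield running  # an up-to-date answer is available after every event


sales = [3, 1, 4, 1, 5]  # hypothetical sale amounts arriving over time
print(batch_total(sales))             # one answer, only at the end
print(list(streaming_totals(sales)))  # an answer after each event
```

In the batch version nobody can react until the last sale is in. In the streaming version every new sale immediately produces a fresh total that other systems could act on.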

Kafka was originally designed to track the behaviour of visitors to large, busy websites (such as LinkedIn). By analysing the clickstream data (how the user navigates the site and what functionality they use) of every session, a greater understanding of user behaviour is achievable. This makes it possible to predict which news articles, or products for sale, a visitor might be interested in.

Since then, Kafka has become widely used, and it is an integral part of the stack at Spotify, Netflix, Uber, Goldman Sachs, PayPal and Cloudflare, which all use it to process streaming data and understand customer, or system, behaviour. In fact, according to the Kafka website, one out of five Fortune 500 businesses uses Kafka to some extent.

One particular niche where Kafka has gained dominance is the travel industry, where its streaming capability makes it ideal for tracking booking details of millions of flights, package holidays and hotel vacancies worldwide.

How does Kafka work?

Kafka takes information – which can be read from a huge number of data sources – and organises it into “topics”. As a very simple example, one of these data sources could be a transactional log where a grocery store records every sale.

Kafka would process this stream of information and make “topics” – which could be “number of apples sold”, or “number of sales between 1pm and 2pm” – which could be analysed by anyone needing insights into the data.

This may sound similar to how a conventional database lets you store or sort information, but in the case of Kafka it would be suitable for a national chain of grocery stores processing thousands of apple sales every minute.
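As a rough sketch of the grocery example, the snippet below keeps two running figures up to date as sale records arrive: units sold per item, and the number of sales between 1pm and 2pm. The record layout and the variable names are assumptions made for illustration. It is plain Python over a short list, not Kafka's API, and in a real deployment the stream of sales would never end.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sale records as they might appear in a store's transaction log.
sales = [
    {"item": "apple", "quantity": 3, "time": "2021-07-02T12:55:00"},
    {"item": "pear", "quantity": 1, "time": "2021-07-02T13:05:00"},
    {"item": "apple", "quantity": 2, "time": "2021-07-02T13:40:00"},
    {"item": "apple", "quantity": 1, "time": "2021-07-02T14:10:00"},
]

units_sold = Counter()   # running "number of X sold" per item
sales_1pm_to_2pm = 0     # running "number of sales between 1pm and 2pm"

for sale in sales:  # each record is handled as soon as it is seen
    units_sold[sale["item"]] += sale["quantity"]
    if datetime.fromisoformat(sale["time"]).hour == 13:
        sales_1pm_to_2pm += 1

print(units_sold["apple"])
print(sales_1pm_to_2pm)
```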

This is achieved using a function known as a Producer, which is an interface between applications (e.g. the software which is monitoring the grocery store’s structured but unsorted transaction database) and the topics – Kafka’s own database of ordered, segmented data, known as the Kafka Topic Log.
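To make the Producer and the topic log a little more concrete, here is a toy model in plain Python. `TopicLog` and `ToyProducer` are names invented for this sketch and are not part of any Kafka library. The point is only to show the shape of the idea: a producer hands records to a named topic, and the topic keeps them in arrival order, each at a numbered position.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TopicLog:
    """Toy stand-in for a topic log: an ordered, append-only list of records."""
    name: str
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        self.records.append(record)
        return len(self.records) - 1  # position of the new record in the log


class ToyProducer:
    """Toy stand-in for a producer: routes each record to the named topic."""

    def __init__(self) -> None:
        self.topics: Dict[str, TopicLog] = {}

    def send(self, topic: str, record: Any) -> int:
        # Create the topic on first use, then append to its log.
        log = self.topics.setdefault(topic, TopicLog(topic))
        return log.append(record)


producer = ToyProducer()
first = producer.send("sales", {"item": "apple", "quantity": 3})
second = producer.send("sales", {"item": "pear", "quantity": 1})
print(first, second)  # positions grow in arrival order
```

Records are only ever added to the end of the log, which is what keeps the data ordered and makes it cheap to write at high speed.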

Often this data stream will be used to fill data lakes such as Hadoop’s distributed databases or to feed real-time processing pipelines like Spark or Storm.

Another interface – known as the Consumer – enables topic logs to be read, and the information stored in them passed on to other applications which might need it – for example, the grocery store’s system for renewing depleted stock, or discarding out-of-date items.
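The Consumer side can be sketched in the same toy style. `ToyConsumer`, the `sales_log` list and the stock figures are all assumptions made for illustration, not a real Kafka client. The sketch shows a consumer reading a log in order, remembering how far it has read, and passing what it finds to a simple restocking check.

```python
from typing import Any, List


class ToyConsumer:
    """Toy stand-in for a consumer: reads a log in order and remembers its position."""

    def __init__(self, log: List[Any]) -> None:
        self.log = log
        self.offset = 0  # index of the next record to read

    def poll(self) -> List[Any]:
        """Return every record appended since the last call."""
        new_records = self.log[self.offset:]
        self.offset += len(new_records)
        return new_records


sales_log: List[Any] = []  # hypothetical topic log, filled by some producer
consumer = ToyConsumer(sales_log)

stock = {"apple": 5}  # hypothetical stock levels
sales_log.append({"item": "apple", "quantity": 3})
sales_log.append({"item": "apple", "quantity": 2})

for sale in consumer.poll():
    stock[sale["item"]] -= sale["quantity"]

needs_restock = [item for item, left in stock.items() if left <= 0]
print(needs_restock)
print(consumer.poll())  # nothing new since the last poll
```

Because the consumer keeps its own position, reading does not remove anything from the log, so another application could read the same records independently.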

When you put its components together with the other common elements of a Big Data analytics framework, Kafka works by forming the “central nervous system” through which data passes between input and capture applications, data processing engines and storage lakes.

Hopefully this article serves to give an overview of how, where and why Kafka is used, and some of the factors which have supported its huge growth in popularity. If you want more in-depth details about how it works, as well as information on how to get started using it yourself, there are some great resources online:

