Big Data: R Explained in less than two minutes, to absolutely anyone
2 July 2021
If you’re looking at ways you can harness the power of Big Data analytics in your business, but are not necessarily a techie person yourself, it can be a confusing field at first.
For this reason I’m publishing a series of short posts explaining some of the key concepts and technologies behind Big Data and data analytics, written for an audience which is not primarily composed of IT specialists or data scientists.
I firmly believe that any business can benefit from the new wave of analytics applications and services which can crunch through as much data as you can throw at them, in order to come out with surprising and valuable insights to drive growth.
These projects usually require a mix of skills, and communication between people with different skillsets (e.g. data science and marketing) is essential. So in this post I’ll give an overview of R, the programming language favored by many statisticians.
R is a computer programming language which is particularly well suited to handling and sorting the large datasets associated with Big Data projects.
Like Python, which I covered previously, the software environment used to create code in R is open source, meaning it is free to download, anyone can use it, and there is a plethora of guidance and advice available on how to use it most effectively. However, commercial distributions are also available, which often offer additional proprietary functionality or support packages.
Named from the initials of the two men who first developed the language at the University of Auckland, Robert Gentleman and Ross Ihaka, R has become very popular in recent years and is continuing to become more so, due to the explosion in analytic activities being carried out by business.
R’s strengths as a statistical programming language draw from the fact it is designed from the ground up to facilitate matrix arithmetic – carrying out complex, often automated calculations on data which is held in a grid of rows and columns. R is very good for creating programs which can carry out calculations on these datasets, even when the datasets are constantly growing in size at an ever-increasing rate, and producing real-time visualizations based on this data.
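To make this a little more concrete, here is a minimal sketch of what calculations on a grid of rows and columns can look like in R. The store names, sales figures, price and weights are all made up for illustration; the functions used (matrix, rowSums, colMeans and the %*% operator) are part of base R.

```r
# Made-up data: units sold by three stores (rows) over four weeks (columns).
sales <- matrix(
  c(12, 15, 11, 18,
    20, 22, 19, 25,
     7,  9,  8, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("north", "south", "west"), paste0("week", 1:4))
)

# Hypothetical price per unit, the same for every store.
unit_price <- 2.5

# Arithmetic applies to the whole grid at once, with no explicit loop.
revenue <- sales * unit_price

# Summaries across rows or columns take one call each.
store_totals <- rowSums(revenue)
weekly_average <- colMeans(revenue)

# Matrix multiplication: give later weeks more weight, one result per store.
week_weights <- c(0.1, 0.2, 0.3, 0.4)
weighted_units <- sales %*% week_weights

print(store_totals)
print(weekly_average)
print(weighted_units)
```

The point to notice is that each calculation is a single line, however many rows and columns the grid has.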
R’s ability to produce visualizations is another core strength. Its designers realised that visualization was key to understanding the complex datasets being explored, so they built functionality for translating data into charts, graphs and complex multi-dimensional matrices, as well as many user-defined methods of visualization, into its core.
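As a rough illustration of how little code a basic chart needs, the sketch below draws a bar chart with R’s built-in barplot function. The monthly figures are invented for the example, and writing the chart to a temporary PDF file is an assumption made so the snippet runs the same way with or without a screen.

```r
# Made-up monthly sales figures, for illustration only.
monthly_sales <- c(Jan = 120, Feb = 135, Mar = 160, Apr = 150)

# Send the chart to a temporary PDF file so the example also works
# when R is run without a display.
chart_file <- tempfile(fileext = ".pdf")
pdf(chart_file)

# barplot() is part of base R; it returns the x-position of each bar.
bar_positions <- barplot(
  monthly_sales,
  main = "Monthly sales (made-up data)",
  xlab = "Month",
  ylab = "Units sold",
  col = "steelblue"
)

# Close the file so the chart is actually written to disk.
invisible(dev.off())
```

In an interactive session you can leave out the pdf() and dev.off() lines, and the chart will appear in a window instead.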
Online, R is widely used, although you won’t see it, as its results are usually presented through graphical interfaces. Companies such as Google, Facebook and Twitter have used R for internal data analysis, although the services themselves are built mostly in other languages. R is often cited as one of the most widely used programming languages for data science. APIs exist for many online services, allowing applications written in R to access data from these outside sources and include it in their own analytics routines.
Thanks to this huge user base, just about every function that you might need for data analysis is available, often through open source extensions (known as packages) made available by the community. R can also call code written in other languages such as C++ or Java, so resources coded in those languages can be made available. Because the R interpreter is available for all major operating systems, R code can usually be moved between Unix, Windows and Mac environments with little or no change.
Python is probably R’s biggest rival, but as both are non-commercial entities (as are most languages, computer or otherwise!) it’s not necessarily a rivalry in the traditional sense. However, coders will often argue vociferously for their favorite of the two. Python, having more in common with more traditional, longer established programming languages, is often cited as being easier to learn, particularly for someone with prior experience of other high-level programming languages. The R environment, on the other hand, is likely to be more familiar to someone with an academic background in statistics. It’s worth noting that Python tends to have a wider range of uses outside of the world of statistics and analytics, whereas R is used almost exclusively for those purposes.
With a reported two million users worldwide, and thousands of deployed applications created using it, R is undoubtedly one of the backbone technologies of the Big Data revolution. If you are thinking of getting involved with the techie end of data analysis, then a thorough grounding in the language should be considered an essential element of your toolbox. If you want to learn more, or have a go at creating your own code in R to see what it can do, there are plenty of great resources online, such as those at Coursera, Code School and RStudio.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.