Big Data: R Explained in less than two minutes, to absolutely anyone
2 July 2021
If you’re looking at ways you can harness the power of Big Data analytics in your business, but are not necessarily a techie person yourself, it can be a confusing field at first.
For this reason I’m publishing a series of short posts explaining some of the key concepts and technologies behind Big Data and data analytics, written for an audience which is not primarily composed of IT specialists or data scientists.
I firmly believe that any business can benefit from the new wave of analytics applications and services which can crunch through as much data as you can throw at them, in order to come out with surprising and valuable insights to drive growth.
These projects usually require a mix of skills, and communication between people with different skillsets (e.g. data science and marketing) is essential. So in this post I’ll give an overview of R, the programming language favored by many statisticians.
R is a computer programming language which is particularly well suited to handling and sorting the large datasets associated with Big Data projects.
Like Python, which I covered previously, the software environment used to create code in R is open source, meaning it is free to download, anyone can use it, and there is a plethora of guidance and advice available on how to use it most effectively. However, commercial distributions are also available, which often offer additional proprietary functionality or support packages.
Named after the first initials of the two men who first developed the language at the University of Auckland, Robert Gentleman and Ross Ihaka, R has become very popular in recent years and is continuing to become more so, due to the explosion in analytic activities being carried out by businesses.
R’s strengths as a statistical programming language draw from the fact that it is designed from the ground up to facilitate matrix arithmetic: carrying out complex, often automated calculations on data which is held in a grid of rows and columns. R is very good for creating programs which can carry out calculations on these datasets, even when the datasets are growing at an ever-increasing rate, and for producing real-time visualizations based on this data.
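To make the idea of "a grid of rows and columns" concrete, here is a minimal sketch in R. The shop names, sales figures and price are all invented for illustration; the point is only that a single line of R can apply a calculation to every cell in the grid, or summarize whole rows and columns, without writing any loops.

```r
# Hypothetical data: units sold by three shops (rows) over four weeks (columns).
sales <- matrix(
  c(12, 15,  9, 20,
     7, 11, 14, 10,
    22, 18, 25, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("North", "South", "East"),
                  c("Week1", "Week2", "Week3", "Week4"))
)

# Assumed price per unit; one multiplication is applied to every cell in the grid.
price <- 2.5
revenue <- sales * price

# Summaries across rows and columns, with no explicit loops.
shop_totals <- rowSums(revenue)   # total revenue per shop
week_averages <- colMeans(sales)  # average units sold per week

print(shop_totals)
print(week_averages)
```

Running this prints one total per shop and one average per week, each labeled with the row or column name it came from.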
Its capability at producing these visualizations is another core strength of R. Its designers realized that visualization was key to understanding the complex datasets being explored, so they built the ability to translate data into charts, graphs and complex multi-dimensional matrices, as well as many user-defined methods of visualization, into its core.
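As a rough illustration of how little code a basic chart needs, the sketch below draws a bar chart using only the functions that ship with R. The visitor numbers and the output file name are made up for the example; the chart is written to a PDF file so the script works even on a machine with no screen attached.

```r
# Hypothetical monthly visitor counts, invented for illustration.
visitors <- c(Jan = 120, Feb = 135, Mar = 160, Apr = 150, May = 190, Jun = 210)

# Open a PDF file as the drawing surface (file name is an assumption).
pdf("visitors.pdf", width = 7, height = 5)

# barplot() draws the chart and returns the horizontal position of each bar.
bar_positions <- barplot(
  visitors,
  main = "Monthly visitors",
  xlab = "Month",
  ylab = "Visitors",
  col  = "steelblue"
)

# Close the file so the chart is saved to disk.
dev.off()
```

Opening visitors.pdf afterwards shows six labeled bars, one per month.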
Online, R is everywhere, although you won’t see it, as it sits behind the scenes rather than in the interfaces you interact with. Companies such as Google, Facebook and Twitter have used R to analyze the data their services generate. In fact it is often cited as one of the most widely used programming languages for data science. APIs exist for almost all of these services, allowing applications written in R to access data from these outside sources and include it in their own analytics routines.
Thanks to this huge user base, just about every function that you might need for data analysis is available, often through open source extensions (known as packages) made available by the community. R can also call code written in other languages such as C++ or Java, so resources coded in those languages can be reused. Because the R environment is available for all major operating systems, R code can usually be moved between Unix, Windows and Mac environments with little or no change.
Python is probably R’s biggest rival, but as both are non-commercial entities (as are most languages, computer or otherwise!) it’s not necessarily a rivalry in the traditional sense. However, coders will often argue vociferously for their favorite of the two. Python, having more in common with traditional, longer-established programming languages, is often cited as being easier to learn, particularly for someone with prior experience of other high-level programming languages. The R environment, on the other hand, is likely to be more familiar to someone with an academic background in statistics. It’s worth noting that Python tends to have a wider range of uses outside of the world of statistics and analytics, whereas R is used almost exclusively for those purposes.
With a reported two million users worldwide, and thousands of deployed applications created using it, R is undoubtedly one of the backbone technologies of the Big Data revolution. If you are thinking of getting involved with the techie end of data analysis, then a thorough grounding in the language should be considered an essential element of your toolbox. If you want to learn more, or have a go at creating your own code in R to see what it can do, there are plenty of great resources online, such as those at Coursera, Code School and RStudio.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.