What Are Data Containers and How Are They Used In Practice?
2 July 2021
Many of us have had the experience of preparing a file (say, an important presentation) on one computer, where it looks and performs beautifully, and then loading it on a different computer only to have it glitch, look strange, or not function at all.
Now imagine that on the scale of a major big data project for a large corporation.
The problem is real: a slightly different version of the application or operating system between development and production can cause serious failures, leading to costly delays and fixes.
And this is where data containers come in.
What Are Data Containers?
A container is an application, including all its dependencies, libraries and other binaries, and the configuration files needed to run it, bundled into a single package that can be moved, in total, from one computing environment to another.
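As a rough illustration of what "bundled into a single package" can mean in practice, the sketch below uses Python to write out a Dockerfile, the build recipe used by Docker. The ContainerSpec class, the render_dockerfile helper, the base image tag and the file names are all assumptions made for this example; a real project would choose its own.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerSpec:
    """Hypothetical description of everything an application needs to run."""
    base_image: str        # runtime and system libraries to build on
    dependency_file: str   # pinned library versions
    source_dir: str        # the application code itself
    config_files: List[str] = field(default_factory=list)
    entrypoint: List[str] = field(default_factory=list)


def render_dockerfile(spec: ContainerSpec) -> str:
    """Render the spec as Dockerfile text using standard Dockerfile instructions."""
    lines = [
        f"FROM {spec.base_image}",
        "WORKDIR /app",
        f"COPY {spec.dependency_file} /app/{spec.dependency_file}",
        f"RUN pip install --no-cache-dir -r {spec.dependency_file}",
    ]
    # Configuration travels with the application instead of living on the host.
    lines += [f"COPY {path} /app/{path}" for path in spec.config_files]
    lines.append(f"COPY {spec.source_dir}/ /app/{spec.source_dir}/")
    # Exec-form CMD is a JSON array, so json.dumps produces valid syntax.
    lines.append(f"CMD {json.dumps(spec.entrypoint)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ContainerSpec(
        base_image="python:3.11-slim",   # assumed base image
        dependency_file="requirements.txt",
        source_dir="src",
        config_files=["settings.yaml"],
        entrypoint=["python", "src/main.py"],
    )
    print(render_dockerfile(example))
```

The point of the sketch is that the runtime, the libraries, the configuration and the code are all named in one place, so the same package can be rebuilt and run wherever a container engine is available.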
A container might be used when moving from a developer’s laptop to a testing environment, from that testing environment to live production, or even from a physical machine to a virtual machine in the cloud. It can be used to get around differences in operating systems, software versions, infrastructure, security protocols, and storage.
In fact, their flexible and portable nature often makes them very well suited to cloud-based applications, which has certainly contributed to their rise in popularity among IT systems architects. Many think that as computing and storage increasingly move into the cloud, containerisation will become an increasingly important tool.
Data containers are a separate technology from virtual machines, though they are based on some of the same ideas. With virtualisation, an entire machine is replicated up to and including the operating system, and the resulting image can be several gigabytes in size. By contrast, a data container shares the host operating system with every other container on the same machine, so its image can be as small as tens of megabytes, making it much lighter and more resource friendly.
There is no need for data containers to be provided with virtual memory and system resources in the same way as virtual machines, meaning they consume less processing power when running. They also boot and load faster. While a typical server at a web-scale enterprise might be expected to support 10 or 15 virtual machine environments, the same server might run hundreds of containerised applications. Crucially, they are also far easier to transfer from one environment to another.
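The difference in density is largely a matter of arithmetic. The back-of-the-envelope sketch below assumes a hypothetical server with 64 GB of memory, 4 GB reserved for the host, a 4 GB guest operating system per virtual machine, a 16 MB overhead per container and a 256 MB application. Every one of these figures is an illustrative assumption rather than a measurement.

```python
def max_instances(host_mb: int, host_reserved_mb: int, per_instance_mb: int) -> int:
    """How many instances fit in the memory left after the host's own needs."""
    return (host_mb - host_reserved_mb) // per_instance_mb


# All figures below are assumed for illustration only.
HOST_MB = 64 * 1024            # total memory on the server
HOST_RESERVED_MB = 4 * 1024    # kept back for the host operating system
APP_MB = 256                   # memory the application itself needs
GUEST_OS_MB = 4 * 1024         # every virtual machine carries its own operating system
CONTAINER_OVERHEAD_MB = 16     # containers share the host operating system

vm_count = max_instances(HOST_MB, HOST_RESERVED_MB, GUEST_OS_MB + APP_MB)
container_count = max_instances(HOST_MB, HOST_RESERVED_MB, CONTAINER_OVERHEAD_MB + APP_MB)

if __name__ == "__main__":
    print(f"Virtual machines that fit: {vm_count}")
    print(f"Containers that fit:       {container_count}")
```

With these assumed figures the sketch gives 14 virtual machines against 225 containers, the same order of difference described above. Memory is only one constraint, so real capacity planning would also need to account for CPU, storage and network.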
Another important distinction is that virtual machines must be provided with dedicated memory and storage resources, while data containers can share them. Many containers can run on a single operating system, but when users access a container, it looks and behaves as if it owns the entire operating system. And because containers must be able to interact with the outside world, they can network and share data with one another.
Why Should You Use Data Containers?
A data container can be created that allows multiple application containers to access the same data. These application containers can be created, moved, or destroyed without affecting the original data. Because the state lives in the data container, the application containers themselves can be treated as “stateless”, and the data will be identical no matter how many times it is used across different operating systems and applications. This is an important development for organisations wanting to run multiple tests or analyses against the same persistent data. It also eliminates many of the problems that arise when an entire application is set up in one environment and moved to another.
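The separation between long-lived data and short-lived applications can be sketched with a toy model in Python. This is not how any container engine is implemented, and the DataVolume and AppContainer classes are hypothetical names invented for the example: a temporary directory stands in for the data container, and ordinary objects stand in for the application containers that mount it.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


class DataVolume:
    """Toy stand-in for a data container: a directory that outlives its users."""

    def __init__(self) -> None:
        self._tmp = tempfile.TemporaryDirectory()
        self.path = Path(self._tmp.name)

    def cleanup(self) -> None:
        self._tmp.cleanup()


class AppContainer:
    """Toy stand-in for an application container that mounts a shared volume."""

    def __init__(self, name: str, volume: DataVolume) -> None:
        self.name = name
        self._mount = volume.path
        self._scratch = {}  # private working state, lost when the container goes away

    def analyse(self, filename: str) -> int:
        """Example workload: count words without modifying the shared data."""
        text = (self._mount / filename).read_text()
        self._scratch["last_read"] = text
        return len(text.split())


def run_demo() -> Tuple[List[int], str]:
    volume = DataVolume()
    try:
        (volume.path / "dataset.txt").write_text("alpha beta gamma")
        results = []
        for name in ("test-run-1", "test-run-2", "test-run-3"):
            app = AppContainer(name, volume)   # create an application container
            results.append(app.analyse("dataset.txt"))
            del app                            # destroy it again
        # The data is still there, unchanged, after every application has gone.
        surviving = (volume.path / "dataset.txt").read_text()
        return results, surviving
    finally:
        volume.cleanup()


if __name__ == "__main__":
    print(run_demo())
```

Each short-lived application sees exactly the same data, and nothing it keeps privately survives its removal, which is the behaviour that makes repeated tests against persistent data practical.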
It’s also this facet of their nature that makes them particularly suited to deploying microservices, where large-scale applications are built from a number of components, each one being a separate and distinct application in itself. This approach to software engineering allows applications to be scaled quickly, by updating existing components or adding new ones, while ensuring that the parent application as a whole remains stable.
A notable example of the large-scale adoption of containers in a cloud service is provided by Spotify. It recognised the advantages of this technology in late 2013 when it deployed the open-source container management platform Docker in order to reduce coding workload and CPU overheads. Google is another large-scale user of containers, reportedly launching around two billion of them every week.
Another advantage of a containerised approach to data is the potential it offers for more comprehensive governance. Laws pertaining to data rights and privacy are in a state of flux and subject to change. Containerised data can be packaged with information regarding who does or does not have the right to access the data, and what purposes it can be used for.
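One way to picture data that carries its own access rules is the minimal Python sketch below. The GovernedData class, the user names and the purposes are hypothetical, and a real governance scheme would also need authentication, auditing and encryption, none of which is shown here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GovernedData:
    """Hypothetical bundle of a payload together with its own access rules."""
    payload: bytes
    allowed_users: FrozenSet[str]      # who may access the data
    allowed_purposes: FrozenSet[str]   # what the data may be used for

    def read(self, user: str, purpose: str) -> bytes:
        """Release the payload only if both the user and the purpose are permitted."""
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not access this data")
        if purpose not in self.allowed_purposes:
            raise PermissionError(f"purpose '{purpose}' is not permitted")
        return self.payload


if __name__ == "__main__":
    record = GovernedData(
        payload=b"example-record",
        allowed_users=frozenset({"analyst_a"}),
        allowed_purposes=frozenset({"billing"}),
    )
    print(record.read("analyst_a", "billing"))
```

Because the rules travel with the data, they can be updated in one place when the law changes, rather than in every application that uses the data.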
Why Might You Choose Not to Use Data Containers?
Of course, because they share an operating system, data containers are somewhat less secure than virtual machines, and so are unlikely to totally replace virtualisation any time soon. This means that for certain data, containers may not be suitable. For example, great care would have to be taken when storing personal medical or financial information.
An inherent weakness in the concept of containers is that data could possibly be leaked through security flaws in the shared operating system. There is also the possibility that a malicious or poorly coded application sharing the same OS could give rise to a security threat.
Due to these inherent disadvantages, many see virtualisation and containerisation as complementary, not competing, technologies. Nor are they mutually exclusive: virtual machines are perfectly capable of running containerised applications. The tools required for containerisation are not yet as advanced as those for running VMs, but they are certainly gaining ground, meaning containerisation is quickly becoming an efficient and reliable option for forward-thinking application architects.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.