Unless you have the resources to build and maintain large amounts of IT infrastructure, the best place for most organizations’ Big Data these days is the cloud.
Using cloud services for your data storage, and increasingly also for your analytics and compute, means you are essentially “outsourcing” much of the hassle that comes with storing and managing large amounts of data. Issues such as space, power usage, networking infrastructure, and security become the problem of your cloud service provider, which is generally well-equipped to deal with them.
Another big advantage of cloud solutions is that they are highly scalable. Most offer plans that let you start small, then add storage capacity as your demand grows. The big providers also all offer bolt-on services that can take care of your AI, analytics, and data visualization needs without your valuable data ever leaving the safety of the cloud.
Amazon Web Services S3
It makes sense to start with the daddy of corporate cloud service providers. Amazon launched its cloud storage service way back in 2006, and it has acted as the model for pretty much every other cloud storage and computing service ever since. In the same year, it also launched Elastic Compute Cloud (EC2), a compute platform that provides virtualized data-processing capacity that can be quickly scaled up or down as your needs change. Its storage service, the one organizations typically build their data lakes on, goes by the name of Amazon Simple Storage Service (S3) and is used by millions of companies and organizations around the world.
AWS has remained the most popular cloud platform for big data operations, generating close to $10 billion in revenue for the tech giant in the last quarter of 2019, even as competitors raced to catch up and add new features to their own services.
Microsoft Azure Data Lake
Microsoft’s competitor to AWS launched a bit later, in 2010, but quickly grew to offer a full suite of tools and services designed to let organizations that work with large datasets carry out all of their operations in the cloud.
Microsoft has experience running some of the largest-scale processing and analytics operations in the world, including its own Office 365, Skype, and Xbox Live. Azure’s strengths include enterprise-grade security and governance, as well as integration with advanced analytics tools.
Azure’s suite of services includes Azure Data Lake, which is built specifically to handle the requirements of businesses and organizations with complex data needs. Data is stored in the data lake in its native format: unprocessed, and without needing to fit a single standard schema that is applied to all of the other data.
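To make the “native format” idea concrete, here is a toy, in-memory sketch in Python of what is often called schema-on-read: files are stored exactly as they arrive, and each reader applies its own schema only when it pulls the data out. The `lake` dictionary and the helpers `put_raw` and `read_with_schema` are hypothetical stand-ins for illustration, not part of Azure Data Lake’s API.

```python
import csv
import io
import json
from typing import Any, Callable

# Hypothetical toy "data lake": object keys map to raw bytes stored exactly as received.
# Nothing is parsed or validated at write time.
lake: dict[str, bytes] = {}


def put_raw(key: str, payload: bytes) -> None:
    """Store a file in its native format, unprocessed."""
    lake[key] = payload


def parse_json_lines(raw: bytes) -> list[dict[str, Any]]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]


def parse_csv(raw: bytes) -> list[dict[str, Any]]:
    """Parse CSV with a header row; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def read_with_schema(
    key: str,
    parser: Callable[[bytes], list[dict[str, Any]]],
    fields: dict[str, Callable[[Any], Any]],
) -> list[dict[str, Any]]:
    """Apply a schema only at read time: keep the listed fields and cast each one."""
    rows = parser(lake[key])
    return [{name: cast(row[name]) for name, cast in fields.items()} for row in rows]


# Two datasets with different formats and shapes live side by side in the lake.
put_raw("clicks/2019-12-01.jsonl", b'{"user": "a1", "ms": 120, "extra": true}\n{"user": "b2", "ms": 340}\n')
put_raw("orders/2019-12-01.csv", b"order_id,amount\n1001,19.99\n1002,5.00\n")

# Each consumer decides which fields it needs and what types they should have.
clicks = read_with_schema("clicks/2019-12-01.jsonl", parse_json_lines, {"user": str, "ms": int})
orders = read_with_schema("orders/2019-12-01.csv", parse_csv, {"order_id": int, "amount": float})
print(clicks)  # [{'user': 'a1', 'ms': 120}, {'user': 'b2', 'ms': 340}]
print(orders)  # [{'order_id': 1001, 'amount': 19.99}, {'order_id': 1002, 'amount': 5.0}]
```

Note that the clickstream record carrying an unexpected `extra` field is still stored untouched; it is simply ignored by a reader whose schema doesn’t ask for it.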
Google Cloud Storage
Google’s cloud platform is built on the same technology that powers its own Big Data-driven services like YouTube and Google Search, with all the scalability and reliability that this implies. It also offers a number of storage and data lake-oriented services under the banner of Google Cloud Storage, designed to scale to exabytes of data.
Different pricing plans apply to different datasets depending on how frequently they are accessed, so data that is essentially just “backup” and doesn’t need to be accessed second-by-second can be archived to lower your storage costs. You can also choose where in the world your data is stored, which affects access times: it can be served up quickly where it is needed, without paying to store it in locations where it isn’t. As an added benefit, if you are keen on keeping your carbon emissions down, all of Google’s data storage solutions have generated zero net carbon emissions since 2007.
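In Google Cloud Storage these access-based pricing plans take the form of storage classes (Standard, Nearline, Coldline, and Archive). The Python sketch below picks the cheapest class for a dataset, assuming Google’s published rule of thumb (Nearline for data read less than about once a month, Coldline less than once a quarter, Archive less than once a year) and the classes’ minimum storage durations of 30, 90, and 365 days. The helper `choose_storage_class` is hypothetical and makes no API calls; real costs also depend on location and retrieval fees, so check Google’s current documentation before relying on these thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str                  # storage class name as used by Google Cloud Storage
    min_storage_days: int      # minimum storage duration (early deletion is still billed)
    max_reads_per_year: float  # rule-of-thumb access ceiling assumed by this sketch


# Ordered from coldest (cheapest to store, costliest to read) to hottest.
CLASSES = [
    StorageClass("ARCHIVE", 365, 1),
    StorageClass("COLDLINE", 90, 4),
    StorageClass("NEARLINE", 30, 12),
    StorageClass("STANDARD", 0, float("inf")),
]


def choose_storage_class(reads_per_year: float, keep_days: int) -> str:
    """Return the coldest class whose access guideline and minimum duration both fit."""
    for cls in CLASSES:
        if reads_per_year < cls.max_reads_per_year and keep_days >= cls.min_storage_days:
            return cls.name
    return "STANDARD"


print(choose_storage_class(0.5, 3650))  # ARCHIVE: old backups, rarely touched
print(choose_storage_class(2, 365))     # COLDLINE: read a couple of times a year
print(choose_storage_class(0.5, 60))    # NEARLINE: rarely read, but only kept 60 days
print(choose_storage_class(100, 365))   # STANDARD: frequently accessed data
```

The third call shows why both inputs matter: data that is almost never read would suit Archive, but deleting it after 60 days would still be billed for the full minimum duration, so a warmer class ends up cheaper.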
Oracle Cloud
Oracle’s well-established database platform is available to businesses through its Oracle Cloud service, offering flexible, scalable storage along with a suite of cloud-based analytics and data processing services. The service is highly rated for its strong security features, including real-time encryption of all data sent to the platform. The platform itself uses Oracle’s own proprietary machine learning to help automate many of the data operations you might want to carry out, as well as to reduce errors caused by manual data entry.
IBM Cloud
IBM offers a number of different data lake solutions depending on your needs, all centered on its IBM Cloud (formerly Bluemix) platform. As with the other solutions mentioned here, you can start small (even with a free tier) and scale up as you begin to generate and store larger amounts of data. With IBM’s platform, users choose among object storage, block storage, and file storage, depending on the data structures they are working with. IBM offers “cognitive” analytical tools in the form of its Watson AI platform, which integrates fully with data stored on IBM Cloud services.
Alibaba Cloud
Alibaba Cloud (formerly Aliyun) is not (yet) as popular in Western nations as the “Big 3” of Google, AWS, and Microsoft, but it is certainly a growing presence. As the leading Big Data cloud service provider in China, it has a huge user base in Asia and provides the same range of analytics, security, and AI tools as the US-based platforms. It offers pay-as-you-go as well as monthly subscription models. Reviews suggest that the services offered to customers in the US and Europe may lack some of the polish of Alibaba’s US rivals, but pricing is highly competitive.
Read more about key technology trends in my new book, Tech Trends in Practice: The 25 Technologies That Are Driving The 4th Industrial Revolution.