Internal data is often the first place that companies look when they start to think about analytics and insights. But they shouldn’t overlook the wealth of value that can be gained from mining external and third-party datasets.
Information about your own operations, such as sales transactions and operational performance, can tell you what has happened in the past and help you make educated guesses about what will happen in the future. External data sources can help you understand what your competitors are doing, and how trends such as consumer behavior patterns, market dynamics or even the weather can affect your performance. In my opinion, understanding both is essential if you want to make the most of the transformative opportunities that data and analytics offer today.
Today, artificial intelligence (AI) and machine learning, fueled by data, are quickly becoming hugely transformative forces in many industries and markets. However, not every company has the resources of an Amazon or a Walmart, which can generate vast amounts of proprietary internal data from customer bases numbering in the millions. Fortunately, external data can be just as useful for many purposes, and it has the advantage of being readily available to just about anyone.
During the Covid-19 pandemic, rapidly changing behavior meant that many of the models businesses used to predict demand or forecast change became obsolete overnight, and much of their historical internal data was suddenly of little use. During this period, companies often found that external data held the key to building new models that could predict how people would react to changing circumstances. Data on internet search traffic was particularly valuable for everything from tracking the spread of the virus, to predicting where behavior changes would be most severe, to understanding people's new priorities in a changing world.
External datasets may be publicly available: for example, many governments publish a wide range of information through portals such as data.gov and data.gov.uk. Alternatively, they might be privately held, and either made available for free (for example, Google's basic search and trends data services) or sold. Companies such as Nielsen and Experian provide marketing and demographic data drawn from a huge range of sources, and niche providers have emerged offering specialist datasets that are valuable to many different industries.
When one US glass manufacturer was looking to diversify its revenue streams, it found that by analyzing publicly available crime data it could predict where window repairs were most likely to be needed. By streamlining its supply chain and preparing mobile repair units, it was able to quickly build a profitable new business unit providing emergency repairs. On an industry-wide scale, finance and credit card companies have long used external data from credit reference agencies to assess the risk of lending to individual customers. And real estate businesses use public databases of property sales to estimate the value of the houses they buy, sell and lease.
Special mention should also be given to the role that external data plays in the transformative power of the “digital twin.” This is a simulated version of a business, a product or a process that can be used to predict how different variables will affect its performance in the real world. While the “twin” itself is generally built using internal data, external data can be used to simulate the “world” the twin exists in. For example, Goodyear creates simulated versions of its tires using data from its manufacturing processes. It then uses external data on the structure and condition of road surfaces, as well as weather data, to create realistic environments in which it can predict the performance of new tire prototypes.
Of course, there's no such thing as a free lunch, and working with external data brings challenges, even when it's provided at no cost. One is that because you don’t control how the data is captured, you can become overly reliant on a provider that might disappear or drastically change the way it operates at any time. If you’ve invested resources in building analytics tools around such a service and it suddenly becomes unavailable, that investment may be lost.
There can also be technical issues. Working with multiple datasets from different providers means you have to make sure they are in formats that can be easily correlated and merged with one another. The most valuable insights often come from combining two or more entirely separate datasets, and a data engineering or cleansing job is often necessary to get them into a state where this is possible.
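As a rough illustration of that cleansing step, here is a minimal Python sketch using only the standard library. The two datasets, their field names, the region-name mapping and the choice to join on date and region are all hypothetical; a real pipeline would more likely use a dataframe library and would need a policy for missing or unmapped values.

```python
from datetime import date, datetime

# Hypothetical internal dataset: daily unit sales by region code, ISO dates.
internal_sales = [
    {"date": "2021-03-01", "region": "NE", "units": 120},
    {"date": "2021-03-02", "region": "NE", "units": 95},
    {"date": "2021-03-01", "region": "SW", "units": 80},
]

# Hypothetical external dataset from a third-party weather provider that
# uses DD/MM/YYYY dates, free-text region names and Fahrenheit.
external_weather = [
    {"day": "01/03/2021", "area": " north east ", "temp_f": 41.0},
    {"day": "02/03/2021", "area": "North East", "temp_f": 35.6},
    {"day": "01/03/2021", "area": "South West", "temp_f": 50.0},
]

# Assumed mapping from the provider's region names to internal codes.
# An unmapped name raises KeyError, which surfaces data-quality problems early.
REGION_CODES = {"north east": "NE", "south west": "SW"}


def clean_weather(row):
    """Convert one external record into the internal schema."""
    return {
        "date": datetime.strptime(row["day"], "%d/%m/%Y").date(),
        "region": REGION_CODES[row["area"].strip().lower()],
        "temp_c": round((row["temp_f"] - 32) * 5 / 9, 1),
    }


def clean_sales(row):
    """Parse the internal ISO date string into a date object."""
    return {
        "date": date.fromisoformat(row["date"]),
        "region": row["region"],
        "units": row["units"],
    }


def merge(sales, weather):
    """Inner-join the cleaned datasets on the (date, region) key."""
    lookup = {(w["date"], w["region"]): w["temp_c"] for w in weather}
    merged = []
    for s in sales:
        key = (s["date"], s["region"])
        if key in lookup:
            merged.append({**s, "temp_c": lookup[key]})
    return merged


combined = merge(
    [clean_sales(r) for r in internal_sales],
    [clean_weather(r) for r in external_weather],
)
for row in combined:
    print(row)
```

Because this is an inner join, any sales record without a matching weather record is silently dropped; depending on the analysis, an outer join that flags the gaps may be the safer choice.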
Finally, bear in mind that because you may need many different data sources, ranging from satellite imagery and meteorological data to anonymized customer data, you may have to build and maintain relationships with several different data suppliers. This raises compliance issues, because you always need to be sure that the data you’re buying has been collected and processed lawfully and ethically. Regulation of data use is growing steadily, and as the organization processing that data, you could face a potentially expensive penalty if your suppliers aren’t all above board.
If your organization can put plans and strategies in place to manage all of that, working with external data has the potential to be extremely rewarding. It means your data and analytics strategy is no longer only about "you" but also about building an awareness of the environments and ecosystems in which your business operates. It can help you streamline your existing business models and drive efficiencies throughout them, or even transform your business in ways that let you create entirely new ones.