In business today, data is understood to be the key to improving every aspect of how we plan, administer, design, build, sell, and serve our customers. We are creating, storing, and using more data than ever before. And advanced data technologies, including artificial intelligence (AI) and the internet of things (IoT), are reshaping every industry.
Managing this data involves taking on a great deal of responsibility. Businesses that rely on processing personal data – often the most valuable type of data – have to comply with stringent data protection and privacy regulations. Even if you work solely with anonymized or non-personal data, a great deal of care has to be taken to ensure it’s relevant, clean, up-to-date, and free of bias.
Data governance means putting checks and measures in place to stay on top of all this, with a streamlined process for confirming that the way you handle data is compliant, ethical, and safe. In today's business environment, where data is often among a company's most valuable assets and can be the decider between success and failure, it's an important element of any strategy.
Broadly speaking, a data governance strategy should aim to ensure you are upholding three principles: Firstly, that you aren't breaking any laws; secondly, that you are acting ethically; and thirdly, that you aren't doing anything that is going to damage the trust that people put in you. Let's look at each of those principles in a little more detail.
Staying on the right side of the law
Storing and processing personal data – any data that can be connected to a living person – brings with it certain legal obligations. They differ depending on the jurisdiction you operate in – Europe’s GDPR is considered one of the most stringent and privacy-friendly, but many others exist or are planned, including the California Consumer Privacy Act, the UK Data Protection Act, and China's Personal Information Protection Law. All of these regulations set down strict rules governing how personal data is collected, stored, and processed. A fundamental principle running through all of them is consent: people must be made aware of, and agree to, your use of their data. GDPR requires, broadly speaking, that any company processing the data of citizens of the European Union must appoint a data protection officer with responsibility for ensuring compliance. I believe this is a good idea for any business – whether or not they are required to do so by legislation – and their remit should go beyond legality to cover the other principles outlined here.
It’s not unusual for companies to make expensive mistakes in this area. Mattel discontinued its Hello Barbie line of voice assistant-equipped dolls after concerns emerged that the dolls were collecting personal information from children, who were unable to legally consent due to their age.
Technology like AI and analytics can be used for great good – many use cases involve finding new cures for diseases, reducing waste in a way that helps the environment, or reducing the million-plus road deaths that are caused by human error every year. It can also be used for ends that are not of benefit to society or the rights of individuals, such as surveillance, hacking, and cybercrime. For example, computer vision can be used to detect cancerous cells in medical images, allowing for earlier detection and better rates of survival. It could also be used by authoritarian regimes to monitor their citizens in ways that are oppressive and invasive.
When it comes to the environment, machine learning can reduce emissions by working out optimal operational parameters for machinery, such as thermostats and data center cooling systems. On the other hand, this doesn’t come for free – training a large machine learning language model can result in carbon emissions equivalent to 125 round-trips between New York and Beijing, according to researchers at the University of Massachusetts Amherst.
Consideration also needs to be given to the impact that technology will have on human lives. It’s currently hotly debated whether AI will lead to a net loss or gain of human jobs, with the World Economic Forum predicting that 85 million human roles could be made redundant by technology by 2025. However, it also says up to 97 million new jobs could be created, either due to a need for humans to create the technology, work alongside it, or simply do the things that machines still can’t do.
Quality and trust
Getting legal compliance or ethical responsibilities wrong can quickly damage the essential trust that any business needs from society and its customers in order to succeed. But there’s another element of trust that has to be taken into consideration when we’re considering data governance, and that’s the trustworthiness of the data itself.
Amazon caused controversy recently when it emerged that AI routines were being used to identify and dismiss under-performing warehouse workers. The ethics and perhaps the legality of this can certainly be debated, but ultimately it comes down to a trust issue – can we trust that the machines will make the right decision? Most people would accept that a company has the right to replace underperforming staff, but can we trust that a machine is doing it in a fair and unbiased way?
Machine learning is only ever as good as the data it’s trained on. Issues such as inaccurate data, biased data, inconsistently formatted data, and redundant data can sink data initiatives very quickly – such as the Microsoft chatbot project that was quickly sidelined after it started making racist comments based on biased data it had picked up. For this reason, data quality is an essential pillar of any data governance strategy. It should cover putting in place checks to ensure the validity of all data that’s stored and processed, particularly if it’s data that’s used as the basis for decision-making that will affect people’s lives, or the business’s bottom line.
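To make this concrete, the kinds of checks described above can be automated. Below is a minimal sketch in Python of a data-quality gate that flags duplicate, incomplete, and out-of-date records before they reach any decision-making system. The record fields (`id`, `email`, `last_updated`) and the one-year staleness threshold are illustrative assumptions, not part of any particular framework.

```python
from datetime import date, timedelta

def validate(records, max_age_days=365):
    """Run basic quality checks and return a list of (record_id, issue) pairs.

    Checks three of the problems mentioned above:
    redundant data (duplicate ids), incomplete data (missing email),
    and out-of-date data (records older than max_age_days).
    """
    issues = []
    seen_ids = set()
    for rec in records:
        rid = rec["id"]
        if rid in seen_ids:
            issues.append((rid, "duplicate id"))   # redundant data
        seen_ids.add(rid)
        if not rec["email"]:
            issues.append((rid, "missing email"))  # incomplete data
        if (date.today() - rec["last_updated"]).days > max_age_days:
            issues.append((rid, "stale record"))   # out-of-date data
    return issues

# Hypothetical customer records used only to exercise the checks.
records = [
    {"id": 1, "email": "anna@example.com", "last_updated": date.today()},
    {"id": 2, "email": "", "last_updated": date.today() - timedelta(days=400)},
    {"id": 1, "email": "anna@example.com", "last_updated": date.today()},
]

print(validate(records))
```

In practice, checks like these would sit in a data pipeline and either reject bad records or route them for human review; the point is that data quality is enforced by process, not left to chance.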