What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
2 July 2021
When a conversation turns to analytics or big data, the terms structured, semi-structured and unstructured might get bandied about. These classifications have become important to understand, given the rapid growth of semi-structured and unstructured data and the development of tools that make managing and analysing these classes of data possible. Here’s what you need to know.
Structured Data
Structured data is the easiest data to search and organise, because it is usually contained in rows and columns and its elements can be mapped into fixed, pre-defined fields. Think about what you might store in an Excel spreadsheet and you have an example of structured data. Structured data can follow a data model that a database designer creates: think of sales records by region, by product or by customer. In structured data, entities can be grouped together to form relations (‘customers’ that are also ‘satisfied with the service’). This makes structured data easy to store, analyse and search, and until recently it was the only data businesses could easily use. Today, most estimates suggest structured data accounts for less than 20 percent of all data.
Structured data is often managed using Structured Query Language (SQL), a programming language developed at IBM in the 1970s for relational databases.
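To make the idea of fixed fields concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sales figures are invented purely for illustration; they are assumptions, not data from any real system. It shows how sales records of the kind described above can be totalled by region with a single SQL query.

```python
import sqlite3

# Hypothetical sales records: the table, columns and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Widget", 120.0),
        ("North", "Gadget", 80.0),
        ("South", "Widget", 200.0),
    ],
)

# Every row has the same pre-defined fields, so grouping by region is one query.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
totals_by_region = dict(conn.execute(query))
conn.close()

print(totals_by_region)
```

Because each record fits the same row-and-column shape, the database can answer the question without any custom parsing, which is what makes structured data so easy to search and analyse.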
Structured data can be created by machines and humans. Examples include financial data such as accounting transactions, address details, demographic information, star ratings by customers, machine logs and location data from smartphones and smart devices.
Unstructured Data
A much bigger percentage of all the data in our world is unstructured data. Unstructured data is data that cannot be contained in a row-column database and doesn’t have an associated data model. Think of the text of an email message. The lack of structure makes unstructured data more difficult to search, manage and analyse, which is why companies widely discarded it until the recent proliferation of artificial intelligence and machine learning algorithms made it easier to process.
Other examples of unstructured data include photos, video and audio files, text files, social media content, satellite imagery, presentations, PDFs, open-ended survey responses, websites and call centre transcripts/recordings.
Instead of spreadsheets or relational databases, unstructured data is usually stored in data lakes, NoSQL databases, applications and data warehouses. The wealth of information in unstructured data is now accessible and can be processed automatically with artificial intelligence algorithms. This technology has elevated unstructured data to an extremely valuable resource for organisations.
Semi-Structured Data
Beyond structured and unstructured data, there is a third category, which is a mix of the two. Semi-structured data has some defining or consistent characteristics but doesn’t conform to a structure as rigid as that expected in a relational database. It has some organisational properties, such as semantic tags or metadata, that make it easier to organise, but there is still fluidity in the data. Email messages are a good example. While the actual content is unstructured, an email also contains structured data such as the name and email address of the sender and recipient, the time sent, etc. Another example is a digital photograph. The image itself is unstructured, but if the photo was taken on a smartphone, for example, it would be date and time stamped, geotagged, and would have a device ID. Once stored, the photo could also be given tags that would provide a structure, such as ‘dog’ or ‘pet.’
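The email example can be sketched in a few lines of Python using the standard library's email package. The message below is made up for illustration (the names, addresses and wording are assumptions), but it shows how the header fields can be read as named values while the body remains free text.

```python
from email import message_from_string
from email.utils import parseaddr

# A made-up message: the names, addresses and text are illustrative only.
raw = (
    "From: Alice Example <alice@example.com>\n"
    "To: Bob Example <bob@example.com>\n"
    "Date: Fri, 02 Jul 2021 09:30:00 +0000\n"
    "Subject: Lunch?\n"
    "\n"
    "Hi Bob, are you free for lunch on Friday?\n"
)

msg = message_from_string(raw)

# Structured part: predictable, named header fields.
sender_name, sender_address = parseaddr(msg["From"])
structured_part = {
    "sender_name": sender_name,
    "sender_address": sender_address,
    "recipient": msg["To"],
    "sent": msg["Date"],
}

# Unstructured part: free text with no fixed fields.
unstructured_part = msg.get_payload()

print(structured_part)
print(unstructured_part)
```

The headers could be loaded straight into a table, whereas the body would need text analysis to extract any meaning from it. That split is what places email in the semi-structured category.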
A lot of what people would usually classify as unstructured data is in fact semi-structured, because it contains some classifying characteristics.
The Difference Between Structured, Unstructured, And Semi-Structured Data
To understand the differences between these classifications of data, consider an analogy. Suppose there are three kinds of job interview: structured, semi-structured and unstructured.
In a structured interview, the interviewer follows a strict script that was defined by the human resources department and is followed for every candidate. Another form of interview is an unstructured interview. In an unstructured interview, it is entirely up to the interviewer to determine the questions and the order they will be asked (or even if they will be asked) for every candidate. A semi-structured interview takes elements from both. It keeps the consistency and quantitative elements of the structured interview but offers the freedom to customise based on the circumstances, which is more in line with an unstructured interview.
So, for data: structured data is easily organisable and follows a rigid format; unstructured data is complex and often qualitative information that is impossible to reduce to or organise in a relational database; and semi-structured data has elements of both.
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.