What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
2 July 2021
When a conversation turns to analytics or big data, the terms structured, semi-structured and unstructured might get bandied about. These are classifications of data that are now important to understand with the rapid increase of semi-structured and unstructured data today as well as the development of tools that make managing and analysing these classes of data possible. Here’s what you need to know.
Structured Data
Data that is the easiest to search and organise, because it is usually contained in rows and columns and its elements can be mapped into fixed pre-defined fields, is known as structured data. Think about what data you might store in an Excel spreadsheet and you have an example of structured data. Structured data can follow a data model a database designer creates – think of sales records by region, by product or by customer. In structured data, entities can be grouped together to form relations (‘customers’ that are also ‘satisfied with the service). This makes structured data easy to store, analyse and search and until recently was the only data easily usable for businesses. Today, most estimate structured data accounts for less than 20 percent of all data.
Often structured data is managed using Structured Query Language (SQL)—a programming software language developed by IBM in the 1970s for relational databases.
Structured data can be created by machines and humans. Examples of structured data include financial data such as accounting transactions, address details, demographic information, star ratings by customers, machines logs, location data from smart phones and smart devices, etc.
Unstructured Data
A much bigger percentage of all the data is our world is unstructured data. Unstructured data is data that cannot be contained in a row-column database and doesn’t have an associated data model. Think of the text of an email message. The lack of structure made unstructured data more difficult to search, manage and analyse, which is why companies have widely discarded unstructured data, until the recent proliferation of artificial intelligence and machine learning algorithms made it easier to process.
Other examples of unstructured data include photos, video and audio files, text files, social media content, satellite imagery, presentations, PDFs, open-ended survey responses, websites and call centre transcripts/recordings.
Instead of spreadsheets or relational databases, unstructured data is usually stored in data lakes, NoSQL databases, applications and data warehouses. The wealth of information in unstructured data is now accessible and can be automatically processed with artificial intelligence algorithms today. This technology has elevated unstructured data to an extremely valuable resource for organisations.
Semi-Structured Data
Beyond structured and unstructured data, there is a third category, which basically is a mix between both of them. The type of data defined as semi-structured data has some defining or consistent characteristics but doesn’t conform to a structure as rigid as is expected with a relational database. Therefore, there are some organisational properties such as semantic tags or metadata to make it easier to organise, but there’s still fluidity in the data. mail messages are a good example. While the actual content is unstructured, it does contain structured data such as name and email address of sender and recipient, time sent, etc. Another example is a digital photograph. The image itself is unstructured, but if the photo was taken on a smart phone, for example, it would be date and time stamped, geo tagged, and would have a device ID. Once stored, the photo could also be given tags that would provide a structure, such as ‘dog’ or ‘pet.’
A lot of what people would usually classify as unstructured data is indeed semi-structured, because it contains some classifying characteristics.
The Difference Between Structured, Unstructured, And Semi-Structured Data
To easily understand the differences between the classifications of data, let’s use this analogy to illustrate. When interviewing for a job, let’s say there are three different classifications of interviews: structured, semi-structured and unstructured.
In a structured interview, the interviewer follows a strict script that was defined by the human resources department and is followed for every candidate. Another form of interview is an unstructured interview. In an unstructured interview, it is entirely up to the interviewer to determine the questions and the order they will be asked (or even if they will be asked) for every candidate. A semi-structured interview takes elements from both structured and unstructured interview classifications. It uses the consistency and quantitative elements allowed with the structured interview but offers the freedom to customise based on the circumstances that are more in line with an unstructured interview.
So, for data, structured data is easily organizable and follows a rigid format; unstructured is complex and often qualitative information that is impossible to reduce to or organise in a relational database and semi-structured data has elements of both.
Related Articles
Will AI Solve The World’s Inequality Problem – Or Make It Worse?
We are standing on the cusp of a new technological revolution. AI is increasingly permeating every aspect of our lives, with intelligent machines transforming the way we live and work.[...]
How You Become Irreplaceable In The Age Of AI
In a world where artificial intelligence is rapidly advancing, many of us are left wondering: Will AI take our jobs?[...]
Why Apple Intelligence Sets A New Gold Standard For AI Privacy
In the rapidly evolving world of artificial intelligence, privacy concerns have become a hot-button issue.[...]
Can Your Device Run Apple Intelligence? What You Need To Know
Apple's announcement of Apple Intelligence has sent waves of excitement through the tech world.[...]
10 Amazing Things You Can Do With Apple Intelligence On Your IPhone
Apple Intelligence is poised to revolutionize the iPhone experience, offering a suite of AI-powered tools that promise to make your digital life easier, more productive, and more creative.[...]
Agentic AI: The Next Big Breakthrough That’s Transforming Business And Technology
The world of artificial intelligence is evolving at a breakneck pace, and just when you thought you'd wrapped your head around generative AI, along comes another game-changing concept: agentic AI.[...]
Sign up to Stay in Touch!
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity.
He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world.
Bernard’s latest book is ‘Generative AI in Practice’.
Social Media