When I worked as a database consultant, I often saw conversations like the following at client sites:<p>Manager: Hey Data Analyst! We have a huge amount of data in our data lake, and I would like you to use it to provide us with some analysis and visualizations. Could you get that to me next week?<p>Data Analyst: Not really. The data that we have in the data lake is unstructured, and I can't do much with it until it's structured.<p>Manager: What do you need in order to structure it?<p>Data Analyst: Our data engineers have the skills to do that.<p>Manager: Oh, I see – OK, I’ll talk to the Data Engineer. Hey, Data Engineer! Can you structure this data by next week?<p>Data Engineer: Next week sounds a bit unrealistic. I’ll need to talk to the teams that created the data to be sure that I understand how it is organized and what it means. After that, I can write some custom code or use a pre-existing tool to extract information from the unstructured data, and then store it in a structured table. Let’s set up a project and allocate time and resources to do this. It could take some time!<p>Manager: You mean that we have all this data, but we can’t use it unless we spend a bunch of engineering resources to process it first?<p>Data Engineer: That is correct.<p>Manager: My boss is not going to be happy … Can you help me get a better understanding of the difference between structured and unstructured data? I’m going to need some good justification for this!<p>Data Engineer: Sure, I’ve written a summary of the difference between structured, semi-structured, and unstructured data for you. Keep reading at https://airbyte.com/blog/analyze-unstructured-data
Here is a related tweet that touches on exactly this issue: <a href="https://twitter.com/sethrosen/status/1252291581320757249?lang=en" rel="nofollow">https://twitter.com/sethrosen/status/1252291581320757249?lan...</a>
For a more detailed explanation of unstructured vs. structured data, see: <a href="https://airbyte.com/blog/analyze-unstructured-data" rel="nofollow">https://airbyte.com/blog/analyze-unstructured-data</a>
Haha, nice little skit. I wonder if this could work as a YouTube short as well. The famous SethRosen tweet in TFA is basically the start of a long conversation about what we want from data vs. the work it takes to get it.