This hits home. I am helping someone analyze medical research data. When I helped before, a few years ago, we spent a few weeks trying to clean the data and figure out how to run the basic analysis (linear regression, etc.), only to arrive at "some" results that were never repeatable, because we learned as we built.

I am doing it again now. I used Claude to import the data from CSV into a database, then asked it to help me normalize it, which produced a txt file with a lot of interesting facts about the data. Next, I asked it to write a "fix data" script that fixes all the issues I told it about.

Finally, I said "give me univariate analysis, output the results into CSV / PNG and then write a separate script to display everything in a jupyter notebook".

Weeks of work into about 2 hours...
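To give a flavor of that last step, here is a minimal sketch of the kind of univariate-analysis script it produced; the `cleaned.csv` input, the output folder, and the column handling are placeholders, not my actual data or Claude's exact code.

```python
# univariate.py -- sketch of a univariate pass over cleaned data.
# Paths and column handling are placeholders, not the real dataset.
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

OUT = Path("univariate_out")
OUT.mkdir(exist_ok=True)

df = pd.read_csv("cleaned.csv")  # placeholder for the cleaned export

summaries = []
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # describe() gives count, mean, std, min, quartiles, max
        summaries.append(df[col].describe())
        ax = df[col].plot.hist(bins=30, title=col)
        ax.figure.savefig(OUT / f"{col}_hist.png")
        plt.close(ax.figure)
    else:
        # top categories for non-numeric columns
        df[col].value_counts().head(20).to_csv(OUT / f"{col}_counts.csv")

if summaries:
    # one CSV with all numeric summaries, one PNG per numeric column
    pd.concat(summaries, axis=1).to_csv(OUT / "numeric_summary.csv")
```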
Most of these examples/walkthroughs look like they have been generated by LLMs. They might be useful for teaching purposes, but I believe they significantly underestimate the amount of domain knowledge that often goes into data extraction, data cleaning and data wrangling.
Afaict this skips the evals and alignment side of LLMs. We find result quality is where most of our AI time goes when helping teams across industries and building our own LLM tools. Calling an API and getting bad results is the easy part; ensuring good results is the part we get paid for.

If you look at tools like dspy, even if you disagree with their solutions, much of their effort goes into separating good results from bad ones. In practice, I find different LLM use cases call for different correctness approaches, but there aren't that many of them. I'd encourage anyone trying to teach here to always include how to get good results for every method presented; otherwise it is teaching bad and incomplete methods.
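To be concrete about what I mean by "how to get good results": even the bare-minimum version is a small labeled set plus a scoring loop that gates changes. Everything below is invented for illustration (the `extract` stand-in, the examples, the threshold); it's a sketch of the shape, not any particular library's API.

```python
# The bare minimum I mean by "evals": a labeled set and a score that gates changes.
# `extract` stands in for whatever LLM-backed call is under test; the examples
# and the 0.9 threshold are made up for illustration.
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    expected: str

EXAMPLES = [
    Example("Patient aged 54, male", "54"),
    Example("Age: sixty-two", "62"),
]

def extract(text: str) -> str:
    """Placeholder for the prompt/model combination being evaluated."""
    raise NotImplementedError

def run_evals(threshold: float = 0.9) -> float:
    correct = sum(extract(ex.text).strip() == ex.expected for ex in EXAMPLES)
    accuracy = correct / len(EXAMPLES)
    assert accuracy >= threshold, f"accuracy {accuracy:.2f} is below {threshold}"
    return accuracy
```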
This is where things are headed. All that ridiculous busywork that goes into ETL and modeling pipelines… it’s going to turn into “here’s a pile of data that’s useful for answering a question, here’s a prompt that describes how to structure it and what question I want answered, and here’s my oauth token to get it done.” So much data cleaning and prep code will be scrapped over the next few years…
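A sketch of what that could look like, purely as an assumption about where this lands: hand the model a sample of the raw file plus a prompt describing the structure and the question. The model name, the prompt, the made-up CSV, and the hand-waved token handling are all placeholders, not an existing product.

```python
# Sketch of the "data + prompt + token" workflow described above.
# Model name, prompt, and file are placeholders; auth is waved away
# as an environment variable instead of a real OAuth flow.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

raw_sample = open("messy_export.csv").read()[:20_000]  # just a sample of the pile

prompt = (
    "Here is a sample of a messy CSV export. "
    "Restructure it into tidy rows of (record_id, date, measurement, value) "
    "and then answer: which measurement changes the most between the first "
    "and last date per record?"
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt + "\n\n" + raw_sample}],
)
print(resp.choices[0].message.content)
```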
What's missing from these examples is evals, and any advice on creating a verification set that can be used to assert that the system continues to work as designed. Models and their prompting patterns change; one cannot just pretend that a one-time automation will continue to work indefinitely when the environment is constantly changing.
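Even a thin version goes a long way: a pinned verification set that re-runs the prompts and fails loudly on drift. In the sketch below, `run_pipeline`, the `verification_set.json` fixture, and the exact-match check are all hypothetical stand-ins for whatever your automation actually does.

```python
# test_verification_set.py -- rerun the automation on pinned inputs, fail on drift.
# `my_pipeline.run_pipeline` and verification_set.json are hypothetical stand-ins.
import json
from pathlib import Path

import pytest

from my_pipeline import run_pipeline  # the LLM-backed step under test

# [{"input": "...", "expected": "..."}, ...] curated by hand once, then frozen
CASES = json.loads(Path("verification_set.json").read_text())

@pytest.mark.parametrize("case", CASES, ids=lambda c: c["input"][:40])
def test_output_has_not_drifted(case):
    result = run_pipeline(case["input"])
    # Exact match on purpose: if a model or prompt change alters the output,
    # this should fail and force a human to re-approve the new behavior.
    assert result == case["expected"]
```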
This ETL is nice, but ours is 100k LOC, spans multiple departments and several employees' tenures, and I haven't yet been able to make an LLM write a convincing test that wasn't already solved by strict typing.

I'm not trying to move the goalposts here, but LLMs haven't replaced a single headcount. In fact, so far they've only been helping our business.
For those interested, you can use LLMs to process CSVs in Hal9 and also generate Streamlit apps. In addition, the code is open source, so if you want to help us improve our RAG or add new tools, you are more than welcome.

- https://hal9.ai

- https://github.com/hal9ai/hal9