Anyone working on LLM tools for enhancing data quality?

5 points by cstanley, over 1 year ago
Big problem, so let's break it down:

1. Data issue identification
2. Solution and implementation

Most issues are discovered in the data warehouse: entity matching of customer data across different systems, or a business process that produces duplicate or null data. I know there are existing non-LLM products that do this, and I'm curious to compare them with the new LLM-first products.

On solution/implementation: ideally you can fix the issue in the source system, either in your SaaS tool or in the way you write production data. You can also fix it in the data warehouse by munging/ETL'ing the data. It seems like LLMs could help to 1) identify and recommend a change in an external system, or 2) submit a PR to solve it in the data warehouse.

Does anyone know of anyone working on these problems?
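To make the identification step concrete, here is a minimal non-LLM sketch of flagging null and duplicate customer records. The field names `name` and `email`, the `normalize` rule, and the sample records are assumptions for illustration only, not taken from any product mentioned in this thread; real entity matching usually needs fuzzier comparison than exact match on normalized keys.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(str(value).lower().split())


def find_issues(records, key_fields=("name", "email")):
    """Return (null_issues, duplicate_groups) for a list of dict records.

    null_issues: list of (record index, list of missing key fields)
    duplicate_groups: list of index lists that share the same normalized key
    """
    null_issues = []
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        missing = [f for f in key_fields if rec.get(f) in (None, "")]
        if missing:
            # Records with missing key fields cannot be matched reliably.
            null_issues.append((idx, missing))
            continue
        key = tuple(normalize(rec[f]) for f in key_fields)
        groups[key].append(idx)
    duplicate_groups = [idxs for idxs in groups.values() if len(idxs) > 1]
    return null_issues, duplicate_groups


# Hypothetical sample data: two spellings of one customer, one null email.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada  lovelace", "email": "ADA@example.com"},
    {"name": "Grace Hopper", "email": None},
]
null_issues, duplicate_groups = find_issues(records)
```

A rule-based pass like this is cheap enough to run over a whole table, which leaves any LLM step to deal only with the records it flags.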

4 comments

hxypqr, over 1 year ago
Obviously, the cost of using an LLM to solve this problem is relatively high. At most, you can treat the LLM as one of the signals rather than having it process all the information. Combining an LLM with tree models or graph models, plus NER, may achieve an approximate solution to 2. Can you really trust LLM-provided answers enough to use them in a production environment?
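One way to read "LLM as one of the signals" is sketched below, under stated assumptions: a cheap string similarity (Python's `difflib.SequenceMatcher`) scores every pair, and an LLM-derived score is consulted only for pairs in an ambiguous band and then blended in by weight. The `llm_score` callable, the thresholds, and the weight are all hypothetical, and a plain weighted average stands in for the tree or graph model mentioned above.

```python
from difflib import SequenceMatcher


def cheap_similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(a, b, llm_score=None, low=0.5, high=0.9, llm_weight=0.5):
    """Return (score, used_llm) for a candidate pair of entity names.

    The LLM signal is requested only when the cheap score falls inside
    [low, high], which keeps the number of expensive calls small.
    """
    base = cheap_similarity(a, b)
    if llm_score is None or base < low or base > high:
        return base, False
    combined = (1 - llm_weight) * base + llm_weight * llm_score(a, b)
    return combined, True


def fake_llm_score(a, b):
    """Placeholder for a real model call (assumption): always says 'same entity'."""
    return 1.0


clear_match = match_score("Acme Corp", "Acme Corp", fake_llm_score)
clear_miss = match_score("Acme Corp", "Zebra", fake_llm_score)
ambiguous = match_score("Acme Corp", "Acme Corporation Inc", fake_llm_score)
```

Only the ambiguous pair triggers the placeholder call here. Whether the blended score is trustworthy enough for production is still the open question raised in the comment.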
latentpot, over 1 year ago
1 is already a solved problem. My employer originally put an ML-based system in place to find DQ issues (already working) and is looking at PoCs to add LLMs to the model mix. Hearsay is that our lakehouse vendor will have its own solution to this via an acquisition.

2 is interesting and possible to do via LLM, but I worry about data privacy and about hallucinations making data more believable but not real.
spdustin, over 1 year ago
I'm using LLMs to help build dbt-flavored Great Expectations tests. Pretty low-hanging fruit, but it's a start.
johannesboyne, over 1 year ago
We (https://dqc.ai) are doing something in the space, yes. It's a mixture of ML and heuristic-based approaches, plus links and integrations into source systems. Happy to talk about it, feel free to reach out.