A blog post on using LLMs to clean, process, and enrich data. It includes prompts and code snippets. The post draws on my own experience and two really interesting papers:<p>- Can Foundation Models Wrangle Your Data? (<a href="https://arxiv.org/abs/2205.09911" rel="nofollow noreferrer">https://arxiv.org/abs/2205.09911</a>)<p>- Large Language Models as Data Preprocessors (<a href="https://arxiv.org/abs/2308.16361" rel="nofollow noreferrer">https://arxiv.org/abs/2308.16361</a>)<p>I cover:<p>- Error and Anomaly Detection<p>- Enriching Data with LLMs<p>- Matching Data Labels<p>- Identifying Matching Records<p>Thank you, and I'd appreciate your feedback.