The title and the contents don’t match.

The author expected LLMs to solve the whole mock-data problem on their own, including traversing the schema and generating the correct Rust code for DB insertions.

This demonstrates less about using LLMs for _mock data_ than about using LLMs to understand existing system architecture.

The latter is a hard problem, as humans are known to create messy and complex systems (see: any engineer joining a new company).

For mock data generation, we’ve[0] actually found LLMs to be fantastic; however, there are a few tricks:

1. Few-shot prompting: “prime” the context with a couple of example “records”, inserted as past user/assistant messages.
2. Keep the records you’ve generated in context: treat every generated record as a historical chat message. This helps avoid duplicates and repeats of common tropes (e.g. “John Smith”).
3. Split your tables into multiple generation steps — e.g. start with “users”, then for each user generate an “address” (with history!), and so on. Model your mock data creation after your schema and its constraints; don’t rely on the LLM for this step.
4. Separate mock data generation and DB updates into distinct steps. First generate CSVs (or JSON/YAML) of your data, then use separate script(s) to insert it. This helps avoid issues at insertion time, as you can easily tweak, retry, or skip malformed data. (Rough sketches of tricks 1-3 and of this split follow below.)

LLMs are fantastic tools for mock data creation, but don’t expect them to also solve the problem of understanding your legacy DB schemas and application code all at once (yet?).

[0] https://www.youtube.com/watch?v=BJ1wtjdHn-E
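A minimal sketch of tricks 1-3 (plus the file-writing half of trick 4), assuming the OpenAI Python SDK as the client; the schema, prompts, model name, and file name are all invented for illustration, and any chat-completions API works the same way:

```python
# Sketch of tricks 1-3: few-shot priming, history-as-context, and
# schema-driven generation steps. Prompts, model name, and file names
# are invented for illustration.
import json
from openai import OpenAI

client = OpenAI()

# Trick 1: prime the context with a couple of hand-written example
# records, framed as past user/assistant turns.
history = [
    {"role": "system", "content": "Generate one realistic mock user as "
     'JSON with keys "name" and "email". Respond with JSON only.'},
    {"role": "user", "content": "Generate a user."},
    {"role": "assistant", "content": '{"name": "Priya Raman", "email": "priya.raman@example.com"}'},
    {"role": "user", "content": "Generate another, different user."},
    {"role": "assistant", "content": '{"name": "Marcus Webb", "email": "m.webb@example.net"}'},
]

users = []
for _ in range(20):
    history.append({"role": "user", "content": "Generate another, different user."})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    record = reply.choices[0].message.content
    # Trick 2: append every generated record back into the history so the
    # model can see, and avoid repeating, what it has already produced.
    history.append({"role": "assistant", "content": record})
    users.append(json.loads(record))  # malformed output surfaces here; handle per trick 4

# Trick 3: generate dependent tables in a second pass driven by your
# schema, e.g. one "address" per user, rather than asking for one blob.

# Trick 4, first half: write plain files; DB insertion is a separate script.
with open("users.json", "w") as f:
    json.dump(users, f, indent=2)
```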
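And a minimal sketch of the separate insertion step, using stdlib sqlite3 to stay self-contained; the table and columns are again made up. The payoff of the split shows up in the except branch: malformed rows get collected for a tweak-and-retry pass instead of aborting the run:

```python
# Sketch of trick 4's second half: a standalone insertion script.
# sqlite3 is stdlib; the table name and columns are invented.
import json
import sqlite3

conn = sqlite3.connect("mock.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")

with open("users.json") as f:
    records = json.load(f)

inserted, malformed = 0, []
for record in records:
    try:
        conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)",
            (record["name"], record["email"]),
        )
        inserted += 1
    except (KeyError, TypeError, sqlite3.Error):
        # Collect bad rows to tweak, retry, or skip later, instead of
        # letting one malformed record kill the whole run.
        malformed.append(record)

conn.commit()
print(f"inserted {inserted}, skipped {len(malformed)}")
```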