Hey HN! Max and Matt here from Talc AI (YC S23). We help teams create data that's traditionally hard to find: think of content you'd normally need a doctor, lawyer, accountant, or engineer to write.<p>We've been struggling to demo our synthetic data product (it's complicated to set up), so we stripped it down to its core: an "ontologizer" that takes plain-text descriptions and generates varied, detailed synthetic data. For this demo, we're focused on medical data like radiology reports and SOAP notes.<p>Try it here: <a href="https://demo.talcapi.com/demo/meddoc" rel="nofollow">https://demo.talcapi.com/demo/meddoc</a><p>Example use case: instead of dealing with HIPAA compliance or hiring doctors to write fake data, just type "medical notes with billing codes" and get test data instantly.<p>One key limitation: unlike our full product, this demo isn't grounded in real data and won't match the distribution of real data.<p>For specialized use cases (rare diseases, financial regulations, etc.), we can inject domain expertise into the process. Our customers use the resulting "golden datasets" to test clinical trial matching, train financial and engineering Q&A models, and benchmark LLMs.<p>To generate this data, we run an unsupervised process that identifies the relevant metadata and structure, and then use that information to seed a generation process inspired by papers like Google's CodecLM.<p>We'd love feedback! Our last HN launch helped us catch several bugs.
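<p>A minimal sketch of the general idea of metadata-seeded generation (first identify the attributes that describe the data, then combine them into seeds that steer each generated sample) might look like the following. Everything here is an illustrative assumption, not Talc's pipeline or a reproduction of CodecLM: the attribute schema is hand-written rather than discovered by an unsupervised step, and the hypothetical helpers `seed_combinations` and `build_prompt` only produce prompt strings; the call to an actual generator model is left out.

```python
import itertools
import random

# Hypothetical metadata "ontology" for the request "radiology reports".
# The real process discovers this structure automatically; here it is
# hand-written purely for illustration.
ONTOLOGY = {
    "modality": ["X-ray", "CT", "MRI"],
    "body_region": ["chest", "abdomen", "head"],
    "finding": ["normal", "nodule", "fracture"],
}


def seed_combinations(ontology, k, rng):
    """Sample k distinct attribute combinations to use as generation seeds."""
    keys = sorted(ontology)
    # Cartesian product of all attribute values, e.g. ("chest", "normal", "CT").
    all_combos = list(itertools.product(*(ontology[key] for key in keys)))
    # Sampling without replacement keeps every seed distinct, which is what
    # makes the generated samples varied rather than repetitive.
    chosen = rng.sample(all_combos, k)
    return [dict(zip(keys, combo)) for combo in chosen]


def build_prompt(description, seed):
    """Turn one seed into a prompt for a downstream generator (e.g. an LLM)."""
    constraints = "; ".join(f"{key}: {value}" for key, value in seed.items())
    return f"Write one {description}. Constraints: {constraints}."


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled combinations are reproducible
    for seed in seed_combinations(ONTOLOGY, k=3, rng=rng):
        print(build_prompt("radiology report", seed))
```

Enumerating combinations explicitly, instead of asking a model for "varied" output, makes coverage of the attribute space controllable; in practice a schema this small would be replaced by a much larger, discovered one.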