Hey HN,<p>I would like to extract structured information from captured audio on a device that is not too expensive (a small LLM would be an option; I have an old NVIDIA 1660 Super with 6GB VRAM).<p>OpenAI Whisper could be used to transcribe the audio to text, but I don't know how to reliably extract the information in a structured way. There is always a "purpose", which is selected from, let's say, 10 possible purposes, and "required data", which depends on the purpose and consists of key-value pairs that also have predefined values.<p>An example (spoken text):<p><pre><code> Please apply for leave from 1st November to 8th November.
</code></pre>
Result (structured data):<p><pre><code> {
   "purpose": "apply for leave",
   "data": {
     "start": "2025-11-01",
     "end": "2025-11-08"
   }
}
</code></pre>
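To make the constraints concrete, here is a minimal sketch of how the purposes and their required data could be declared and checked. Everything in it is an assumption for illustration: the `SCHEMA` table, the "report sick" purpose with its allowed reasons, and the `validate` helper are hypothetical names, not part of any existing tool.

```python
from datetime import date


def is_iso_date(value):
    """True if value is a string that parses as an ISO date, e.g. '2025-11-01'."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical schema: purpose -> required field -> validator for its value.
SCHEMA = {
    "apply for leave": {"start": is_iso_date, "end": is_iso_date},
    "report sick": {"reason": lambda v: v in {"illness", "injury"}},
}


def validate(result):
    """Return a list of problems; an empty list means the result fits the schema."""
    fields = SCHEMA.get(result.get("purpose"))
    if fields is None:
        return ["unknown purpose"]
    data = result.get("data", {})
    problems = [f"missing field: {k}" for k in fields if k not in data]
    problems += [
        f"invalid value for {k}"
        for k, ok in fields.items()
        if k in data and not ok(data[k])
    ]
    problems += [f"unexpected field: {k}" for k in data if k not in fields]
    return problems
```

Whatever produces the structured result (a small LLM or a rule-based script), a check like this could reject outputs that do not fit one of the known purposes.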
What are my options for doing this reliably, so that different purposes with different data are matched using a "best match" approach?
A related OpenAI forum topic covers similar issues [0].<p>Old school: mark each paragraph/sentence, use regular expressions to strip out miscellaneous info (using linguistic 'typing', aka noun, verb, etc.), then dump the relevant remaining info in JSON/delimited format and normalize the data (e.g. "1st November" to 11/01). Multi-pass awk script(s), Perl, and Icon are languages with appropriate in-language support. Use regular expressions/statistics to detect outliers and mark data for human review.<p>Multi-pass awk would require a codex of phrases, each related to a delimited/JSON tag. So on the first pass, identify phrases (perhaps also spell-correct them) and categorize the phrases related to a given delimited field (via human intervention); then rescan, check for outliers/conflicting normalizations, and have the script apply corrections per the human annotations.<p>Note: normalized phonetic annotations are a bit easier to handle than common dictionary spelling.<p>[0]: <a href="https://community.openai.com/t/summarizing-and-extracting-structured-data-from-long-text/453078/10" rel="nofollow">https://community.openai.com/t/summarizing-and-extracting-st...</a>
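The rule-based approach above can be sketched in Python rather than awk. This is a rough illustration under stated assumptions, not a complete solution: the phrase codebook `PURPOSE_PHRASES`, the second purpose in it, and the function names are all hypothetical. The year is passed in explicitly because the spoken text does not contain one, and dates are normalized to ISO format to match the output in the question.

```python
import json
import re
from datetime import date

# Hypothetical phrase codebook: each purpose is keyed by phrases that signal it.
PURPOSE_PHRASES = {
    "apply for leave": ["apply for leave", "take leave", "time off"],
    "book meeting room": ["book a room", "meeting room"],
}

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches day-then-month forms such as "1st November" or "8 november".
DATE_RE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTH_NAMES) + r")\b",
    re.IGNORECASE,
)


def match_purpose(text):
    """Pass 1: pick the purpose whose phrases occur most often, or None."""
    lowered = text.lower()
    scores = {
        purpose: sum(phrase in lowered for phrase in phrases)
        for purpose, phrases in PURPOSE_PHRASES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_dates(text, year):
    """Pass 2: find dates and normalize to ISO; None marks an impossible date."""
    found = []
    for day, month in DATE_RE.findall(text):
        try:
            found.append(date(year, MONTHS[month.lower()], int(day)).isoformat())
        except ValueError:
            found.append(None)  # e.g. "31st November"
    return found


def extract(text, year):
    """Combine both passes and flag anything doubtful for human review."""
    purpose = match_purpose(text)
    dates = extract_dates(text, year)
    result = {"purpose": purpose, "data": {}, "needs_review": purpose is None}
    if purpose == "apply for leave":
        if len(dates) == 2 and None not in dates and dates[0] <= dates[1]:
            result["data"] = {"start": dates[0], "end": dates[1]}
        else:
            result["needs_review"] = True
    return result


if __name__ == "__main__":
    spoken = "Please apply for leave from 1st November to 8th november."
    print(json.dumps(extract(spoken, 2025), indent=2))
```

The `needs_review` flag stands in for the "mark data for human review" step: anything with no matching purpose, a wrong number of dates, or an impossible date is left for a person to annotate.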