Ask HN: How to extract structured content from unstructured menus using NLP and ML?

27 points by restapi, over 7 years ago

4 comments

3131s, over 7 years ago
This might not be a task for ML, especially if the only option is unsupervised ML.

I would suggest using an existing ontology, or rolling your own from the English Wikipedia database dump, as the basis for tokenizing the menu text, and going from there. What structured content exactly are you trying to extract?
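As a rough illustration of the ontology-driven tokenization suggested above, here is a minimal sketch. The `ONTOLOGY` dictionary and the `tag_terms` helper are hypothetical stand-ins: a real term list might be built from Wikipedia titles and categories, which is not shown here. The sketch does a greedy longest-match lookup so that multi-word terms win over their single-word parts.

```python
import re

# Hypothetical mini-ontology mapping a term to a category.
# A real one might be derived from a Wikipedia dump (assumption, not shown).
ONTOLOGY = {
    "chicken": "ingredient",
    "goat cheese": "ingredient",
    "sun-dried tomato": "ingredient",
    "pizza": "dish",
    "caesar salad": "dish",
}
MAX_TERM_WORDS = max(len(term.split()) for term in ONTOLOGY)


def tag_terms(text):
    """Greedy longest-match lookup of ontology terms in menu text."""
    # Lowercase words, keeping hyphenated words such as "sun-dried" whole.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    tagged, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in ONTOLOGY:
                tagged.append((candidate, ONTOLOGY[candidate]))
                i += n
                break
        else:
            i += 1  # no known term starts here; skip this word
    return tagged


print(tag_terms("Goat Cheese Pizza with sun-dried tomato and chicken"))
```

Words that are not in the term list are simply skipped, so the quality of the result depends entirely on how well the ontology covers the menus in question.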
BjoernKW, over 7 years ago
What's the source format? If you're dealing with PDFs, at least you have textual data, which could be matched against a recipe database. I haven't checked, but services like Epicurious might offer an API for that.

In that case you wouldn't need ML at all; pattern matching combined with named entity recognition would probably do just fine.
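The pattern-matching idea above could look something like the sketch below, which covers only the pattern-matching and database-lookup parts, not named entity recognition. Everything specific in it is an assumption: `KNOWN_DISHES` is a hypothetical stand-in for a recipe database, the line layout (name, optional dot leaders, price) is a guess at what extracted PDF text might look like, and `parse_menu_line` is an illustrative helper, not part of any service's API.

```python
import difflib
import re

# Hypothetical stand-in for a recipe database.
KNOWN_DISHES = ["margherita pizza", "caesar salad", "tiramisu"]

# Assumed line layout: "<name> <dots or spaces> <optional $><price>".
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?)[\s.]*\$?(?P<price>\d+(?:\.\d{2})?)\s*$"
)


def parse_menu_line(line):
    """Split a menu line into name and price, then look up the dish."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # e.g. a section heading with no price
    name = match.group("name").strip()
    # Fuzzy lookup tolerates small spelling differences in the menu text.
    close = difflib.get_close_matches(
        name.lower(), KNOWN_DISHES, n=1, cutoff=0.8
    )
    return {
        "name": name,
        "price": float(match.group("price")),
        "dish": close[0] if close else None,
    }


print(parse_menu_line("Margherita Pizza ..... $12.50"))
print(parse_menu_line("Starters"))
```

Lines without a trailing price return `None`, which gives a cheap way to separate section headings from items. Real menus will need more patterns than this one.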
misiti3780, over 7 years ago
Can you provide an example link to the data?
ocrcustomserver, over 7 years ago
If you deal with PDF documents, I might be able to help. Mail is in profile.