Ask HN: How to extract structured content from unstructured menus using NLP and ML?

27 points by restapi over 7 years ago

4 comments

3131s over 7 years ago

This might not be a task for ML, especially assuming the only option would be unsupervised ML.

I would suggest using an ontology, or rolling your own from the English Wikipedia database dump, as a basis for tokenization of the menu text and go from there. What structured content exactly are you trying to extract?
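A minimal sketch of what ontology-driven tokenization could look like, assuming a tiny hand-written phrase dictionary in place of a real ontology or Wikipedia-derived vocabulary. The `ONTOLOGY` entries and the `tag_menu_text` helper are hypothetical, made up for illustration; the idea is just greedy longest-match lookup over the menu words.

```python
import re

# Hypothetical mini-ontology: phrase -> category. A real one would be
# loaded from an existing ontology or built from a Wikipedia dump.
ONTOLOGY = {
    "goat cheese": "ingredient",
    "cheese": "ingredient",
    "tomato": "ingredient",
    "salad": "dish",
    "grilled": "preparation",
}
MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in ONTOLOGY)


def tag_menu_text(text):
    """Return (phrase, category) pairs using greedy longest-match lookup."""
    words = re.findall(r"[a-z]+", text.lower())
    tagged = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ONTOLOGY:
                tagged.append((phrase, ONTOLOGY[phrase]))
                i += n
                break
        else:
            # No known phrase starts here; skip this word.
            i += 1
    return tagged


print(tag_menu_text("Grilled goat cheese salad with tomato"))
```

Longest-match keeps "goat cheese" as one token instead of tagging "cheese" alone; unknown words such as "with" are simply dropped.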

BjoernKW over 7 years ago

What's the source format? If you're dealing with PDFs, at least you have textual data, which could be matched against a recipe database. I haven't checked, but services like Epicurious might offer an API for that.

In that case you wouldn't need ML at all; pattern matching combined with named entity recognition would probably do just fine.
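A rough sketch of the pattern-matching half of that idea, assuming menu lines of the form "name, optional dot leaders, price". The `KNOWN_DISHES` list is a hypothetical stand-in for a recipe database (no real API is called here), and fuzzy string lookup with `difflib` is used in place of proper named entity recognition.

```python
import re
from difflib import get_close_matches

# Hypothetical stand-in for a recipe database lookup.
KNOWN_DISHES = ["margherita pizza", "caesar salad", "tomato soup"]

# Dish name, then optional dot/dash leaders, then an optional "$" and a price.
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*[.\-\s]*\$?(?P<price>\d+(?:\.\d{2})?)\s*$"
)


def parse_menu_line(line):
    """Split one menu line into name and price; None if no price is found."""
    match = LINE_RE.match(line)
    if not match:
        return None
    name = match.group("name").strip()
    # Fuzzy-match the extracted name against the known dishes.
    candidates = get_close_matches(name.lower(), KNOWN_DISHES, n=1, cutoff=0.8)
    return {
        "name": name,
        "price": float(match.group("price")),
        "known_dish": candidates[0] if candidates else None,
    }


print(parse_menu_line("Caesar Salad ..... $8.50"))
print(parse_menu_line("Daily specials"))
```

Lines without a trailing price return `None`, so section headings fall out naturally and can be handled separately.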

misiti3780 over 7 years ago

Can you provide an example link to the data?

ocrcustomserver over 7 years ago

If you deal with PDF documents, I might be able to help. My email is in my profile.