Hi HN!<p>My latest side project is a knowledge graph that maps the French culinary network using data extracted from restaurant reviews on LeFooding.com. The project uses LLMs to extract structured information from unstructured text.<p>Some technical aspects you may be interested in:<p>- Used structured generation to reliably parse unstructured text into a consistent schema<p>- Tested multiple models (Mistral-7B-v0.3, Llama3.2-3B, gpt4o-mini) for information extraction<p>- Created an interactive visualization using gephi-lite and Retina (WebGL)<p>- Built (with Claude) a simple Flask web app to clean and deduplicate the data<p>- Total cost for running inference on 2000 reviews with gpt4o-mini: less than 1€!<p>You can explore the visualization here: [Interactive Culinary Network](<a href="https://ouestware.gitlab.io/retina/1.0.0-beta.4/#/graph/?url=https%3A%2F%2Fgist.githubusercontent.com%2Ftheophilec%2F351f17ece36477bc48438d5ec6d14b5a%2Fraw%2Ffa85a89541c953e8f00d6774fe42f8c4bd30fa47%2Fgraph.gexf&r=x&sa=re&ca[]=t&ca[]=ra-s&st[]=u&st[]=re&ed=u" rel="nofollow">https://ouestware.gitlab.io/retina/1.0.0-beta.4/#/graph/?url...</a>)<p>The code for the project is available on GitHub:
- Main project: <a href="https://github.com/theophilec/foudinge">https://github.com/theophilec/foudinge</a>
- Data cleaning tool: <a href="https://github.com/theophilec/foudinge-scrub">https://github.com/theophilec/foudinge-scrub</a><p>Happy to get feedback!
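For anyone who wants a concrete picture of the "consistent schema" step, here is a minimal sketch. It is not the project's actual code: the `Stint` record and its fields (`person`, `restaurant`, `role`) are assumptions made up for illustration, and it validates model output after the fact rather than constraining generation the way a structured-generation library would.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: these field names are assumptions, not the project's.
@dataclass(frozen=True)
class Stint:
    person: str      # chef or other person named in the review
    restaurant: str  # place the review links them to
    role: str        # e.g. "chef", "pastry chef"

REQUIRED = ("person", "restaurant", "role")

def parse_stints(raw: str) -> list[Stint]:
    """Parse one model response, keeping only records that fit the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # not JSON at all: drop the whole response
    if not isinstance(data, list):
        return []
    stints = []
    for item in data:
        # Every required field must be present and a non-empty string.
        if isinstance(item, dict) and all(
            isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED
        ):
            stints.append(Stint(*(item[k].strip() for k in REQUIRED)))
    return stints

def to_edges(stints: list[Stint]) -> list[tuple[str, str]]:
    """Collapse records into unique (person, restaurant) graph edges."""
    return sorted({(s.person, s.restaurant) for s in stints})
```

Feeding every response through something like `parse_stints` before building the graph means one malformed answer costs a review's worth of edges instead of corrupting the dataset.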
The embedding is kind of weird. Like, there's no reason a "degree: 1" node should be so far away from its sibling.<p>Example: <a href="https://imgur.com/a/7Cktyzp" rel="nofollow">https://imgur.com/a/7Cktyzp</a><p>This makes the graph look more random/noisy/disorganized than it actually is.
This is a super cool idea! I've sort of mused about an idea for general web search that's very similar to this concept, where you start with a set of trusted entities and then branch out from there, but choosing how you establish trust is really important. But this is a really clever application, well done!
Very cool work.<p>It's worth mentioning that the graph browser, "Retina", is a project from Ouestware (<a href="https://www.ouestware.com/en/" rel="nofollow">https://www.ouestware.com/en/</a>), which is also a contributor to the GraphCommons and GephiLite projects.
Given the structured nature of the data, how does this compare to running a specialized classification model that looks for specific words in a review and uses those to assign Chefs to Restaurants? With some fine tuning, you might get more consistent results than feeding the reviews into a generative model.
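To make the comparison concrete, here is a rough sketch of the simplest word-matching baseline. It is a name lookup, not a trained classifier, and it assumes you already have a list of chef names; the function name and the names in the usage example are invented for illustration.

```python
import re

def mentioned_chefs(review: str, known_chefs: list[str]) -> list[str]:
    """Return the known chef names that appear as whole words in a review."""
    found = []
    for name in known_chefs:
        # re.escape guards against names containing regex metacharacters;
        # \b keeps "Paul Durand" from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, review, flags=re.IGNORECASE):
            found.append(name)
    return found

# Made-up review text and names, for illustration only.
review = "Formée chez Paul Durand, Jeanne Martin ouvre sa première table."
known = ["Paul Durand", "Jeanne Martin", "Luc Bernard"]
print(mentioned_chefs(review, known))
```

A lookup like this only finds names it is given in advance, and it says nothing about the relationship (trained under, replaced, co-founded), which is where a fine-tuned classifier or a generative model would have to do the real work.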
This was inspiring, what a cool idea. Just curious: for 4o mini, isn't there a JSON mode that reliably produces structured output? Is that what you were referring to and ended up using?
Great project. I propose an improvement over this conventional kind of object-style graph. Instead, every single item should be a node or an edge. The objects are needless complexities that obscure pure graph relations. Like this: <a href="https://memelang.net/03/" rel="nofollow">https://memelang.net/03/</a>
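As a rough illustration of the "everything is a node or an edge" idea, an object-style record can be flattened into (subject, relation, object) triples. This is a generic sketch, not Memelang syntax, and the record layout (a `name` key plus arbitrary attributes) is an assumption.

```python
def to_triples(record: dict) -> list[tuple[str, str, str]]:
    """Flatten one object-style record into (subject, relation, object) triples."""
    subject = record["name"]  # assumed identifier field
    triples = []
    for key, value in record.items():
        if key == "name":
            continue
        # A list-valued attribute becomes one triple per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, key, str(v)))
    return triples

# Hypothetical record, for illustration only.
chef = {"name": "Chef A", "role": "chef", "worked_at": ["Restaurant X", "Restaurant Y"]}
for triple in to_triples(chef):
    print(triple)
```

Each attribute turns into its own edge, so "role" and "worked_at" are queried the same way instead of being buried inside a node's properties.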
Very interesting. A small tweak and it seems like this could be applied to the problem of identifying degree of separation from political dissidents or other targets with the right data source. Lots of tools already exist that do that, but it's kind of wild how accessible and scalable certain techniques have become.
Do you think this will work as effectively with Google or social media review and rating datasets? Not every country has a LeFooding.com.<p>Would like to hear everyone's thoughts.