I'm happy to see third-party comparisons; most of the marketing here indeed just assumes KGs are better with zero proof, which is exactly the kind of marketing to be wary of. Unfortunately, I suspect a few key steps need to happen for this post to fairly reflect the algorithm the Microsoft NLP researchers described, vs the broader family named by Neo4j. Afaict, the post is talking about a different graph.

* The KG index should be text documents hierarchically summarized based on an extracted named-entity-relation graph. The blog version seems to instead index (document, word) pairs, not the KG, and afaict skips the hierarchical NER community summarization. The blog post is doing what Neo4j calls a lexical graph, not the novel KG summary index of the MSR paper (first sketch at the end of this comment).

* The data volume should go up. Think a corpus like 100k+ tweets or 100+ documents. At that scale you start to see challenges like redundant tweets that clog retrieval/ranking, or many pieces of the puzzle spread over disparate chunks that need indirect 'multi-hop' reasoning. Something like a single debate fits into one ChatGPT call with no RAG at all. How summarization preprocessing can still help small documents is an interesting but more nuanced question (and we have Thoughts on it ;-)).

* The tasks should reflect the challenges: multi-hop reasoning, wide summarization under a fixed budget, etc. Retesting simple queries naive RAG already solves isn't the point. The paper focused on a couple of query types, which is also why it routes to two different retrieval modes (second sketch below). Subtly, part of the challenge at bigger data scales is how many resources we give the retriever & reasoner, which is part of why graph RAG is exciting IMO.

Afaict the blog post essentially did a lexical graph with chunk/node embeddings, ran on a small document, and at that scale asked simple questions... so it stayed close to naive retrieval and, unsurprisingly, got parity. It's not much more work to improve, so I'd encourage doing a bit more. Beyond the MSR paper, I'd also experiment more with retrieval strategies, e.g., an agentic layer on top, and simple text search mixed in with reranking (third sketch below). And as validation before any of that: focus specifically on the queries expected to fail naive RAG and work with a graph, and make sure those work.

Related: we are working on a variant of Graph RAG that solves some additional scale & quality challenges in our data (investigations: threat intel reports, real-time social & news, misinfo, ...), and may be open to an internship or contract role for the right person. One big focus area is ensuring AI quality & AI scale, as our version is more GPU/AI-centric and used in serious situations by less technical users... a bit ironic given the article :) LMK if interested; see my profile. We'll need proof of capability for both the engineering and the AI challenges, and it's easier for us to teach the latter than the former.
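To make the distinctions concrete, a few rough Python sketches (hedged: `llm` is a placeholder for whatever chat-completion call you use, and Louvain stands in for the paper's hierarchical Leiden). First, the MSR-style index: extract entity triples into a graph, then summarize its communities; re-summarizing those summaries gives the higher levels of the hierarchy:

    import json
    import networkx as nx

    def llm(prompt: str) -> str:
        """Placeholder: wire up your chat-completion call here."""
        raise NotImplementedError

    def build_entity_graph(chunks: list[str]) -> nx.Graph:
        # An entity-relation graph, not a (document, word) lexical graph.
        g = nx.Graph()
        for chunk in chunks:
            triples = json.loads(llm(
                "Extract entity triples as a JSON list of "
                f"[head, relation, tail] arrays from:\n{chunk}"
            ))
            for head, relation, tail in triples:
                g.add_edge(head, tail, relation=relation)
        return g

    def summarize_communities(g: nx.Graph) -> list[str]:
        # One summary per detected community of the entity graph.
        summaries = []
        for community in nx.community.louvain_communities(g):
            edges = "\n".join(
                f"{u} -[{d['relation']}]-> {v}"
                for u, v, d in g.subgraph(community).edges(data=True)
            )
            summaries.append(llm("Summarize this entity subgraph:\n" + edges))
        return summaries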
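Second, the two retrieval modes: in the paper, corpus-wide questions are answered from the community summaries ("global") and entity-specific ones from graph neighborhoods ("local"). Here's a guess at a minimal router between them, reusing the `llm` placeholder above; I'm not claiming this is the paper's exact mechanism:

    def route_query(query: str) -> str:
        # Broad thematic questions -> community summaries ("global");
        # narrow entity questions -> entity neighborhoods ("local").
        verdict = llm(
            "Answer GLOBAL if this question asks about themes across the "
            "whole corpus, or LOCAL if it asks about specific entities:\n"
            + query
        )
        return "global" if "GLOBAL" in verdict.upper() else "local"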
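Third, the hybrid retrieval idea: simple text search mixed with vector search, then reranked. `embed` and `rerank` are placeholders for your embedding model and cross-encoder; rank_bm25 is just one easy lexical baseline:

    import numpy as np
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    def hybrid_retrieve(query, chunks, embed, rerank, k=10):
        # Lexical candidates: plain BM25 text search.
        bm25 = BM25Okapi([c.split() for c in chunks])
        lexical = set(np.argsort(bm25.get_scores(query.split()))[-k:])
        # Semantic candidates: cosine similarity over chunk embeddings.
        vecs = np.array([embed(c) for c in chunks])
        q = np.asarray(embed(query))
        sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
        semantic = set(np.argsort(sims)[-k:])
        # Union both candidate pools, then rerank the union.
        pool = [chunks[i] for i in lexical | semantic]
        return sorted(pool, key=lambda c: rerank(query, c), reverse=True)[:k]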