I'm happy to see third-party comparisons; most of the marketing here indeed just assumes KGs are better with zero proof, which is exactly the kind of marketing to be wary of. Unfortunately, I suspect a few key steps need to happen for this post to fairly reflect the algorithm the Microsoft NLP researchers described, vs the broader family named by Neo4j. Afaict, the post is talking about a different graph.

* The KG index should be text documents hierarchically summarized based on an extracted named-entity-relation graph. The blog version seems to instead index (document, word) pairs, not the KG, and afaict skips the hierarchical NER community summarization. The blog post is doing what Neo4j calls a lexical graph, not the novel KG summary index of the MSR paper (first sketch at the end of this comment).

* The data volume should go up. Think a corpus like 100k+ tweets or 100+ documents. At that scale you start to see challenges like redundant tweets that clog retrieval/ranking, or many pieces of the puzzle spread over disparate chunks that need indirect 'multi-hop' reasoning. Something like a single debate fits into one ChatGPT call with no RAG at all. How summarization preprocessing can still help small documents is an interesting but more nuanced question (and we have Thoughts on it ;-)).

* The tasks should reflect the challenges: multi-hop reasoning, wide summarization under a fixed budget, etc. Retesting simple queries naive RAG already solves isn't the point. The paper focused on a couple of query types, which is also why it routes to two different retrieval modes (second sketch below). Subtly, part of the challenge at bigger data scales is how many resources we give the retriever & reasoner, which is part of why graph RAG is exciting IMO.

Afaict the blog post essentially did a lexical graph with chunk/node embeddings, ran on a small document, and at that scale asked simple questions... so it stayed close to naive retrieval and, unsurprisingly, got parity. It's not much more work to improve, so I'd encourage doing a bit more. Beyond the MSR paper, I'd also experiment more with retrieval strategies, e.g., an agentic layer on top, and simple text search mixed in with reranking (third sketch below). And as validation before any of that: focus specifically on the queries expected to fail naive RAG and work with a graph, and make sure those work.

Related: we are working on a variant of Graph RAG that solves some additional scale & quality challenges in our data (investigations: threat intel reports, real-time social & news, misinfo, ...), and may be open to an internship or contract role for the right person. One big focus area is ensuring AI quality & AI scale, as our version is more GPU/AI-centric and used in serious situations by less technical users... a bit ironic given the article :) LMK if interested; see my profile. We'll need proof of capability for both the engineering and the AI challenges, and it's easier for us to teach the latter than the former.
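To make the distinctions concrete, a few rough Python sketches (hedged: `llm` is a placeholder for whatever chat-completion call you use, and Louvain stands in for the paper's hierarchical Leiden). First, the MSR-style index: extract entity triples into a graph, then summarize its communities; re-summarizing those summaries gives the higher levels of the hierarchy:

    import json
    import networkx as nx

    def llm(prompt: str) -> str:
        """Placeholder: wire up your chat-completion call here."""
        raise NotImplementedError

    def build_entity_graph(chunks: list[str]) -> nx.Graph:
        # An entity-relation graph, not a (document, word) lexical graph.
        g = nx.Graph()
        for chunk in chunks:
            triples = json.loads(llm(
                "Extract entity triples as a JSON list of "
                f"[head, relation, tail] arrays from:\n{chunk}"
            ))
            for head, relation, tail in triples:
                g.add_edge(head, tail, relation=relation)
        return g

    def summarize_communities(g: nx.Graph) -> list[str]:
        # One summary per detected community of the entity graph.
        summaries = []
        for community in nx.community.louvain_communities(g):
            edges = "\n".join(
                f"{u} -[{d['relation']}]-> {v}"
                for u, v, d in g.subgraph(community).edges(data=True)
            )
            summaries.append(llm("Summarize this entity subgraph:\n" + edges))
        return summaries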
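Second, the two retrieval modes: in the paper, corpus-wide questions are answered from the community summaries ("global") and entity-specific ones from graph neighborhoods ("local"). Here's a guess at a minimal router between them, reusing the `llm` placeholder above; I'm not claiming this is the paper's exact mechanism:

    def route_query(query: str) -> str:
        # Broad thematic questions -> community summaries ("global");
        # narrow entity questions -> entity neighborhoods ("local").
        verdict = llm(
            "Answer GLOBAL if this question asks about themes across the "
            "whole corpus, or LOCAL if it asks about specific entities:\n"
            + query
        )
        return "global" if "GLOBAL" in verdict.upper() else "local"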
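Third, the hybrid retrieval idea: simple text search mixed with vector search, then reranked. `embed` and `rerank` are placeholders for your embedding model and cross-encoder; rank_bm25 is just one easy lexical baseline:

    import numpy as np
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    def hybrid_retrieve(query, chunks, embed, rerank, k=10):
        # Lexical candidates: plain BM25 text search.
        bm25 = BM25Okapi([c.split() for c in chunks])
        lexical = set(np.argsort(bm25.get_scores(query.split()))[-k:])
        # Semantic candidates: cosine similarity over chunk embeddings.
        vecs = np.array([embed(c) for c in chunks])
        q = np.asarray(embed(query))
        sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
        semantic = set(np.argsort(sims)[-k:])
        # Union both candidate pools, then rerank the union.
        pool = [chunks[i] for i in lexical | semantic]
        return sorted(pool, key=lambda c: rerank(query, c), reverse=True)[:k]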