科技回声 (Tech Echo)

10 comments

manca, about 1 year ago

This is exactly what https://www.perplexity.ai/ is trying to do. Maybe not "RAGing" the entire internet, but certainly mapping natural-language queries onto their own (probably) vector database, which contains the "source of truth" from the internet.

How they build that database, and which models they use for text tokenization, embedding generation, and ranking at "internet" scale, is the secret sauce that enabled them to raise more than $165M to date.

For sure this is where internet search will be in a couple of years, and that's why Google got really concerned when the original ChatGPT was released. That said, don't assume Google is not already working on something similar. In fact, the main theme of their Google Next conference was LLMs and RAG.

aleksiy123, about 1 year ago

Is connecting a search engine to an LLM not technically a RAG for the whole web?

mehulashah, about 1 year ago

Cool idea. This is a decentralized RAG approach and useful for individual sites, e.g. those built on WordPress. How do you find the site that you want to "RAG" on, though? Individual domains can be vast, e.g. Google itself.

troupo, about 1 year ago

Well, there's nothing new under the sun. Whatever cooperation model you may have come up with, it has been invented again, and again, and again.

Before you invent a new protocol, look at Semantic Web (RDF et al), and Google Microformats, and...

rthnbgrredf, about 1 year ago

I think we need a search engine that has an API. Doesn't Kagi have an API?

simonw, about 1 year ago

FIYDRI^: The core idea discussed in this post is less about RAG and more about sharing web content in packages that are easier for crawlers to access, including an experiment that uses downloadable SQLite databases for that.

^ For If You Didn't Read It
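As a rough illustration of the "downloadable SQLite database" idea, here is a minimal sketch in Python. The `pages` table, its columns, the file name `site.db`, and the example.com URLs are all assumptions made up for this example; the post's actual schema is not described in this thread.

```python
import os
import sqlite3
import tempfile


def build_package(path, pages):
    """Site side: write (url, title, body) rows into one SQLite file.

    The table name and columns are hypothetical, not taken from the post.
    """
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages "
            "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", pages)
    conn.close()


def search_package(path, term):
    """Crawler side: query the downloaded file locally, no HTTP crawl needed."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT url, title FROM pages WHERE body LIKE ? ORDER BY url",
            (f"%{term}%",),
        ).fetchall()
    finally:
        conn.close()
    return rows


def demo():
    """Build a tiny package in a temp directory and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "site.db")
        build_package(
            path,
            [
                ("https://example.com/a", "Post A", "Notes on SQLite packaging"),
                ("https://example.com/b", "Post B", "Unrelated text"),
            ],
        )
        return search_package(path, "sqlite")
```

A plain `LIKE` match is used here only to keep the sketch dependency-free; a real package would more likely ship a full-text index or precomputed embeddings.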

transitivebs, about 1 year ago

This is exa's mission: https://exa.ai

leblancfg, about 1 year ago

I've been using Kagi's "Quick answer" more and more these days, which I guess is a form of "index the whole web" RAG.

Here's their blog article for it: https://help.kagi.com/kagi/ai/quick-answer.html

You have to fire up your bullshit detector when looking at the results, but I find it saves a good 3/4 clicks on average.

bagels, about 1 year ago

"RAG, or Retrieval-Augmented Generation, is a method where a language model such as ChatGPT first searches for useful information in a large database and then uses this information to improve its responses."
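For anyone who wants that definition in code, here is a minimal sketch of the two steps (search, then augment the prompt) in Python. The toy `DOCUMENTS` list, the bag-of-words cosine scoring, and the prompt wording are assumptions for illustration only; real systems typically use learned embeddings and a vector index, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the "large database".
DOCUMENTS = [
    "Kagi is a paid search engine that offers a Quick Answer feature.",
    "SQLite is an embedded database stored in a single file.",
    "Retrieval-Augmented Generation adds retrieved text to a model prompt.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Step 1: rank documents by similarity to the query and keep the top k."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Step 2: put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The string returned by `build_prompt("What is SQLite?", retrieve("What is SQLite?", DOCUMENTS))` is what would then be sent to the language model of your choice.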

mooktakim, about 1 year ago

Aren't LLMs already trained on the whole web? No need to RAG, in theory.

Can we RAG the whole web?
