TechEcho

10 comments

manca about 1 year ago

This is exactly what https://www.perplexity.ai/ is trying to do. Maybe not "RAGing" the entire internet, but certainly mapping natural language queries onto their own (probably) vector database, which contains the "source of truth" from the internet.

How they build that database, and which models they use for text tokenization, embedding generation and ranking at "internet" scale, is the secret sauce that enabled them to raise more than $165M to date.

For sure this is where internet search will be in a couple of years, and that's why Google got really concerned when the original ChatGPT was released. That said, don't assume Google is not already working on something similar. In fact, the main theme of their Google Next conference was LLMs and RAG.

aleksiy123 about 1 year ago

Is connecting a search engine to an LLM not technically a RAG for the whole web?

mehulashah about 1 year ago

Cool idea. This is a decentralized RAG approach and is useful for individual sites, e.g. those built on WordPress. How do you find the site that you want to "RAG" on, though? Individual domains can be vast, e.g. Google itself.

troupo about 1 year ago

Well, there's nothing new under the sun. Whatever cooperation model you may have come up with, it has been invented again, and again, and again.

Before you invent a new protocol, look at Semantic Web (RDF et al), and Google Microformats, and...

rthnbgrredf about 1 year ago

I think we need a search engine that has an API. Doesn't Kagi have an API?

simonw about 1 year ago

FIYDRI^: The core idea discussed in this post is less about RAG and more about sharing web content in packages that are easier for crawlers to access - including an experiment that uses downloadable SQLite databases for that.

^ For If You Didn't Read It
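To make the "downloadable SQLite database" idea concrete, here is a minimal sketch of what such a package might look like. The `pages` table, its columns, the example URLs, and the `publish`/`search` helpers are all hypothetical names chosen for illustration; the linked post may use a different schema and different tooling.

```python
import os
import sqlite3
import tempfile


def publish(db_path, pages):
    """Site side: write (url, title, body) rows into a single SQLite file.

    The schema here is an assumption for illustration only.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pages "
                "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", pages
            )
    finally:
        conn.close()


def search(db_path, term):
    """Crawler side: open the downloaded file and do a substring match."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT url, title FROM pages WHERE body LIKE ? ORDER BY url",
            (f"%{term}%",),
        ).fetchall()
    finally:
        conn.close()


# Stand-in for "site publishes a file, crawler downloads it".
DB_PATH = os.path.join(tempfile.mkdtemp(), "site.db")
publish(DB_PATH, [
    ("https://example.com/a", "About RAG", "Retrieval-augmented generation overview."),
    ("https://example.com/b", "About SQLite", "SQLite stores a database in one file."),
])
print(search(DB_PATH, "database"))
```

The appeal of this design is that a crawler fetches one file and queries it locally instead of requesting every page. A real package would probably want a full-text index rather than `LIKE`, which is used here only to keep the sketch dependency-free.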

transitivebs about 1 year ago

this is exa's mission: https://exa.ai

leblancfg about 1 year ago

I've been using Kagi's "Quick answer" more and more these days, which I guess is a form of "index the whole web" RAG.

Here's their blog article for it: https://help.kagi.com/kagi/ai/quick-answer.html

You have to fire up your bullshit detector when looking at the results, but I find it saves a good 3/4 clicks on average.

bagels about 1 year ago

"RAG, or Retrieval-Augmented Generation, is a method where a language model such as ChatGPT first searches for useful information in a large database and then uses this information to improve its responses."
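The retrieve-then-generate flow in that definition can be sketched in a few lines. This is a toy under stated assumptions: retrieval here is bag-of-words cosine similarity over an in-memory list rather than a vector database, the `DOCS` corpus is made up, and the final step only builds the prompt that would be sent to a language model (no model is called).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Retrieval step: return up to k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Augmentation step: put the retrieved text in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical mini-corpus standing in for the "large database".
DOCS = [
    "SQLite is an embedded SQL database engine.",
    "Kagi is a paid search engine.",
    "RDF is a data model used by the Semantic Web.",
]

print(build_prompt("What is SQLite?", DOCS, k=1))
```

Swapping `retrieve` for a call to a web search API is what turns this into the "search engine plus LLM" setup discussed elsewhere in the thread.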

mooktakim about 1 year ago

Aren't LLMs already trained on the whole web? No need to RAG, in theory.

Can we RAG the whole web?
