Can we RAG the whole web?

21 points by jeanloolz about 1 year ago

10 comments

manca about 1 year ago
This is exactly what https://www.perplexity.ai/ is trying to do. Maybe not "RAGing" the entire internet, but surely using a mapping from natural-language queries to their own (probably) vector database, which contains a "source of truth" from the internet.
How they build that database, and which models they use for text tokenization, embedding generation and ranking at "internet" scale, is the secret sauce that enabled them to raise more than $165M to date.
For sure this is where internet search will be in a couple of years, and that's why Google got really concerned when the original ChatGPT was released. That said, don't assume Google is not already working on something similar. In fact, the main theme of their Google Next conference was LLMs and RAG.
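As a rough illustration of the query-to-vector-database lookup described above, here is a minimal Python sketch. It is not Perplexity's method, which isn't public; `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and ranking is plain cosine similarity with a top-k cut.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)  # missing keys in a Counter count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank every document against the query vector and keep the k most similar."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

For example, `top_k("which search engine is paid", docs, k=1)` picks whichever document shares the most query terms relative to its length. A real system would presumably replace `embed` with a learned embedding model and the linear scan with an approximate nearest-neighbour index; the retrieve-by-similarity idea is the same.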
aleksiy123 about 1 year ago
Is connecting a search engine to an LLM not technically a RAG for the whole web?
mehulashah about 1 year ago
Cool idea. This is a decentralized RAG approach and useful for individual sites, e.g. those built on WordPress. How do you find the site that you want to "RAG" on, though? Individual domains can be vast, e.g. Google itself.
troupo about 1 year ago
Well, there's nothing new under the sun. Whatever cooperation model you may have come up with, it has been invented again, and again, and again.
Before you invent a new protocol, look at the Semantic Web (RDF et al.), Google Microformats, and...
rthnbgrredf about 1 year ago
I think we need a search engine that has an API. Doesn't Kagi have an API?
simonw about 1 year ago
FIYDRI^: The core idea discussed in this post is less about RAG and more about sharing web content in packages that are easier for crawlers to access, including an experiment that uses downloadable SQLite databases for that.
^ For If You Didn't Read It
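To make the "downloadable SQLite database" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The linked post's actual file format isn't reproduced here; the `pages` table, its columns, and both helper functions are hypothetical. A publisher writes its pages into one SQLite file, and a crawler that downloads that file can query it locally instead of fetching every page over HTTP.

```python
import sqlite3


def write_pages(conn: sqlite3.Connection, pages: list[tuple[str, str, str]]) -> None:
    """Publisher side: store (url, title, body) rows in a hypothetical `pages` table."""
    with conn:  # commits the transaction on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", pages)


def search_pages(conn: sqlite3.Connection, term: str) -> list[str]:
    """Crawler side: return URLs whose body contains `term` (LIKE is ASCII case-insensitive)."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url", (f"%{term}%",)
    ).fetchall()
    return [url for (url,) in rows]
```

In practice the publisher would open something like `sqlite3.connect("site.db")` and serve that file, and the crawler would open its downloaded copy the same way. The substring `LIKE` match keeps the sketch portable; a real package would more likely ship an SQLite FTS5 full-text index alongside the table.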
transitivebs about 1 year ago
this is exa's mission: https://exa.ai
leblancfg about 1 year ago
I've been using Kagi's "Quick answer" more and more these days, which I guess is a form of "index the whole web" RAG.
Here's their blog article for it: https://help.kagi.com/kagi/ai/quick-answer.html You have to fire up your bullshit detector when looking at the results, but I find it saves a good 3/4 clicks on average.
bagels about 1 year ago
"RAG, or Retrieval-Augmented Generation, is a method where a language model such as ChatGPT first searches for useful information in a large database and then uses this information to improve its responses."
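The two steps in that definition, search first and then generate from what was found, can be sketched as a small Python pipeline. This is an illustrative sketch only: `build_rag_prompt` and `rag_answer` are hypothetical names, the prompt wording is an assumption, and `retrieve` and `generate` are placeholders for a real search or vector lookup and a real LLM call.

```python
from typing import Callable


def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Augment the question with the retrieved passages, numbered so they can be cited."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


def rag_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieval-augmented generation: retrieve first, then generate from the augmented prompt."""
    passages = retrieve(question)  # placeholder: search API, vector store, etc.
    prompt = build_rag_prompt(question, passages)
    return generate(prompt)  # placeholder: an LLM completion call (not shown)
```

Passing `retrieve` and `generate` in as functions keeps the sketch agnostic about where the documents come from, which is the crux of the thread: the "whole web" part lives entirely inside `retrieve`.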
mooktakim about 1 year ago
Aren't the LLMs already trained on the whole web? No need to RAG, in theory.