Tech Echo
A tech news platform built with Next.js, offering tech news and discussion from around the world.

© 2025 Tech Echo. All rights reserved.

Can we RAG the whole web?

21 points by jeanloolz, about 1 year ago

10 comments

manca, about 1 year ago
This is exactly what https://www.perplexity.ai/ is trying to do. Maybe not "RAGing" the entire internet, but certainly mapping natural-language queries onto their own (probably) vector database, which contains a "source of truth" drawn from the internet.

How they build that database, and which models they use for text tokenization, embedding generation and ranking at "internet" scale, is the secret sauce that has enabled them to raise more than $165M to date.

This is surely where internet search will be in a couple of years, and that's why Google got really concerned when the original ChatGPT was released. That said, don't assume Google isn't already working on something similar. In fact, the main theme of their Google Next conference was LLMs and RAG.
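A toy sketch of the query-to-vector lookup described in this comment. It assumes a bag-of-words term-count vector as a hypothetical stand-in for a learned embedding model, and a plain Python list as the "vector database"; Perplexity's actual models and index are not public, so nothing here reflects their system.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


docs = [
    "SQLite is an embedded relational database",
    "Vector databases store embeddings for similarity search",
    "Kagi is a paid search engine",
]
# Precompute one vector per document, as a vector store would at index time.
index = [(d, embed(d)) for d in docs]
print(search("vector databases that store embeddings", index, k=1))
```

A production system would swap `embed` for a neural embedding model and the linear scan for an approximate nearest-neighbour index; the ranking-by-similarity step stays the same.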
aleksiy123, about 1 year ago
Is connecting a search engine to an LLM not technically a RAG for the whole web?
mehulashah, about 1 year ago
Cool idea. This is a decentralized RAG approach, useful for individual sites, e.g. those built on WordPress. How do you find the site that you want to "RAG" on, though? Individual domains can be vast, e.g. Google itself.
troupo, about 1 year ago
Well, there's nothing new under the sun. Whatever cooperation model you may have come up with, it has been invented again, and again, and again.

Before you invent a new protocol, look at the Semantic Web (RDF et al.), Google Microformats, and...
rthnbgrredf, about 1 year ago
I think we need a search engine that has an API. Doesn't Kagi have an API?
simonw, about 1 year ago
FIYDRI^: The core idea discussed in this post is less about RAG and more about sharing web content in packages that are easier for crawlers to access, including an experiment that uses downloadable SQLite databases for that.

^ For If You Didn't Read It
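A minimal sketch of the packaging idea summarized in this comment, using Python's standard `sqlite3` module: the site owner writes its pages into one SQLite file, and a crawler downloads that file once and queries it locally instead of fetching page by page. The `pages` table, its columns, and the helper names `package_site` and `find_pages` are hypothetical; the post's actual experiment may use a different schema.

```python
import os
import sqlite3
import tempfile


def package_site(pages: list[dict], path: str) -> None:
    """Publisher side: write a site's pages into a single SQLite file.

    The schema (pages: url, title, body) is a hypothetical example.
    """
    with sqlite3.connect(path) as conn:  # the context manager commits on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO pages (url, title, body) VALUES (:url, :title, :body)",
            pages,
        )
    conn.close()  # the context manager does not close the connection itself


def find_pages(path: str, term: str) -> list[tuple[str, str]]:
    """Crawler side: query the downloaded file locally (LIKE is case-insensitive for ASCII)."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT url, title FROM pages WHERE body LIKE ? ORDER BY url",
            (f"%{term}%",),
        ).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "site.db")
    package_site(
        [
            {"url": "https://example.com/a", "title": "About RAG", "body": "Retrieval-augmented generation basics"},
            {"url": "https://example.com/b", "title": "Cooking", "body": "How to bake bread"},
        ],
        db_path,
    )
    print(find_pages(db_path, "retrieval"))
```

For larger sites, SQLite's FTS5 full-text index (where the build includes it) would be a better fit than `LIKE` scans.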
transitivebs, about 1 year ago
This is Exa's mission: https://exa.ai
leblancfg, about 1 year ago
I've been using Kagi's "Quick Answer" more and more these days, which I guess is a form of "index the whole web" RAG.

Here's their article on it: https://help.kagi.com/kagi/ai/quick-answer.html

You have to fire up your bullshit detector when looking at the results, but I find it saves a good three or four clicks on average.
bagels, about 1 year ago
"RAG, or Retrieval-Augmented Generation, is a method where a language model such as ChatGPT first searches for useful information in a large database and then uses this information to improve its responses."
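The two steps in that definition (search first, then generate from what was found) can be sketched in a few lines. This is a hedged illustration: `retrieve` uses simple word overlap as a stand-in for a real search engine or vector index, and `llm` is a hypothetical callable standing in for whatever model API you actually use.

```python
from typing import Callable


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Step 1: rank passages by shared words with the question (stand-in for real search)."""
    q_words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Step 2: place the retrieved passages in the prompt as numbered sources."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def rag_answer(question: str, corpus: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve, then generate: the model sees the retrieved text, not just the question."""
    return llm(build_prompt(question, retrieve(question, corpus, k)))


corpus = [
    "kagi offers a quick answer feature",
    "sqlite files can be downloaded whole",
    "perplexity answers questions with sources",
]


def fake_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real model call; it only reports the prompt size."""
    return f"(model would answer from a {len(prompt)}-character prompt)"


print(rag_answer("does kagi have a quick answer feature", corpus, fake_llm, k=1))
```

Replacing `retrieve` with calls to a web search API is essentially the "search engine connected to an LLM" setup raised earlier in the thread.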
mooktakim, about 1 year ago
Aren't LLMs already trained on the whole web? No need to RAG, in theory.