Will keyword search (BM25, TF-IDF) be replaced by Neural Search?

18 points by jamesblonde over 2 years ago

5 comments

awadallah over 2 years ago
We humans are preconditioned to extrapolate linearly (as opposed to exponentially) thanks to our hunter ancestors (and FPS games!). It is very clear that the rate of advancement of Large Language Models is super-linear, if not exponential.

Hence, I indeed predict that keyword search will be completely supplanted as a mechanism for search in the next 5 years.

Of course we will still need to do lookups for ISBNs and generic ids, but that isn't keyword search, that is index lookup functionality.

Case in point: take a look at Meta Research's Contriever model (https://github.com/facebookresearch/contriever), which already matches keyword techniques in efficacy without any supervision.

This is only the beginning, come build the future with us, we see it very clearly :)
jamesblonde over 2 years ago
Amr Awadallah, cofounder of Cloudera, is arguing that Keyword Search will be replaced with Neural Search within the next five years. He's not alone; a number of companies have emerged in this space recently (e.g. Deepset Haystack, Hebbia, etc.). Amr launched his new startup Vectara today. They claim their Neural Search as a service is as easy to use as Algolia and as scalable as Elastic, but "neural-first", leading to much higher semantic relevance (they are free for 15k queries per month). My question: do you really see neural search replacing keyword search, or do you simply see it as an extension?
bedouin-ranger over 2 years ago
There is a place for both:

If you are searching for a specific ID/ISBN or some random token, keyword search will always be useful and easy to implement.

If the goal of the search is more semantically ambiguous and cannot be expressed by a unique phrase, then neural search will be the way to go.

Most of the interesting applications of search will be semantically driven and therefore neural search has a big role to play.
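To illustrate the exact-token case above, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization, the commonly used defaults `k1=1.2` and `b=0.75`, and the Lucene-style IDF variant; the documents and the `isbn-0000000000` token are made up for the example, and real engines add analyzers, inverted indexes and much more.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase + whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical corpus; the ISBN-like token is a placeholder, not a real ISBN.
docs = [
    "introduction to retrieval isbn-0000000000",
    "a gentle guide to searching for books about search",
    "deep learning for semantic matching and ranking",
]
scores = bm25_scores("isbn-0000000000", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Only the document containing the exact token gets a non-zero score, which is why this kind of lookup is cheap and predictable. The same property is the weakness for ambiguous queries: a document that discusses the topic in different words scores zero.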
paloaltobound over 2 years ago
Interesting! I wonder if it can help improve search for images/products/code and other artifacts.
charliejuggler over 2 years ago
People have been predicting the death of keyword search for at least as long as I've been working in the field (23 years and counting!). We've seen outrageous marketing claims, buzzwords like "concept search" and "insight engines", and many new companies promising a step change in search quality, some of which are still in business but many of which shone briefly then vanished. The concept of an easy to use, fire-and-forget, scalable search engine that gives you great relevance out of the box isn't new. Yet the bag-of-words model remains the standard across the sector; it's well understood, with many powerful and scalable open & closed source implementation options.

Making search work in practice, however, is hard. It's as much about process and people as it is about technology: many companies aren't even measuring search quality or recording search issues correctly, and many don't have an active search team (bigger than one poor overworked search person). No matter how clever the tech, these problems aren't going away: they're compounded by bad source data quality, misunderstandings of user search intent and bad search UX. Martin White, author of many books on search, describes search as a 'wicked problem'. Getting all these parts working in harmony so you can truly own your search is what we do here at OSC and it takes time, investment and commitment.

I think Vectara is very interesting and the people involved have impressive track records (there are also some other great engines like Vespa, Pinecone, Qdrant, Weaviate...). However I think the future of search is hybrid: we'll see keyword search still there for many use cases but enhanced by vector/neural approaches (the most widely used search engine, Lucene, recently gained vector features and work is happening on how to combine these with keyword ranking). No one approach will solve everyone's search problems, cope with special cases like part number search or the specialised language used in some sectors, or always understand the searcher's intent, magically, without considering the human factors above or without extra tuning/training.

That said, with all these exciting new approaches, tools and companies, it's a very interesting time in the search world!

Further reading/viewing: at the Haystack EU search conference a couple of weeks ago (www.haystackconf.com), Dmitry Kan, host of the Vector Podcast (he featured the Vectara team a while ago), gave a great keynote describing the current state of vector search. I wasn't going to release the video until Monday but you can get an early look here: https://youtu.be/2o8-dX__EgU . You can also read the joint article we wrote for The Search Network on vector search here: https://opensourceconnections.com/wp-content/uploads/2022/05/Search-Insights-2022.pdf (aimed at executives and others needing to understand the field).
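As one illustration of what "hybrid" can mean in practice, here is a minimal sketch of reciprocal rank fusion (RRF), a common way to merge a keyword ranking with a vector ranking. This is not a description of how Lucene or any engine named above combines the two; the document ids are invented, and `k=60` is simply the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score, breaking ties by id for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical result lists from a keyword engine and a vector engine.
keyword_ranking = ["part-123", "doc-a", "doc-b"]
vector_ranking = ["doc-a", "doc-c", "part-123"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Because RRF uses only rank positions, it avoids having to put BM25 scores and vector similarities on a common scale. The trade-off is that it ignores how large the score gaps were within each list.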