Evaluating Search Algorithms

107 points by clandry94, about 4 years ago

6 comments

BillFranklin, about 4 years ago
Interesting article! Shopify's approach is cool; it's interesting that they're using Kafka to generate datasets. I wonder if the explicit human rankings will get stale (and also be hugely outweighed by implicit judgements in the training data). The real-time feedback aspect sounds cool. I wonder if it's just for metrics or also for re-training in real time.

I worked on a Learning To Rank implementation a year or so ago. What struck me then (and now, reading about Shopify's implementation) is that the approach is often very similar across sites, but the implementation is usually rather tailored. You see the same patterns: online/offline metrics; nDCG; click models and implicit/explicit relevance judgements; re-ranking the top-k results; and so on.

Unfortunately there doesn't seem to be a technology tying all of the components of an LtR system together. A managed service like Algolia could be an answer. I wonder if industry will eventually converge on a framework, such as an extension to OpenSource Connections' Elasticsearch Learning to Rank plugin (https://diff.wikimedia.org/2017/10/17/elasticsearch-learning-to-rank-plugin/).

It's a really interesting area of theory and practice - I hope Shopify write more about their implementation!

I'd also recommend reading Airbnb's really excellent paper - https://arxiv.org/pdf/1810.09591.pdf.
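For readers who have not met nDCG, one of the recurring patterns listed above, here is a minimal sketch of how it is commonly computed. It assumes graded relevance labels (0 = irrelevant, higher = better), the exponential gain `2^rel - 1`, and a log2 rank discount; the function names are illustrative and not taken from the article or any particular library.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels.

    relevances[i] is the label of the result shown at rank i + 1.
    Assumes the exponential-gain variant: (2^rel - 1) / log2(rank + 1).
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)  # position is 0-based
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# A ranking that is already in ideal order scores 1.0;
# pushing the only relevant result down to rank 2 lowers the score.
perfect = ndcg_at_k([3, 2, 1, 0], k=4)
demoted = ndcg_at_k([0, 1], k=2)
```

Because the score is normalised per query, it can be averaged across queries with different numbers of relevant results, which is one reason it shows up in both offline evaluation and LtR training objectives.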
NumberCruncher, about 4 years ago
I should re-read the article, because I can't see what kind of problem they are trying to solve with MAP, NDCG and an "invented here" PageRank that couldn't be solved with tf-idf and out-of-the-box Elasticsearch functionality. It's a highly underrated piece of software.
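For context on the MAP metric named above, a minimal sketch under stated assumptions: relevance is binary, each query is represented as a list of booleans in ranked order, and average precision is normalised by the number of relevant results that were actually retrieved (some definitions divide by the total number of relevant documents instead). The function names are hypothetical.

```python
def average_precision(ranked_relevant):
    """Average of precision@rank taken at each rank where a relevant result appears.

    ranked_relevant: list of booleans, True where the result at that rank is relevant.
    Assumption: normalises by the relevant results retrieved, not all relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of average precision over a list of per-query ranked relevance lists."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Relevant results at ranks 1 and 3: (1/1 + 2/3) / 2
ap = average_precision([True, False, True])
```

A metric like this measures how good a ranking is; it is independent of whether the ranking came from tf-idf, Elasticsearch defaults, or a custom ranker.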
LZ_Khan, about 4 years ago
Where do the relevance scores come from? Are they human rated? I feel like that could leave room for error as raters would probably not have the same opinion as me on what a good document is.
ntonozzi, about 4 years ago
Great article!

This seems like a fairly tricky ranking function. I wonder if they compared it to combining TF-IDF and the page popularity. This would help with the problem they explained.

It'd be interesting to see more details about how they implemented the query-specific PageRank.
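The baseline suggested above, TF-IDF combined with page popularity, could be sketched roughly as follows. This is an illustration only, not the article's method: the smoothed IDF formula, the max-normalisation of both signals, the `alpha` blending weight, and all function names are assumptions, and `popularity` stands in for whatever popularity signal (page views, clicks) is available.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document (a list of tokens) as the sum of tf * idf over the query terms."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption); Counter returns 0 for unseen terms.
        scores.append(sum(tf[t] * math.log((1 + n) / (1 + df[t])) for t in query))
    return scores


def blended_rank(query, docs, popularity, alpha=0.7):
    """Return document indices ordered by alpha * text score + (1 - alpha) * popularity.

    Both signals are divided by their maximum so they share a 0..1 scale.
    Assumes docs and popularity are non-empty and of equal length.
    """
    text = tfidf_scores(query, docs)
    max_text = max(text) or 1.0
    max_pop = max(popularity) or 1.0
    blended = [
        alpha * t / max_text + (1 - alpha) * p / max_pop
        for t, p in zip(text, popularity)
    ]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)


docs = [
    ["refund", "policy"],
    ["refund", "refund", "form"],
    ["shipping", "rates"],
]
page_views = [100, 1, 50]  # made-up popularity numbers
text_heavy = blended_rank(["refund"], docs, page_views, alpha=0.7)
popularity_heavy = blended_rank(["refund"], docs, page_views, alpha=0.2)
```

The weight `alpha` is the knob to watch: set too low, popular but irrelevant pages outrank good textual matches, which is the kind of trade-off offline metrics are meant to expose.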
lernerzhang, about 4 years ago
I wonder how they decide how many cases to manually label?
colesantiago, about 4 years ago
Not sure why they didn't just go with Elasticsearch?