Reducing search indexing latency to one second

149 points by i0exception, almost 5 years ago

6 comments

fareesh, almost 5 years ago
Indexing latency aside, when I search for something on Twitter it shows me results as I type.

When I tap on a result, there is invariably something else at the tap location.

What's the terminology for this? Flash of ephemeral search result?

Are there any good ways of avoiding this problem?
stereosteve, almost 5 years ago
This is excellent.

I was recently reviewing Lucene concepts and found this video really good: https://www.youtube.com/watch?v=T5RmMNDR5XI

Also, this site has a series of Lucene articles that are pretty nice, the one on Term Vectors in particular: http://makble.com/what-is-term-vector-in-lucene

Based on some quick research, it seems like Lucene is already using a sorted skip data structure for the posting list, so I wonder why they had to do a custom implementation. Perhaps it has to do with their custom Document ID scheme and the order they want to preserve in the Posting List being different from the default behavior. It also sounds like searchers are searching on indexes as they're being written, and there is some custom coordination around visibility, which might require diverging from Lucene's default behavior.

Either way, pretty impressive!
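
For readers unfamiliar with the "sorted skip data structure" mentioned above, here is a minimal sketch of a skip list used as a posting list of document IDs. It is a generic textbook skip list, not Lucene's or Twitter's implementation; the class name `SkipListPostings`, the method names, the level cap, and the fixed seed are all assumptions made for illustration.

```python
import random


class _Node:
    """One entry in the skip list; forward[i] is the next node at level i."""
    __slots__ = ("doc_id", "forward")

    def __init__(self, doc_id, level):
        self.doc_id = doc_id
        self.forward = [None] * level


class SkipListPostings:
    """Sorted set of document IDs backed by a skip list (illustrative only)."""

    MAX_LEVEL = 8  # hypothetical cap, enough for a small demo

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no doc ID
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        # Each extra level is granted with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _find_predecessors(self, doc_id):
        # For every level, find the last node whose ID is smaller than doc_id.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].doc_id < doc_id:
                node = node.forward[i]
            update[i] = node
        return update

    def add(self, doc_id):
        update = self._find_predecessors(doc_id)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.doc_id == doc_id:
            return  # already present
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(doc_id, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def first_at_or_after(self, doc_id):
        """Smallest stored ID >= doc_id, or None; skips ahead via upper levels."""
        nxt = self._find_predecessors(doc_id)[0].forward[0]
        return None if nxt is None else nxt.doc_id

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.doc_id
            node = node.forward[0]


postings = SkipListPostings()
for doc in [42, 7, 19, 7, 3, 88]:
    postings.add(doc)
```

Iterating over `postings` yields the IDs in ascending order, and `first_at_or_after` is the kind of "advance to this ID" operation that makes intersecting two posting lists cheaper than scanning both from the start.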
tpmx, almost 5 years ago
Google: "Yah, we did that 20 years ago on mechanical hard drives."

Seriously though: Google built realtime indexing a very long time ago.

I co-implemented a small-scale (around 100k pages) full-text search engine about 20 years ago, with a lot of inspiration from the 1998 paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine".

I had always assumed Google used 2-3 layers, somewhat like hierarchical storage management (HSM): fresh data stored in RAM and older data stored on HDDs, with the two combined during the query step. I was itching to have a go at implementing that, but it wasn't really required for our use case.
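
The layered design guessed at above (a fresh tier in RAM, an older tier on disk, merged at query time) can be sketched roughly as follows. This is only an illustration of the idea in the comment, not a description of how Google or Twitter actually work; the class `TieredIndex`, its methods, and the use of a second in-memory dictionary to stand in for the disk tier are all assumptions.

```python
from collections import defaultdict


class TieredIndex:
    """Toy two-tier inverted index: a fresh tier plus an older tier."""

    def __init__(self):
        self._fresh = defaultdict(set)  # term -> doc IDs, recently added
        self._older = defaultdict(set)  # stand-in for the on-disk tier

    def add(self, doc_id, text):
        # New documents become searchable immediately via the fresh tier.
        for term in text.lower().split():
            self._fresh[term].add(doc_id)

    def flush(self):
        # Periodically move fresh postings into the older tier.
        for term, doc_ids in self._fresh.items():
            self._older[term] |= doc_ids
        self._fresh.clear()

    def search(self, term):
        # Combine both tiers at query time, newest (highest ID) first.
        term = term.lower()
        fresh = self._fresh.get(term, set())
        older = self._older.get(term, set())
        return sorted(fresh | older, reverse=True)


index = TieredIndex()
index.add(1, "hello world")
index.flush()                # document 1 now lives in the older tier
index.add(2, "hello again")  # document 2 is only in the fresh tier
results = index.search("hello")
```

Here `results` contains both documents even though they sit in different tiers, which is the point of merging during the query step: a write never has to wait for the slower tier to be rebuilt.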
simonw, almost 5 years ago
This is a really good piece of technical writing. I particularly enjoyed the explanation of skip lists.
FlashBlaze, almost 5 years ago
Is there an engineering blog where they describe how they manage to show the exact timeline from where I left off each time I open the app?
Nican, almost 5 years ago
Doing a quick Google search, there are claims of "6,000 tweets per second in 2020", or about 6 tweets per millisecond. The blog post mentions there is an edge case for getting more than 16 tweets per millisecond.

Those are rather close margins, assuming exponential usage growth for Twitter. I wonder how long that variant is going to last.
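
The margin estimated above can be worked through with a few lines of arithmetic. The 6,000 tweets per second and the 16 per millisecond threshold are the figures quoted in the comment; the annual growth rates passed to the function are purely hypothetical, and the calculation uses the average rate only, so it says nothing about short bursts, which would hit the threshold sooner.

```python
import math

TWEETS_PER_SECOND = 6_000  # figure quoted in the comment, not verified
LIMIT_PER_MS = 16          # edge-case threshold as described in the comment

avg_per_ms = TWEETS_PER_SECOND / 1000  # 6.0 tweets per millisecond
headroom = LIMIT_PER_MS / avg_per_ms   # about 2.67x above the average


def years_until_average_hits_limit(annual_growth):
    """Years of compound growth before the average rate reaches the limit.

    annual_growth is a hypothetical rate, e.g. 0.10 for 10% per year.
    """
    return math.log(headroom) / math.log(1 + annual_growth)


years_at_10_percent = years_until_average_hits_limit(0.10)
years_at_20_percent = years_until_average_hits_limit(0.20)
```

Under the assumed 10% yearly growth the average rate would take a little over ten years to reach 16 per millisecond, and about half that at 20%, so the headroom is a factor of roughly 2.7 rather than an order of magnitude.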