科技回声

6 comments

fareesh, almost 5 years ago

Indexing latency aside, when I search for something on Twitter it shows me results as I type. When I tap on a result, there is invariably something else at the tap location.

What's the terminology for this? Flash of ephemeral search result?

Are there any good ways of avoiding this problem?

stereosteve, almost 5 years ago

This is excellent.

I was recently reviewing Lucene concepts and found this video really good: https://www.youtube.com/watch?v=T5RmMNDR5XI

Also, this site has a series of Lucene articles that are pretty nice, the one on term vectors in particular: http://makble.com/what-is-term-vector-in-lucene

Based on some quick research, it seems like Lucene is already using a sorted skip data structure for the posting list, so I wonder why they had to do a custom implementation? Perhaps it has to do with their custom document ID scheme and how they want to preserve order in the posting list, which differs from the default behavior. It also sounds like searchers are searching on indexes as they're being written, and there is some custom coordination around visibility, which might require diverging from Lucene's default behavior.

Either way, pretty impressive!
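To illustrate why a sorted posting list with skipping helps, here is a minimal sketch of intersecting two sorted document ID lists. It is not Lucene's or Twitter's implementation: the function name `intersect` is hypothetical, and a binary search from the current position stands in for real skip pointers.

```python
from bisect import bisect_left


def intersect(short, long):
    """Intersect two ascending doc-ID lists, skipping ahead in the longer one."""
    result = []
    pos = 0
    for doc_id in short:
        # Jump forward to the first entry >= doc_id instead of scanning
        # every entry; a skip list serves the same purpose.
        pos = bisect_left(long, doc_id, pos)
        if pos == len(long):
            break  # the longer list is exhausted, no more matches possible
        if long[pos] == doc_id:
            result.append(doc_id)
    return result


print(intersect([3, 9, 27], [1, 3, 5, 9, 11, 13]))
```

The skipping only works because both lists are kept in a consistent sort order, which is presumably why a custom document ID scheme would affect the posting list design.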

tpmx, almost 5 years ago

Google: "Yah, we did that 20 years ago on mechanical hard drives."

Seriously though: Google built realtime indexing a very long time ago.

I co-implemented a small-scale (like 100k pages) full-text search engine about 20 years ago with a lot of inspiration from the 1998 paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine".

I had always assumed Google used 2-3 layers, sort of like in hierarchical storage management (HSM): fresh data stored in RAM and older data stored on HDDs, then combining them during the query step. I was itching to have a go at implementing that, but it wasn't really required for our use case.
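The layered idea described above can be sketched roughly as follows. This is a toy illustration, not how Google or Twitter actually do it: the class `TieredIndex` and its methods are hypothetical, and a second in-memory dictionary stands in for the on-disk tier.

```python
from collections import defaultdict


class TieredIndex:
    """Toy two-tier inverted index: a fresh tier plus an archive tier."""

    def __init__(self):
        self.fresh = defaultdict(set)    # recently indexed documents (RAM)
        self.archive = defaultdict(set)  # stand-in for the older, on-disk tier

    def add(self, doc_id, text):
        # New documents become searchable immediately via the fresh tier.
        for term in text.lower().split():
            self.fresh[term].add(doc_id)

    def flush(self):
        # Periodically fold the fresh tier into the archive tier.
        for term, ids in self.fresh.items():
            self.archive[term] |= ids
        self.fresh.clear()

    def search(self, term):
        # Combine both tiers at query time, newest doc ID first.
        term = term.lower()
        hits = self.fresh.get(term, set()) | self.archive.get(term, set())
        return sorted(hits, reverse=True)


index = TieredIndex()
index.add(1, "hello world")
index.flush()
index.add(2, "hello again")
print(index.search("hello"))
```

Sorting by descending ID assumes that larger IDs mean newer documents, which is an assumption of this sketch only.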

simonw, almost 5 years ago

This is a really good piece of technical writing. I particularly enjoyed the explanation of skip lists.

FlashBlaze, almost 5 years ago

Is there an engineering blog where they describe how they manage to show the exact timeline from where I left off each time I open the app?

Nican, almost 5 years ago

Doing a quick Google search, there are claims of "6,000 tweets per second in 2020", or about 6 tweets per millisecond. The blog post mentions there is an edge case for getting more than 16 tweets per millisecond.

Rather close margins, assuming exponential usage growth of Twitter. I wonder how long that invariant is going to last.
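The arithmetic behind that margin can be worked through in a few lines. This is a back-of-the-envelope sketch: the function `years_until_limit` and the 10% annual growth rate are hypothetical assumptions, not figures from the blog post, and an average rate says nothing about bursts within a single millisecond.

```python
import math


def years_until_limit(current_per_ms, limit_per_ms, annual_growth):
    """Years until an exponentially growing rate reaches the limit."""
    return math.log(limit_per_ms / current_per_ms) / math.log(1.0 + annual_growth)


avg_per_ms = 6000 / 1000    # 6,000 tweets/s is 6 tweets/ms on average
headroom = 16 / avg_per_ms  # limit is roughly 2.7x the claimed average

# Assumed growth rate, purely for illustration.
years = years_until_limit(avg_per_ms, 16, annual_growth=0.10)
print(f"{avg_per_ms:.0f} tweets/ms, {headroom:.2f}x headroom, ~{years:.1f} years")
```

Under those assumptions the average alone would take about a decade to reach 16 per millisecond, so traffic spikes are the more likely way to hit the edge case.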

Reducing search indexing latency to one second
