This is excellent.<p>I was recently reviewing Lucene concepts and found this video really good:
<a href="https://www.youtube.com/watch?v=T5RmMNDR5XI" rel="nofollow">https://www.youtube.com/watch?v=T5RmMNDR5XI</a><p>Also, this site has a pretty nice series of Lucene articles, the one on term vectors in particular:
<a href="http://makble.com/what-is-term-vector-in-lucene" rel="nofollow">http://makble.com/what-is-term-vector-in-lucene</a><p>Based on some quick research, it seems Lucene already uses a skip-list structure over its sorted posting lists, so I wonder why they needed a custom implementation. Perhaps it has to do with their custom document ID scheme and wanting the posting list ordered differently from Lucene's default. It also sounds like searchers query indexes while they're still being written, with some custom coordination around visibility, which might also require diverging from Lucene's default behavior.<p>Either way, pretty impressive!
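<p>For anyone unfamiliar with the skip structure I mentioned, here is a minimal Python sketch of the general idea: a sorted posting list with one level of skip pointers, plus an intersection that uses them. This is my own toy illustration, not Lucene's actual code or API; the names (PostingList, advance, intersect, skip_interval) are made up for the example.

```python
class PostingList:
    """Sorted doc IDs with one level of skip pointers (toy model, not Lucene's API)."""

    def __init__(self, doc_ids, skip_interval=4):
        self.doc_ids = sorted(set(doc_ids))
        self.skip_interval = skip_interval
        # One skip entry per block: the index of every skip_interval-th posting.
        self.skips = list(range(0, len(self.doc_ids), skip_interval))

    def advance(self, pos, target):
        """Return the index of the first doc ID >= target, searching from pos."""
        # Jump over whole blocks whose first doc ID is still below the target.
        block = pos // self.skip_interval + 1
        while block < len(self.skips) and self.doc_ids[self.skips[block]] < target:
            pos = self.skips[block]
            block += 1
        # Finish with a short linear scan inside the block we landed in.
        while pos < len(self.doc_ids) and self.doc_ids[pos] < target:
            pos += 1
        return pos


def intersect(a, b):
    """Doc IDs present in both posting lists (an AND query over two terms)."""
    i = j = 0
    out = []
    while i < len(a.doc_ids) and j < len(b.doc_ids):
        x, y = a.doc_ids[i], b.doc_ids[j]
        if x == y:
            out.append(x)
            i += 1
            j += 1
        elif x < y:
            i = a.advance(i, y)  # skip ahead in a to the first ID >= y
        else:
            j = b.advance(j, x)  # skip ahead in b to the first ID >= x
    return out


a = PostingList([1, 3, 5, 8, 13, 21, 34, 55, 89])
b = PostingList([2, 3, 21, 55, 90])
print(intersect(a, b))  # [3, 21, 55]
```

The skipping only works because doc IDs are kept in one fixed sort order, which is why I'd guess a different ID scheme or ordering could force a custom posting list.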