Hi HN, we are announcing a TypeScript library for re-ranking search results from vector databases or full-text search indexes. Re-ranking is a very important step in retrieval for building RAG applications: it almost immediately improves the accuracy of an LLM's response synthesis, because you can feed in much more accurate and relevant context. Why? While semantic and full-text search systems are designed to be fast and to fetch semantically or lexically close document chunks, they don't rank those chunks by the intent of the user's query. This is where a re-ranker comes in.<p>Why did we build this?<p>We couldn't find a self-contained, framework-independent re-ranking library for TypeScript. We also wanted to swap models and algorithms easily, and to publish and track latency metrics closely.
We implemented two different re-ranking algorithms -
1. LLM-based re-ranking: This uses the algorithm presented in the paper "Is ChatGPT Good at Search?" (<a href="https://arxiv.org/abs/2304.09542" rel="nofollow">https://arxiv.org/abs/2304.09542</a>), which re-ranks with a sliding window so that candidate sets potentially larger than the LLM's context window can still be handled. We added support for Llama 3 and GPT-4. For Llama 3 we use Groq, but other model providers can be added easily. (A rough sketch of the sliding-window idea is below.)
2. Reciprocal Rank Fusion (RRF): A lightweight algorithm to merge search results from more than one index while preserving their relative importance. (Also sketched below.)<p>We recently built a consumer application which indexes hundreds of thousands of images using Indexify (<a href="https://getindexify.ai" rel="nofollow">https://getindexify.ai</a>). It indexes various aspects of each image: captions (using a text embedding model), visual descriptions (using a VLM), and CLIP embeddings. During retrieval we look up 40 images from each index based on the user's query, and then use this library to re-rank the combined results. The results are frankly amazing compared to not re-ranking at all.
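To make the sliding-window idea concrete, here is a minimal sketch. It is illustrative only, not the library's actual API: the llmRank stub stands in for a real LLM ranking call, and the window/step sizes are arbitrary. A window slides from the bottom of the candidate list to the top, the LLM orders the documents inside each window, and because consecutive windows overlap, relevant documents bubble upward.

  // Illustrative types; not the library's real interface.
  type Doc = { id: string; text: string };

  // Stand-in for a real LLM ranking call (e.g. RankGPT-style permutation
  // generation). Here we score by naive term overlap so the sketch runs.
  async function llmRank(query: string, docs: Doc[]): Promise<Doc[]> {
    const terms = query.toLowerCase().split(/\s+/);
    const score = (d: Doc) =>
      terms.filter((t) => d.text.toLowerCase().includes(t)).length;
    return [...docs].sort((a, b) => score(b) - score(a));
  }

  // Slide a window from the end of the list to the front, re-ranking each
  // window in place. Overlap (windowSize > step) lets strong documents
  // climb across window boundaries toward the top.
  async function slidingWindowRerank(
    query: string,
    docs: Doc[],
    windowSize = 20,
    step = 10,
  ): Promise<Doc[]> {
    const ranked = [...docs];
    let end = ranked.length;
    while (end > 0) {
      const start = Math.max(0, end - windowSize);
      const ordered = await llmRank(query, ranked.slice(start, end));
      ranked.splice(start, ordered.length, ...ordered);
      if (start === 0) break;
      end -= step;
    }
    return ranked;
  }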
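Reciprocal Rank Fusion is simple enough to show in full: each document's fused score is the sum of 1/(k + rank) over every result list it appears in, where the constant k (60 in the original RRF paper) damps the influence of any single list. Again, the function below is a sketch, not the library's actual API:

  // Fuse several ranked lists of document ids with Reciprocal Rank Fusion:
  // score(d) = sum over lists of 1 / (k + rank(d)), rank being 1-based.
  function reciprocalRankFusion(lists: string[][], k = 60): string[] {
    const scores = new Map<string, number>();
    for (const list of lists) {
      list.forEach((id, rank) => {
        scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
      });
    }
    return [...scores.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([id]) => id);
  }

  // e.g., fusing results from the three image indexes mentioned above:
  // reciprocalRankFusion([captionHits, descriptionHits, clipHits]);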
Latency -
Latency is a big deal for applications that humans interact with, and Llama 3 8B on Groq is the fastest LLM re-ranker in our experience: Groq processes ~1,000 tokens/s, and we are able to re-rank 100 images in roughly 1.4 seconds. We haven't tried GPT-4 in production, so we don't have latency numbers to share for it.<p>Choosing a model -
Pick the model with the best latency vs. accuracy tradeoff for your application. There are many smaller re-ranking models available: Jina AI, BGE, Sentence Transformers, etc. We have heard good things about Cohere's re-ranker as well. We hope to add support for more models in the future. The library has a clean Model Provider interface, so contributions are welcome (a rough sketch of the provider shape is at the end of this post)!<p>We are hoping this helps developers building RAG or other search-based LLM applications in React or any other JavaScript framework. We'd love to hear your thoughts and feedback to improve the library!
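For anyone curious about contributing a provider, here is a deliberately simplified sketch of the shape involved. The names below (ModelProvider, rerank, EchoProvider) are purely hypothetical and not the library's actual interface:

  // Hypothetical provider shape; names are illustrative, not the real API.
  type Doc = { id: string; text: string };

  interface ModelProvider {
    // Return `docs` reordered by relevance to `query`.
    rerank(query: string, docs: Doc[]): Promise<Doc[]>;
  }

  // A new provider just wraps a model or API call behind this one method:
  class EchoProvider implements ModelProvider {
    async rerank(query: string, docs: Doc[]): Promise<Doc[]> {
      return docs; // no-op placeholder; a real provider calls a model here
    }
  }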