
We have been working on ColiVara and wanted to show it to the community. ColiVara is an API-first implementation of the ColPali paper, using ColQwen2 as the LLM. From the end-user standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents.

Why should anyone working with RAG care?

ColPali makes information retrieval from visual document types, like PDFs, better. ColiVara is a suite of services, built on top of ColPali, that allows you to store, search, and retrieve documents based on their visual embeddings.

(We are not affiliated with the ColPali team in any way, although we are big fans of their work!)

Information retrieval from PDFs is hard because they contain various components: text, images, tables, different headings, captions, complex layouts, etc.

Because of this, parsing PDFs currently requires multiple complex steps:

1. OCR
2. Layout recognition
3. Figure captioning
4. Chunking
5. Embedding

Not only are these steps complex and time-consuming, but they are also prone to error.

This is where ColPali comes into play. But what is ColPali?

ColPali combines:

• Col -> the contextualized late interaction mechanism introduced in ColBERT
• Pali -> a Vision Language Model (VLM), in this case PaliGemma

(Note: both we and the ColPali team have moved from PaliGemma to Qwen models.)

And how does it work?

During indexing, the complex PDF parsing steps are replaced by using "screenshots" of the PDF pages directly. These screenshots are then embedded with the VLM. At inference time, the query is embedded and matched with a late interaction mechanism to retrieve the most similar document pages.

OK, so what exactly does ColiVara do?

ColiVara is an API (with a Python SDK) that makes this whole process easy and viable for production workloads. With one line of code, you get SOTA retrieval in your RAG system.
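
To make the late interaction matching described above more concrete, here is a minimal sketch of ColBERT-style MaxSim scoring in plain Python. It is an illustration only and not ColiVara's implementation (which, per the post, runs scoring inside Postgres). The function names (`maxsim_score`, `rank_pages`) and the tiny 2-dimensional embeddings are hypothetical; real embeddings come from the vision model and have many more dimensions.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Late interaction score: for each query token embedding, take its best
    match among the page's patch embeddings, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(
    query_tokens: List[Vector],
    pages: Dict[str, List[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every page against the query and return the top_k, best first."""
    scored = [(name, maxsim_score(query_tokens, patches)) for name, patches in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (hypothetical): two query token embeddings, two pages of patch embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[1.0, 0.0], [0.5, 0.5]],
    "page_2": [[0.0, 1.0], [1.0, 0.0]],
}

results = rank_pages(query, pages, top_k=2)
```

In this toy case `page_2` ranks first because each query token finds a perfect match among its patches, while `page_1` only partially matches the second token. The key point is that the query and the page are each represented by many vectors rather than one, and the score is built from per-token best matches.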
We optimized how the embeddings are stored (using pgVector and halfvecs) and re-implemented the scoring to happen in Postgres, similar to and building on pgVector's work with cosine similarity. All the user has to do is:

1. Upsert a document to ColiVara to index it
2. At query time, perform a search and get the top-k pages

We also support advanced filtering based on arbitrary document and collection metadata, so we support re-ranking use cases and hybrid search.

State of the art?

We started this whole journey when we tried to do RAG over clinical trials and medical literature. We simply had too many failures, and up to 30% of a paper was lost or malformed. This is not just our experience: in the ColPali paper, ColPali outperformed Unstructured + BM25 + captioning by 15+ points on average. ColiVara, with its optimizations, is ahead by 20+ points.

We used NDCG@5, which is similar to recall but more demanding, as it measures not just whether the right results are returned, but whether they are returned in the correct order.

You can see our full eval results here: https://github.com/tjmlabs/ColiVara-eval

If this sounds like something you could use, check it out on GitHub: https://github.com/tjmlabs/ColiVara

It’s fair-source with an FSL license (similar to Sentry), and we’d love to hear how you’d use it or any feedback you might have.

Additionally, our eval repo is public and we continuously run it against major releases. You are welcome to run the evals independently: https://github.com/tjmlabs/ColiVara-eval
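
For readers unfamiliar with the NDCG@5 metric mentioned above, here is a small sketch of how it can be computed and why it is sensitive to ordering. This is not the code from the eval repo. The function names are hypothetical, and as a simplifying assumption the ideal ranking is built only from the relevance labels of the returned results rather than from every relevant document in the corpus.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain: each result's relevance is discounted
    by log2 of its 1-based rank plus one, so later hits count for less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[float], k: int = 5) -> float:
    """Normalized DCG: DCG of the actual ranking divided by the DCG of the
    best possible ordering of the same labels (simplifying assumption)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# The same single relevant page, returned at rank 1 versus rank 5.
relevant_first = ndcg_at_k([1, 0, 0, 0, 0])
relevant_last = ndcg_at_k([0, 0, 0, 0, 1])
```

Both rankings contain the relevant page within the top 5, so a recall-style metric would treat them the same, but NDCG@5 gives the first ranking a perfect score and the second a noticeably lower one.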

Show HN: ColiVara – State of the Art RAG API with Vision Models
