Show HN: ColiVara – State of the Art RAG API with Vision Models

10 points by jonathan-adly, 6 months ago
We have been working on ColiVara and wanted to show it to the community. ColiVara is an API-first implementation of the ColPali paper, using ColQwen2 as the underlying model. From the end user's standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents.

Why should anyone working with RAG care?

ColPali makes information retrieval from visual document types, like PDFs, better. ColiVara is a suite of services built on top of ColPali that lets you store, search, and retrieve documents based on their visual embeddings.

(We are not affiliated with the ColPali team in any way, although we are big fans of their work!)

Information retrieval from PDFs is hard because they contain various components: text, images, tables, different headings, captions, complex layouts, etc.

Because of this, parsing PDFs currently requires multiple complex steps:

1. OCR
2. Layout recognition
3. Figure captioning
4. Chunking
5. Embedding

Not only are these steps complex and time-consuming, they are also prone to error.

This is where ColPali comes into play. But what is ColPali?

ColPali combines:

• Col -> the contextualized late interaction mechanism introduced in ColBERT
• Pali -> a Vision Language Model (VLM), in this case PaliGemma

(Note: both we and the ColPali team have moved from PaliGemma to Qwen models.)

And how does it work?

During indexing, the complex PDF parsing steps are replaced by using "screenshots" of the PDF pages directly. These screenshots are then embedded with the VLM. At inference time, the query is embedded and matched with a late interaction mechanism to retrieve the most similar document pages.

OK - so what exactly does ColiVara do?

ColiVara is an API (with a Python SDK) that makes this whole process easy and viable for production workloads. With one line of code, you get SOTA retrieval in your RAG system.
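
To make the late interaction matching described above concrete, here is a minimal sketch of ColBERT-style MaxSim scoring in plain Python. It is not ColiVara's implementation (per the post, the real scoring runs inside Postgres). The function names, page names, and random toy vectors are all made up for illustration; real embeddings would come from the vision model, one vector per query token and one per page patch.

```python
import math
import random

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: list[Vector], page: list[Vector]) -> float:
    """Late interaction: for each query vector, keep only its best-matching
    page vector, then sum those maxima over the whole query."""
    return sum(max(dot(q, p) for p in page) for q in query)


def top_k_pages(
    query: list[Vector], pages: dict[str, list[Vector]], k: int = 2
) -> list[tuple[str, float]]:
    """Score every page against the query and return the k best, highest first."""
    scored = [(name, maxsim_score(query, emb)) for name, emb in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy data (hypothetical): random unit vectors stand in for model embeddings.
rng = random.Random(0)


def random_vectors(n: int, dim: int = 8) -> list[Vector]:
    return [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]


query = random_vectors(4)  # 4 query-token embeddings
pages = {
    "page_1": random_vectors(16),
    "page_2": random_vectors(12) + query,  # contains an exact match for every query token
    "page_3": random_vectors(16),
}

results = top_k_pages(query, pages, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because every query vector has an identical counterpart on `page_2`, that page reaches the maximum possible score (one per query token) and ranks first. The other pages only collect partial matches.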
We optimized how the embeddings are stored (using pgVector and halfvecs) and re-implemented the scoring to happen in Postgres, similar to and building on pgVector's work with cosine similarity. All the user has to do is:

1. Upsert a document to ColiVara to index it
2. At query time, perform a search and get the top-k pages

We also support advanced filtering based on arbitrary document and collection metadata, so re-ranking use cases and hybrid search are supported.

State of the art?

We started this whole journey when we tried to do RAG over clinical trials and medical literature. We simply had too many failures, and up to 30% of a paper was lost or malformed. This is not just our experience: in the ColPali paper, ColPali outperformed Unstructured + BM25 + captioning by 15+ points on average. ColiVara, with its optimizations, is 20+ points ahead.

We used NDCG@5, which is similar to recall but more demanding, as it measures not just whether the right results are returned, but whether they are returned in the correct order.

You can see our full eval results here: https://github.com/tjmlabs/ColiVara-eval

If this sounds like something you could use, check it out on GitHub: https://github.com/tjmlabs/ColiVara

It's fair-source with an FSL license (similar to Sentry), and we'd love to hear how you'd use it or any feedback you might have.

Additionally, our eval repo is public and we continuously run it against major releases. You are welcome to run the evals independently: https://github.com/tjmlabs/ColiVara-eval
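
For readers unfamiliar with the ranking metric mentioned above (normalized discounted cumulative gain at 5), here is a small sketch of how it can be computed and why it is stricter than recall. This is a generic textbook formulation, not code from the eval repo. The function names and the toy relevance lists are assumptions for illustration, and the sketch assumes every relevant page appears somewhere in the ranked list being scored.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain: relevance at each rank, discounted by log2 of the position."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    """DCG of the actual ranking divided by the DCG of the ideal (best-first) ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Two hypothetical result lists: 1 marks a relevant page, 0 an irrelevant one.
# Both return the same two relevant pages within the top 5, so recall@5 is identical.
relevant_first = [1, 1, 0, 0, 0]
relevant_last = [0, 0, 0, 1, 1]

print(f"relevant pages ranked first: {ndcg_at_k(relevant_first):.3f}")
print(f"relevant pages ranked last:  {ndcg_at_k(relevant_last):.3f}")
```

The first list scores a perfect 1.0, while the second is penalized for burying the relevant pages at ranks 4 and 5, even though a recall-only metric would treat the two lists as equal.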

No comments yet.