
Show HN: Using VLLMs for RAG – skip the fragile OCR

2 points by jonathan-adly 5 months ago
Hi HN

We wanted to show ColiVara! It is a suite of services that lets you store, search, and retrieve documents based on their visual embeddings and understanding.

ColiVara has state-of-the-art retrieval performance on both *text* and visual documents, offering superior multimodal understanding and control.

It is an API-first implementation of the ColPali paper, using ColQwen2 as the vision LLM. From the end-user standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents. No OCR, no text extraction, no broken tables, no missing images. What you see is what you get.

On evals, it outperformed OCR + BM25 by 33%, and it beat captioning + BM25 by a similar margin.

Unlike traditional OCR (or caption) / chunk / embed pipelines with cosine similarity, which have many points of fragility, ColiVara embeds documents at the page level and uses ColBERT-style MaxSim calculations. These are computationally demanding, but are much better at retrieval tasks. You can read about our benchmarking here: https://blog.colivara.com/from-cosine-to-dot-benchmarking-similarity-methods-for-speed-and-precision

Looking forward to hearing your feedback.
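To make the retrieval step concrete, below is a minimal sketch of ColBERT-style MaxSim late-interaction scoring over multi-vector page embeddings, the mechanism described above. This is not ColiVara's actual code: the function names, array shapes, and the use of NumPy with random arrays standing in for ColQwen2 outputs are illustrative assumptions.

    import numpy as np

    def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
        """Score one page against one query.

        query_emb: (num_query_tokens, dim)  multi-vector query embedding
        page_emb:  (num_page_patches, dim)  multi-vector page embedding
        """
        # Dot-product similarity between every query token and every page patch.
        sim = query_emb @ page_emb.T  # shape: (num_query_tokens, num_page_patches)
        # For each query token, keep only its best-matching patch, then sum.
        return float(sim.max(axis=1).sum())

    def rank_pages(query_emb: np.ndarray, pages: list[np.ndarray]) -> list[int]:
        """Return page indices sorted from most to least relevant."""
        scores = [maxsim_score(query_emb, p) for p in pages]
        return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

    # Toy usage with random embeddings standing in for real model outputs.
    rng = np.random.default_rng(0)
    query = rng.normal(size=(16, 128))
    corpus = [rng.normal(size=(1030, 128)) for _ in range(3)]
    print(rank_pages(query, corpus))

The key difference from a cosine-similarity pipeline is that each query token is matched against its best page patch individually before the scores are summed, which is what makes the approach computationally heavier but better at retrieval.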

No comments yet.