
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


© 2025 TechEcho. All rights reserved.

Show HN: Using VLLMs for RAG – skip the fragile OCR

2 points | by jonathan-adly | 6 months ago
Hi HN,

We wanted to show ColiVara! It is a suite of services that lets you store, search, and retrieve documents based on their visual embeddings and understanding.

ColiVara has state-of-the-art retrieval performance on both *text* and visual documents, offering superior multimodal understanding and control.

It is an API-first implementation of the ColPali paper, using ColQwen2 as the vision LLM. From the end user's standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents. No OCR, no text extraction, no broken tables, no missing images. What you see is what you get.

On evals it outperformed OCR + BM25 by 33%. It is also much better than captioning + BM25, by a similar amount.

Unlike traditional OCR(caption)/chunk/embed pipelines with cosine similarity, which are fragile, ColiVara embeds documents at the page level and uses ColBERT-style MaxSim calculations. These are computationally demanding, but much better at retrieval tasks. You can read about our benchmarking here: https://blog.colivara.com/from-cosine-to-dot-benchmarking-similarity-methods-for-speed-and-precision

Looking forward to hearing your feedback.
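To make the comparison concrete, here is a minimal sketch of ColBERT-style MaxSim (late-interaction) scoring: each query token keeps only its best match among the page's patch embeddings, and the per-token maxima are summed. This is an illustrative toy using random NumPy arrays, not ColiVara's actual code; a real system would produce the multi-vector embeddings with a model such as ColQwen2.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_emb: (num_query_tokens, dim) multi-vector query embedding.
    page_emb:  (num_page_patches, dim) multi-vector page embedding.

    For each query token, take the maximum dot product over all page
    patches, then sum those maxima over the query tokens.
    """
    sim = query_emb @ page_emb.T          # (q_tokens, p_patches) dot products
    return float(sim.max(axis=1).sum())   # max over patches, sum over tokens

# Toy retrieval: score one query against several "pages" of patch
# embeddings and pick the best-scoring page.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))                     # 4 query tokens
pages = [rng.normal(size=(64, 128)) for _ in range(3)]  # 64 patches per page

scores = [maxsim_score(query, p) for p in pages]
best_page = int(np.argmax(scores))
```

Unlike a single-vector cosine similarity, every document page keeps one embedding per image patch, which is why the scoring is more expensive but preserves fine-grained layout information such as tables and figures.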

no comments