
Show HN: Using VLLMs for RAG – skip the fragile OCR

2 points by jonathan-adly 5 months ago
Hi HN

We wanted to show ColiVara! It is a suite of services that lets you store, search, and retrieve documents based on their visual embeddings and understanding.

ColiVara has state-of-the-art retrieval performance on both *text* and visual documents, offering superior multimodal understanding and control.

It is an API-first implementation of the ColPali paper, using ColQwen2 as the vision LLM. From the end-user standpoint it works exactly like RAG, but it uses vision models instead of chunking and text processing for documents. No OCR, no text extraction, no broken tables, no missing images. What you see is what you get.

On evals, it outperformed OCR + BM25 by 33%, and it beat captioning + BM25 by a similar margin.

Unlike traditional OCR (or caption) / chunk / embed pipelines with cosine similarity, which have many points of fragility, ColiVara embeds documents at the page level and uses ColBERT-style MaxSim calculations. These are computationally demanding, but are much better at retrieval tasks. You can read about our benchmarking here: https://blog.colivara.com/from-cosine-to-dot-benchmarking-similarity-methods-for-speed-and-precision

Looking forward to hearing your feedback.
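To make the retrieval step concrete, below is a minimal sketch of ColBERT-style MaxSim late-interaction scoring over multi-vector page embeddings, the mechanism described above. This is not ColiVara's actual code: the function names, array shapes, and the use of NumPy with random arrays standing in for ColQwen2 outputs are illustrative assumptions.

    import numpy as np

    def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
        """Score one page against one query.

        query_emb: (num_query_tokens, dim)  multi-vector query embedding
        page_emb:  (num_page_patches, dim)  multi-vector page embedding
        """
        # Dot-product similarity between every query token and every page patch.
        sim = query_emb @ page_emb.T  # shape: (num_query_tokens, num_page_patches)
        # For each query token, keep only its best-matching patch, then sum.
        return float(sim.max(axis=1).sum())

    def rank_pages(query_emb: np.ndarray, pages: list[np.ndarray]) -> list[int]:
        """Return page indices sorted from most to least relevant."""
        scores = [maxsim_score(query_emb, p) for p in pages]
        return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

    # Toy usage with random embeddings standing in for real model outputs.
    rng = np.random.default_rng(0)
    query = rng.normal(size=(16, 128))
    corpus = [rng.normal(size=(1030, 128)) for _ in range(3)]
    print(rank_pages(query, corpus))

The key difference from a cosine-similarity pipeline is that each query token is matched against its best page patch individually before the scores are summed, which is what makes the approach computationally heavier but better at retrieval.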

No comments yet.