This is cool! A few questions:

- Given that Neon's architecture decouples compute from storage using the safekeepers and pageservers, will this still just work with Neon? (Wondering because you mention the index is in-memory, so unless your stateless compute nodes can somehow hot-swap the indexes, I wasn't sure how that'd work.)

- Do you plan to offer vector search as a plug-and-play offering? If so, does Neon as a product plan to introduce more "out-of-the-box" functionality like vector search? (Similar to Xata's offerings like search and vector search.)

- An unrelated question: I believe Neon has very small cold starts for free/scale-to-zero configurations, but are there also inherent small latencies for infrequently accessed data? In other words, if there were a large table with records that are "old"/"archival" but also a sort of ~last-30-days set of records that are "fresh"/accessed more frequently, would there likely be slight latency introduced when accessing the older records?

Neon looks awesome, and thanks for Neon's open source contributions!
I was wondering how this compared to Qdrant. I found this:

"Qdrant currently only uses HNSW as a vector index."

- https://qdrant.tech/documentation/concepts/indexing

So it would be interesting to see benchmarks between pg_embedding and Qdrant. I would expect them to perform similarly, but perhaps there are other factors?
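For what it's worth, a rough head-to-head is straightforward to sketch. Below is a minimal latency comparison, assuming a local Qdrant instance and a Postgres database where a `documents` collection/table already holds the same vectors, and using pg_embedding's `<->` operator as shown in the announcement; all names here are illustrative.

```python
import time

import psycopg2
from qdrant_client import QdrantClient

DIMS = 1536
query = [0.1] * DIMS  # stand-in query embedding

# pg_embedding: k-NN via the <-> distance operator over an HNSW index.
conn = psycopg2.connect("dbname=app")
cur = conn.cursor()
t0 = time.perf_counter()
cur.execute(
    "SELECT id FROM documents ORDER BY embedding <-> %s::real[] LIMIT 10",
    (query,),
)
pg_hits = cur.fetchall()
print(f"pg_embedding: {time.perf_counter() - t0:.4f}s")

# Qdrant: the same k-NN search against its HNSW index.
qdrant = QdrantClient(host="localhost", port=6333)
t0 = time.perf_counter()
q_hits = qdrant.search(collection_name="documents", query_vector=query, limit=10)
print(f"qdrant: {time.perf_counter() - t0:.4f}s")
```

A fair benchmark would also pin both HNSW indexes to comparable build parameters (m, ef) and measure recall alongside latency, since either system can look "faster" by trading accuracy away.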
What's the plan/timeline for offering cosine similarity support, given that most OSS embedding models are fine-tuned on a contrastive cosine-distance objective?
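Worth noting that even with only a Euclidean index you can recover cosine ranking by unit-normalizing vectors before insert: for unit vectors, squared L2 distance is a monotone function of cosine similarity, so the nearest-neighbor order is identical. A quick numerical check of that identity (my own sketch, not from the pg_embedding docs):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=768), rng.normal(size=768)

# Unit-normalize, as you would before inserting into the table.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

cos_sim = float(a_n @ b_n)
l2_sq = float(np.sum((a_n - b_n) ** 2))

# ||a - b||^2 = 2 - 2*cos(a, b) for unit vectors, so sorting by L2
# distance gives exactly the cosine-similarity ranking.
assert np.isclose(l2_sq, 2.0 - 2.0 * cos_sim)
```

So normalizing at write time is a workable stopgap until native cosine support lands.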
CEO of Neon here.

This was a relatively quick project for us, and the index is currently in-memory. However, it is fast! We would love your feedback and are excited to invest further.
This blog post makes me uneasy. The pg_embedding code on GitHub gives the impression of a PoC, while the blog post creates the impression that pg_embedding is ready for use.

If we consider pg_embedding ready for use, why don't you compare with pgvector on:

1) the need to rebuild the index on every instance restart,

2) replication support?
With people talking about pgvector's current scaling issues, one thing I'm not sure about is whether the problem is the Postgres table simply containing a lot of vectors (e.g. 500k), or the search itself running over 500k vectors.

E.g. if the table had 500k vectors, but you were pre-filtering with WHERE client_id = X (returning only 200 rows) and then an AND <embedding search> (returning only 6 rows), would this still have the same performance issue?

Or is it literally only when the embedding search is over 500k rows?
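For context, the usual behavior with HNSW-style indexes is post-filtering: the ANN scan walks the whole graph (all 500k vectors) to find the top-k candidates, and the WHERE clause is applied to those candidates afterwards, so a selective filter can leave you with fewer than k results rather than making the search cheaper. A sketch of the query shape, with a hypothetical `documents(id, client_id, embedding)` table; I haven't verified how pg_embedding's planner handles this case specifically:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()
q = [0.1] * 1536  # query embedding

# With post-filtering, the index returns its top candidates over ALL
# rows first; client_id is only checked after the fact. Over-fetching
# (LIMIT 50 here instead of 6) is a common workaround so the filter
# still leaves enough surviving matches.
cur.execute(
    """
    SELECT id
    FROM documents
    WHERE client_id = %s
    ORDER BY embedding <-> %s::real[]
    LIMIT 50
    """,
    (42, q),
)
print(cur.fetchall()[:6])
```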
Very cool! It would be nice to see a working end-to-end integration with an LLM, using this to generate relevant context, for example. I see multiple folks mention cosine similarity, which this HNSW implementation doesn't support; how does the lack of it limit what you can do with this library?

Also, since this is in-memory, I assume it significantly affects startup time in order to rebuild the index? Would be nice to see how bad that is for larger vector datasets.
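On the end-to-end question, the wiring is short. Here's a minimal retrieval-augmented sketch, assuming OpenAI's Python client, a hypothetical `documents(body, embedding)` table whose embeddings came from the same model, and pg_embedding's `<->` operator; this is illustrative, not an official Neon example.

```python
import psycopg2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
conn = psycopg2.connect("dbname=app")

question = "How do I scale a compute endpoint to zero?"

# 1. Embed the question with the same model used at indexing time.
q_emb = client.embeddings.create(
    model="text-embedding-ada-002", input=question
).data[0].embedding

# 2. Pull the nearest documents to use as context.
cur = conn.cursor()
cur.execute(
    "SELECT body FROM documents ORDER BY embedding <-> %s::real[] LIMIT 3",
    (q_emb,),
)
context = "\n\n".join(row[0] for row in cur.fetchall())

# 3. Answer with the retrieved context stuffed into the prompt.
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "Answer using only this context:\n\n" + context},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```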
Any reason you didn't contribute to pgvector?

It would have been nice to get the support of Neon in progressing pgvector, since it's already so widely adopted by the community.

(disclosure: Supabase CEO)
Is anyone having a really good experience using embedding-based semantic retrieval in combination with a downstream LLM?

I am working quite a bit with normal and chained LLMs, but so far haven't explored the retrieval route.