TechEcho

8 comments

tomhamerabout 2 years ago

You should check out <a href="https://github.com/marqo-ai/marqo">https://github.com/marqo-ai/marqo</a> for an end-to-end vector search database with batteries included.Disclaimer, I'm from the Marqo team.

zX41ZdbWabout 2 years ago

How large is the data size (the number of vectors, and their dimensions?), what are the type of queries (N nearest neighbors to a target vector according to L2 distance, or something else?), from where the queries are sent (reccomendation system for a user request; internal requests from a ML system), the throughput and the latency requirements (how many queries per second it should serve and how quickly it should answer)?ClickHouse already works good for vector search.For example, if you have one million of vectors of 1024 dimensions, and you search nearest vectors to a vector by brute force search, the query will take 150 ms, which is good for a reccomendation system scenario for e-commerce, food-tech, and similar applications.Example:<pre><code> CREATE TABLE vectors (id UInt64, vector Array(Float32)) ENGINE = Memory; SET max_block_size = 16; -- 64 KB per row INSERT INTO vectors SELECT number, arrayMap(x -> randNormal(0.0, 1.0, x), range(1024)) FROM numbers_mt(1000000); -- 4 GiB WITH (SELECT vector FROM vectors LIMIT 1) AS target SELECT count() FROM vectors WHERE NOT ignore(L2SquaredDistance(vector, target)); -- 0.113 SELECT count() FROM vectors WHERE NOT ignore(L2Norm(vector)); -- 0.110 WITH (SELECT vector FROM vectors LIMIT 1) AS target SELECT count() FROM vectors WHERE NOT ignore(arraySum((x, y) -> x * y, vector, target)); -- 0.150 WITH (SELECT vector FROM vectors LIMIT 1) AS target SELECT id, L2SquaredDistance(vector, target) AS distance FROM vectors ORDER BY distance LIMIT 10; -- 0.144</code></pre>

sebawitaabout 2 years ago

I would recommend <a href="https://weaviate.io/" rel="nofollow">https://weaviate.io/</a> Disclaimer, I work at Weaviate.It is open source, super fast and really easy to work with. Plus, it can easily handle huge volumes, we even have it running with a billion objects.

bobvanluijtabout 2 years ago

Ha, great question! Coincidentally, at Weaviate, we are thinking about this too! There are a few ways to do this with AirByte (<a href="https://docs.airbyte.com/integrations/destinations/weaviate/">https://docs.airbyte.com/integrations/destinations/weaviate/</a>) or (potentially) with Spark (<a href="https://github.com/weaviate/spark-connector">https://github.com/weaviate/spark-connector</a>). Would love to collaborate on this; feel free to reach out over Slack or so.

lqhlabout 2 years ago

You can explore MyScale at <a href="https://myscale.com/" rel="nofollow">https://myscale.com/</a>. It's a SaaS platform built on ClickHouse, offering more sophisticated vector indexing options like HNSW and IVF compared to the open-source version. It also provides a free tier for beta users.Disclaimer: I work for MyScale.

ryadhabout 2 years ago

ClickHouse can actually store vectors as tuples or arrays. It also comes with some handy distance functions<a href="https://clickhouse.com/docs/en/sql-reference/functions/distance-functions" rel="nofollow">https://clickhouse.com/docs/en/sql-reference/functions/dista...</a>

jeadieabout 2 years ago

I've been using <a href="https://github.com/jdagdelen/hyperDB">https://github.com/jdagdelen/hyperDB</a> and it's been really easy to use. I think Clickhouse support is on the short-term roadmap.

评论 #35637997 未加载

andre-zabout 2 years ago

You could check Qdrant, a dedicated Vector Database with advanced features. <a href="https://github.com/qdrant/qdrant">https://github.com/qdrant/qdrant</a> Disclaimer: I'm from the Qdrant Team.

8 comments

tomhamerabout 2 years ago

zX41ZdbWabout 2 years ago

sebawitaabout 2 years ago

bobvanluijtabout 2 years ago

lqhlabout 2 years ago

ryadhabout 2 years ago

jeadieabout 2 years ago

I've been using <a href="https://github.com/jdagdelen/hyperDB">https://github.com/jdagdelen/hyperDB</a> and it's been really easy to use. I think Clickhouse support is on the short-term roadmap.

评论 #35637997 未加载

andre-zabout 2 years ago

You could check Qdrant, a dedicated Vector Database with advanced features. <a href="https://github.com/qdrant/qdrant">https://github.com/qdrant/qdrant</a> Disclaimer: I'm from the Qdrant Team.

Ask HN: Seeking a Vector Database for ClickHouse Users – Suggestions Appreciated

8 comments

Ask HN: Seeking a Vector Database for ClickHouse Users – Suggestions Appreciated

8 comments