Ask HN: Seeking a Vector Database for ClickHouse Users – Suggestions Appreciated

5 points by siyud about 2 years ago
I am currently working on a project that requires me to store and efficiently query large amounts of multidimensional data, and I believe a vector database could provide the perfect solution. However, I am unsure which one would best integrate with ClickHouse. If any of you have had experience using a vector database in conjunction with ClickHouse, I would be immensely grateful for your recommendations and insights.

8 comments

tomhamer about 2 years ago
You should check out https://github.com/marqo-ai/marqo for an end-to-end vector search database with batteries included.

Disclaimer: I'm from the Marqo team.
zX41ZdbW about 2 years ago
How large is the data (the number of vectors and their dimensions)? What type of queries will you run (N nearest neighbors to a target vector by L2 distance, or something else)? Where are the queries sent from (a recommendation system serving user requests, or internal requests from an ML system)? What are the throughput and latency requirements (how many queries per second it should serve, and how quickly it should answer)?

ClickHouse already works well for vector search.

For example, if you have one million vectors of 1024 dimensions and you search for the nearest vectors to a target vector by brute force, the query will take about 150 ms, which is good enough for a recommendation system in e-commerce, food-tech, and similar applications.

Example:

    CREATE TABLE vectors (id UInt64, vector Array(Float32)) ENGINE = Memory;

    SET max_block_size = 16; -- 64 KB per block (16 rows of 4 KB each)

    INSERT INTO vectors
    SELECT number, arrayMap(x -> randNormal(0.0, 1.0, x), range(1024))
    FROM numbers_mt(1000000); -- 4 GiB

    WITH (SELECT vector FROM vectors LIMIT 1) AS target
    SELECT count() FROM vectors
    WHERE NOT ignore(L2SquaredDistance(vector, target)); -- 0.113

    SELECT count() FROM vectors
    WHERE NOT ignore(L2Norm(vector)); -- 0.110

    WITH (SELECT vector FROM vectors LIMIT 1) AS target
    SELECT count() FROM vectors
    WHERE NOT ignore(arraySum((x, y) -> x * y, vector, target)); -- 0.150

    WITH (SELECT vector FROM vectors LIMIT 1) AS target
    SELECT id, L2SquaredDistance(vector, target) AS distance
    FROM vectors
    ORDER BY distance
    LIMIT 10; -- 0.144
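
For readers who want to see what the final query in the example above computes, here is a minimal Python sketch of brute-force nearest-neighbor search by squared L2 distance, plus the sizing arithmetic behind the figures in the SQL comments. It is not ClickHouse code: the names `l2_squared` and `nearest` are hypothetical helpers chosen for illustration, and only the standard library is used.

```python
import heapq
from typing import List, Sequence, Tuple


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(
    vectors: List[Tuple[int, Sequence[float]]],
    target: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Scan every (id, vector) pair and keep the k smallest distances.

    This mirrors the shape of `ORDER BY distance LIMIT k`: no index,
    every row is compared against the target.
    """
    scored = ((vid, l2_squared(vec, target)) for vid, vec in vectors)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Sizing arithmetic for the table in the example (assumes 4-byte Float32 values).
DIMENSIONS = 1024
BYTES_PER_FLOAT32 = 4
bytes_per_row = DIMENSIONS * BYTES_PER_FLOAT32  # 4096 bytes per vector
bytes_per_block = 16 * bytes_per_row            # 65536 bytes with max_block_size = 16
total_bytes = 1_000_000 * bytes_per_row         # 4_096_000_000 bytes of raw vector data

if __name__ == "__main__":
    toy = [(1, [0.0, 0.0]), (2, [3.0, 4.0]), (3, [1.0, 1.0])]
    print(nearest(toy, [0.0, 0.0], k=2))
    print(bytes_per_row, bytes_per_block, total_bytes)
```

The scan is linear in the number of vectors, which is why the questions about data size, throughput, and latency matter before choosing between brute force and a dedicated index.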
sebawita about 2 years ago
I would recommend https://weaviate.io/. Disclaimer: I work at Weaviate.

It is open source, super fast, and really easy to work with. Plus, it can easily handle huge volumes; we even have it running with a billion objects.
bobvanluijt about 2 years ago
Ha, great question! Coincidentally, at Weaviate, we are thinking about this too! There are a few ways to do this: with Airbyte (https://docs.airbyte.com/integrations/destinations/weaviate/) or (potentially) with Spark (https://github.com/weaviate/spark-connector). Would love to collaborate on this; feel free to reach out over Slack.
lqhl about 2 years ago
You can explore MyScale at https://myscale.com/. It's a SaaS platform built on ClickHouse, offering more sophisticated vector indexing options like HNSW and IVF compared to the open-source version. It also provides a free tier for beta users.

Disclaimer: I work for MyScale.
ryadh about 2 years ago
ClickHouse can actually store vectors as tuples or arrays. It also comes with some handy distance functions:

https://clickhouse.com/docs/en/sql-reference/functions/distance-functions
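
As a rough illustration of what distance functions of this kind compute, here is a Python sketch of L2 (Euclidean) and cosine distance over vectors stored as plain lists. The helper names `l2_distance` and `cosine_distance` are hypothetical and this is not the ClickHouse implementation; the linked documentation is the authority on the actual function list and semantics.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between the two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(l2_distance([0.0, 0.0], [3.0, 4.0]))
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

L2 distance is sensitive to vector magnitude, while cosine distance depends only on direction, so the right choice depends on how the vectors were produced.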
jeadie about 2 years ago
I've been using https://github.com/jdagdelen/hyperDB and it's been really easy to use. I think ClickHouse support is on the short-term roadmap.
andre-z about 2 years ago
You could check out Qdrant, a dedicated vector database with advanced features: https://github.com/qdrant/qdrant

Disclaimer: I'm from the Qdrant team.