Having worked with Simon, I can say he knows his sh*t. We talked a lot about what the ideal search stack would look like when we worked together at Shopify on search (him more infra, me more ML+relevance). I discussed how I just want a thing in the cloud to provide my retrieval arms, let me express ranking in a fluent "py-data"-first way, and get out of my way.

My ideal is that turbopuffer ultimately is like a Polars dataframe where all my ranking is expressed in my search API. I could just lazily express some lexical or embedding similarity, boost with various attributes (recency, popularity, etc.) to get a first pass, all with plain dataframe math. Then compute features for a reranking model I run on my side, again dataframe math, and it "just works": runs all of this as some kind of query execution DAG and stays out of my way.
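To make that concrete, here is roughly the first-pass ranking I mean, as pure Polars dataframe math. The score columns (bm25_score, embedding_sim, days_old, popularity) and the blend weights are made up for illustration; none of this is an actual turbopuffer API.

```python
import polars as pl

# Hypothetical candidate set: each row is a document returned by a lexical
# and/or vector retrieval arm, with raw scores already attached.
candidates = pl.LazyFrame({
    "doc_id": [1, 2, 3],
    "bm25_score": [12.3, 8.1, 15.0],      # lexical arm
    "embedding_sim": [0.82, 0.91, 0.70],  # vector arm (cosine similarity)
    "days_old": [3, 40, 1],
    "popularity": [1000, 50, 300],
})

# First-pass ranking is just dataframe math: blend the arms, boost by
# popularity, decay by age, then sort and truncate.
first_pass = (
    candidates
    .with_columns(
        (
            0.6 * pl.col("embedding_sim")
            + 0.4 * (pl.col("bm25_score") / pl.col("bm25_score").max())
            + 0.1 * (pl.col("popularity").log1p() / 10)
            - 0.05 * (pl.col("days_old") / 30)
        ).alias("score")
    )
    .sort("score", descending=True)
    .head(100)
)

print(first_pass.collect())
```

The lazy frame is the point: the whole ranking expression is a query plan the engine could, in principle, push down and execute as a DAG instead of materializing intermediate results on my side.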
Unrelated to the core topic, I really enjoy the aesthetic of their website. Another similar one is from Fixie.ai (also, interestingly, one of their customers).
> $3600.00/TB/month

It doesn't have to be that way.

At Hetzner I pay $200/TB/month for RAM. That's 18x cheaper.

Sometimes you can reach the goal faster and with less complexity by removing the part with the ~18x markup.
> In 2022, production-grade vector databases were relying on in-memory storage

This is irking me. pgvector has existed since before that, doesn't require in-memory storage, and can definitely handle vector search over 100M+ documents in a decently performant manner. Did they have a particular requirement somewhere?
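For reference, the standard pgvector setup is entirely disk-backed: the table and its IVFFlat index (or HNSW in newer releases) live on ordinary Postgres storage, so nothing has to fit in RAM. A minimal sketch, with made-up database, table, and column names:

```python
import psycopg  # psycopg 3; connection string and schema are illustrative

with psycopg.connect("dbname=search") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id bigserial PRIMARY KEY,
            body text,
            embedding vector(768)
        )
    """)
    # The ANN index is a regular Postgres index on disk -- no requirement
    # that the whole thing be resident in memory.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS docs_embedding_idx "
        "ON docs USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000)"
    )
    # Nearest-neighbour lookup: <=> is pgvector's cosine-distance operator.
    query_vec = "[" + ",".join(["0.1"] * 768) + "]"  # stand-in query embedding
    rows = conn.execute(
        "SELECT id, body FROM docs ORDER BY embedding <=> %s::vector LIMIT 10",
        (query_vec,),
    ).fetchall()
```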
Sounds like a source-unavailable version of Quickwit? https://quickwit.io/
Is there a good general-purpose solution where I can store a large read-only database in S3 or something and do lookups directly on it?

DuckDB can open parquet files over HTTP and query them, but I found it triggers a lot of small requests, reading from a bunch of places in the files. I mean a lot.

I mostly need key/value lookups and could potentially store each key in a separate object in S3, but for a couple hundred million objects... it would be a lot more manageable to have a single file and maybe a cacheable index.
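For context, the DuckDB pattern I mean looks roughly like this (bucket, file, and column names are made up, and S3 credential setup is omitted). Even with the file sorted by key so row-group statistics prune most of it, each lookup still seems to need the parquet footer plus at least one row group, which is where the flood of small range requests comes from:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# Hypothetical layout: one big parquet file sorted by `key`, so row-group
# min/max statistics let DuckDB skip most of the file on a point lookup.
row = con.execute(
    """
    SELECT value
    FROM read_parquet('s3://my-bucket/kv.parquet')
    WHERE key = ?
    """,
    ["some-key"],
).fetchone()
print(row)
```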
Is it feasible to try to build this kind of approach (hot SSD cache nodes sitting in front of object storage) with prior open-source art (Lucene)? Or are the search indexes themselves also proprietary in this solution?

Having witnessed some very large Elasticsearch production deployments, being able to throw everything into S3 would be *incredible*. The applicability here isn't only for vector search.
A correction to the article. It mentions:

    Warehouse    BigQuery, Snowflake, Clickhouse    ≥1s    Minutes

For ClickHouse, it should be: read latency ≤ 100 ms, write latency ≤ 1 s.

Logging, real-time analytics, and RAG workloads are also a good fit for ClickHouse.
This looks super interesting. I'm not that familiar with vector databases. I thought they were mostly something used for RAG and other AI-related stuff.

Seems like a topic I need to delve into a bit more.
Slightly relevant - do people really want article recommendations? I don't think I've ever read an article and wanted a recommendation. Even with this one - I sort of read it and that's it; no feeling of wanting recommendations.

Am I alone in this?

In any case this seems like a pretty interesting approach. Reminds me of WarpStream, which does something similar with S3 to replace Kafka.
Those are some woefully disappointing and incorrect metrics you've got for ClickHouse there (read and write latency are both sub-second, and the storage medium would be "Memory + Replicated SSDs"), but I understand what you're going for and why you categorized it where you did.