科技回声

I shipped this in Discourse a couple of weeks ago, and it's incredible tech. And fully supported in PostgreSQL via pgvector.

That is crazy. So how do you use these right now? Do you store vectors on disk and iterate over them, or something else?<p>How do you do the search?

It got me thinking, what might it look like to natively train a binary quantized embedding model? You can’t do calculus per se on {0,1}, but maybe you could do something like randomly flip bits with a probability weighted by the severity of the error during backprop… anyway, I’m sure there’s plenty of literature about this.

Is it fair to say that Hamming distance relates to cosine similarity in this context like ReLU relates to the Sigmoid function for neural networks?

Yes, binary embeddings are cool indeed. And because they can be so small they even double down as powerful cluster identifiers. Built this a while ago: <a href="https://huggingface.co/spaces/iscc/iscc-sct" rel="nofollow">https://huggingface.co/spaces/iscc/iscc-sct</a>

I wonder what would happen if we quantized each dimension to 0.5 (or even fewer) bits instead of 1, i.e., taking 2 (or more) scalar components at a time and mapping them to 0 or 1 based on some carefully designed rules.

I shipped this in Discourse a couple of weeks ago, and it's incredible tech. And fully supported in PostgreSQL via pgvector.

That is crazy. So how do you use these right now? Do you store vectors on disk and iterate over them, or something else?<p>How do you do the search?

Is it fair to say that Hamming distance relates to cosine similarity in this context like ReLU relates to the Sigmoid function for neural networks?

Binary vector embeddings are so cool

6 条评论

Binary vector embeddings are so cool

6 条评论