This is a GPU-optimized vector index I built that supports insertion and k-nearest-neighbor (k-NN) queries. It runs entirely in CUDA and can process queries over half a billion vectors in under 200 milliseconds. It is packaged as a standalone library with an HTTP API for remote access, and is intended for high-performance search tasks such as similarity search, AI model retrieval, or reinforcement-learning replay buffers. The code is at <a href="https://github.com/rodlaf/BinaryGPUIndex">https://github.com/rodlaf/BinaryGPUIndex</a>.
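The post doesn't name the distance metric, but the repository name suggests binary vectors. As a CPU reference for what a brute-force index of that kind computes, here is a minimal Python sketch. It assumes each vector is packed into a Python int and compared by Hamming distance. `BinaryIndex`, `insert`, and `query` are hypothetical names, not the library's API.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two packed binary codes differ."""
    return bin(a ^ b).count("1")


class BinaryIndex:
    """Hypothetical CPU reference: insert packed codes, query k nearest by Hamming distance."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self.codes: list[int] = []

    def insert(self, vec_id: str, code: int) -> None:
        self.ids.append(vec_id)
        self.codes.append(code)

    def query(self, code: int, k: int) -> list[tuple[int, str]]:
        # Brute force: score every stored code, then keep the k smallest
        # (distance, id) pairs; equal distances are ordered by id.
        scored = ((hamming(code, c), i) for i, c in zip(self.ids, self.codes))
        return heapq.nsmallest(k, scored)


idx = BinaryIndex()
idx.insert("a", 0b1010)
idx.insert("b", 0b1111)
idx.insert("c", 0b0000)
print(idx.query(0b1011, 2))  # [(1, 'a'), (1, 'b')]
```

On a GPU, the scoring pass maps naturally to one thread per stored vector doing an XOR plus a popcount per 64-bit word (CUDA exposes `__popcll` for this). Picking the k smallest out of hundreds of millions of distances is the step that needs a parallel selection algorithm rather than a heap.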
When and how would one use binary vectors for encoding in ML? Do you have to make your model work natively with binary vectors or is there a translation step between float and binary vectors to make it compatible?
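One common answer to the translation-step question (not necessarily what this library assumes) is to keep a float model and binarize its output embeddings afterwards. You can either take the sign of each dimension, or first project onto random hyperplanes (SimHash). With Gaussian random hyperplanes, the probability that a given bit differs between two codes equals the angle between the original vectors divided by π, so Hamming distance approximates angular distance. The sketch below shows both; the function names are illustrative.

```python
import random


def binarize(vec: list[float]) -> int:
    """Pack signs into an int: bit i is 1 when vec[i] > 0, else 0."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def random_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """SimHash projections: n_bits random Gaussian directions in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec: list[float], planes: list[list[float]]) -> int:
    """Bit j records which side of hyperplane j the vector lies on."""
    projections = [sum(p * x for p, x in zip(plane, vec)) for plane in planes]
    return binarize(projections)


print(binarize([0.3, -1.2, 0.0, 2.5]))  # 9, i.e. 0b1001
planes = random_hyperplanes(dim=8, n_bits=64)
code = simhash([1.0, -2.0, 0.5, 3.0, -0.25, 0.75, 1.5, -1.0], planes)
print(f"{code:064b}")
```

With 64 planes, each float embedding becomes a single 64-bit word, which is what makes XOR/popcount search cheap. Whether the accuracy loss is acceptable depends on the model and task.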
Great work. Can you elaborate on how the radix selection works and how to get it working with floats and inner-product distance? I only took a quick look at the code and I'm not familiar with radix selection, but I'm really interested in building extremely fast GPU indices.
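I can't speak for how the library implements it, but the general idea of radix selection is this: instead of fully sorting the keys, histogram them one digit at a time, starting from the most significant digit. Each pass narrows down which bucket holds the k-th smallest key. Floats fit this scheme once their bits are remapped so that unsigned integer order matches float order: flip all bits of negatives and set the sign bit of non-negatives. Inner product, where larger is better, then just inverts that key. The Python sketch below is a sequential reference with hypothetical names. On a GPU, each histogram pass would typically be a parallel kernel using atomic adds.

```python
import struct


def float_to_ordered_u32(x: float) -> int:
    """Map a float32 to a uint32 whose unsigned order matches float order (NaN excluded)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        return ~bits & 0xFFFFFFFF  # negative: flip all bits so more negative sorts lower
    return bits | 0x80000000  # non-negative: set sign bit so it sorts above negatives


def radix_select_kth(keys: list[int], k: int, bits: int = 32, radix_bits: int = 8) -> int:
    """Return the k-th smallest key (1-based), radix_bits per pass, MSB first.

    Assumes bits is a multiple of radix_bits and every key fits in `bits` bits.
    """
    if not 1 <= k <= len(keys):
        raise ValueError("k must be in [1, len(keys)]")
    buckets = 1 << radix_bits
    prefix, mask = 0, 0
    for shift in range(bits - radix_bits, -1, -radix_bits):
        # Histogram the current digit over keys whose higher digits match the prefix.
        hist = [0] * buckets
        for key in keys:
            if (key & mask) == prefix:
                hist[(key >> shift) & (buckets - 1)] += 1
        # Walk buckets in order until the running count reaches k.
        for digit, count in enumerate(hist):
            if k <= count:
                break
            k -= count
        prefix |= digit << shift
        mask |= (buckets - 1) << shift
    return prefix


def top_k_inner_product(scores: list[float], k: int) -> list[int]:
    """Indices of the k largest scores: inverted keys make 'largest score' the 'smallest key'."""
    keys = [0xFFFFFFFF - float_to_ordered_u32(s) for s in scores]
    kth = radix_select_kth(keys, k)
    below = [i for i, key in enumerate(keys) if key < kth]
    ties = [i for i, key in enumerate(keys) if key == kth]
    return below + ties[: k - len(below)]


print(top_k_inner_product([0.5, -1.0, 2.0, 0.0, -0.25, 2.0, 1.5], 3))  # [2, 5, 6]
```

For Hamming distances, the keys are already small non-negative integers bounded by the code length. No remapping is needed, and for codes up to 255 bits a single pass (`bits=8`) suffices.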