So I am a bit confused about the part where you go a distance k out from the centroids.

Since there are ~5000 dimensions, in which of those dimensions are we moving k out?

Is the idea that you just move out in all dimensions such that the final Euclidean distance is k? That seems to be how they get multiple samples at those distances.

Either way, I think it's more interesting to move out along specific dimensions. Ideally there is a mapping between each dimension and something inherent about the token, like the part where a dimension corresponds to the first word of the token. A rough sketch of both approaches is below.

We went through a similar discovery phase when we were generating images with autoencoders: some of those latent dimensions corresponded to certain features of the image, so moving along them would change the image output in some predictable way.

Either way, I think the overall structure of those spaces says something about how the human brain works (given that we invented the language). I'm interested to see whether anything neurological can be derived from those vector embeddings.
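
If the "same Euclidean distance k, any direction" reading is right, here's a minimal sketch of how I'd picture it (numpy, with a made-up 5000-dimensional centroid; `sample_at_distance` is my own hypothetical helper, not anything from the article):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_at_distance(centroid: np.ndarray, k: float, n_samples: int = 5) -> np.ndarray:
        """Sample points at Euclidean distance k from the centroid by drawing
        random directions uniformly on the unit sphere and scaling them by k."""
        dims = centroid.shape[0]
        # Normalized Gaussian draws give directions uniform over the sphere.
        directions = rng.normal(size=(n_samples, dims))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        return centroid + k * directions

    centroid = rng.normal(size=5000)          # stand-in centroid
    samples = sample_at_distance(centroid, k=3.0)
    print(np.linalg.norm(samples - centroid, axis=1))  # all ~3.0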
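
And the "specific dimensions" version, which is roughly what we did with the autoencoder latents, is just perturbing one coordinate and decoding (again a sketch; the dimension index and helper name are made up for illustration):

    import numpy as np

    def move_along_dimension(embedding: np.ndarray, dim: int, step: float) -> np.ndarray:
        """Shift a single coordinate, leaving the rest of the vector fixed,
        then decode or nearest-neighbor search the result to see what that
        dimension seems to control."""
        out = embedding.copy()
        out[dim] += step
        return out

    rng = np.random.default_rng(0)
    vec = rng.normal(size=5000)
    # Sweep one hypothetical dimension over a range of offsets.
    sweep = [move_along_dimension(vec, dim=42, step=s) for s in np.linspace(-2, 2, 9)]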