Note that the original word2vec binary format is poorly suited to low-memory use. It stores the words and vectors interleaved, and since the words have variable length, you cannot compute where a given vector starts without scanning the file. Of course, you could work around this by building a separate (sorted) index file.<p>However, newer formats, such as the fastText binary format, store the embedding matrix contiguously. With such formats you can just memory-map the embedding matrix, so only the vocabulary has to be loaded into memory. This is even simpler than the approach described here and in the Delft README: there is no serialization/deserialization overhead [1], the OS decides how much to cache in memory, and you have one fewer dependency (mmap is in POSIX [2]).<p>[1] Of course, if the system's endianness differs from the file's, you have to do byte swapping.<p>[2] <a href="http://pubs.opengroup.org/onlinepubs/7908799/xsh/mmap.html" rel="nofollow">http://pubs.opengroup.org/onlinepubs/7908799/xsh/mmap.html</a>
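<p>To make the memory-mapping idea concrete, here is a minimal sketch in Python. It does not parse the real fastText format. It assumes a hypothetical file that contains nothing but a row-major matrix of little-endian float32 values, with the vocabulary supplied separately as a list. The names write_matrix, MmapEmbeddings and demo are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32 value


def write_matrix(path, rows):
    """Write rows as a contiguous, row-major, little-endian float32 matrix.

    This layout is an assumption for the sketch, not the fastText format.
    """
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{len(row)}f", *row))


class MmapEmbeddings:
    """Look up vectors in a memory-mapped matrix; only the vocab lives in RAM."""

    def __init__(self, path, vocab, dim):
        # The word -> row index map is the only structure loaded eagerly.
        self.index = {word: i for i, word in enumerate(vocab)}
        self.dim = dim
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are read lazily by the OS.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def vector(self, word):
        # Fixed-size rows make the byte offset a simple multiplication.
        offset = self.index[word] * self.dim * FLOAT_SIZE
        # "<" forces little-endian decoding regardless of the host byte order.
        return struct.unpack_from(f"<{self.dim}f", self._mm, offset)

    def close(self):
        self._mm.close()
        self._file.close()


def demo():
    vocab = ["cat", "dog", "fish"]
    rows = [[0.5, 1.0], [-2.0, 0.25], [3.0, 4.0]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.bin")
        write_matrix(path, rows)
        emb = MmapEmbeddings(path, vocab, dim=2)
        try:
            return emb.vector("dog")
        finally:
            emb.close()


if __name__ == "__main__":
    print(demo())
```

<p>Each lookup unpacks only dim floats at a computed offset, so the OS pages in just the parts of the file that are actually touched. The explicit "<" in the format string also covers the byte-order caveat from [1].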