The "download word vectors" links are broken. Actual data is here: <a href="http://www-nlp.stanford.edu/data/" rel="nofollow">http://www-nlp.stanford.edu/data/</a>
This is pretty badass. I'm assuming unseen words are really what's left to work on. If you ensemble this model with one that uses the same ideas but generalizes beyond specific terms, you might be able to get there. For instance, generate a matrix that represents n-gram word sequences by each word's part of speech and semantic category. When predicting on unseen words, you could then use those values to guide the prediction: cues from phonology and morphology could predict the unseen word's semantic category, and building on that, cues from morphology and word order could predict its part of speech. Combine that with the information from adjacent, known words and you might be able to make a reliable prediction even on hapax legomena. A rough sketch of that kind of back-off is below.
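Here's a minimal sketch of the idea, not the paper's method: estimate a vector for an unseen word from its character n-grams (a crude stand-in for the morphology/phonology cues above) blended with the vectors of its known context words. Everything here (the toy vectors, the blending weight `alpha`) is hypothetical.

```python
import numpy as np

# Toy pretrained vectors standing in for the downloaded word vectors.
vectors = {
    "run":     np.array([0.9, 0.1, 0.0]),
    "running": np.array([0.8, 0.2, 0.1]),
    "jump":    np.array([0.7, 0.0, 0.3]),
    "quickly": np.array([0.1, 0.9, 0.2]),
}

def char_ngrams(word, n=3):
    # Character n-grams with boundary markers, a rough proxy for morphology.
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def guess_vector(unseen, context, vectors, alpha=0.5):
    """Blend a morphology-based estimate with a context-based estimate."""
    target = char_ngrams(unseen)
    # Morphological estimate: average known-word vectors weighted by n-gram overlap.
    weight, acc = 0.0, np.zeros(3)
    for w, v in vectors.items():
        overlap = len(target & char_ngrams(w))
        if overlap:
            acc += overlap * v
            weight += overlap
    morph = acc / weight if weight else np.zeros(3)
    # Context estimate: average vectors of the adjacent, known words.
    ctx_vecs = [vectors[w] for w in context if w in vectors]
    ctx = np.mean(ctx_vecs, axis=0) if ctx_vecs else np.zeros(3)
    return alpha * morph + (1 - alpha) * ctx

# Unseen word "runs" surrounded by known words.
print(guess_vector("runs", ["jump", "quickly"], vectors))
```

A predicted POS or semantic category could replace (or reweight) the n-gram overlap step, but the blend of word-internal cues with neighboring known words is the core of the idea.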