The "download word vectors" links are broken. Actual data is here: <a href="http://www-nlp.stanford.edu/data/" rel="nofollow">http://www-nlp.stanford.edu/data/</a>
This is pretty badass. I'm assuming unseen words are really what's left to work on. If you ensemble this model with one that uses the same ideas but generalizes beyond specific terms, you might be able to get there. For instance, generate a matrix that represents n-gram word sequences by each word's part of speech and semantic category. When predicting on unseen words, you could then use those values to guide the prediction: cues from phonology and morphology could predict the unseen word's semantic category, and building on that, cues from morphology and word order could predict its part of speech. Combine that with the information from adjacent, known words and you might be able to make a reliable prediction even on hapax legomena. A rough sketch of that kind of back-off is below.
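Here's a minimal sketch of the idea, not the paper's method: estimate a vector for an unseen word from its character n-grams (a crude stand-in for the morphology/phonology cues above) blended with the vectors of its known context words. Everything here (the toy vectors, the blending weight `alpha`) is hypothetical.

```python
import numpy as np

# Toy pretrained vectors standing in for the downloaded word vectors.
vectors = {
    "run":     np.array([0.9, 0.1, 0.0]),
    "running": np.array([0.8, 0.2, 0.1]),
    "jump":    np.array([0.7, 0.0, 0.3]),
    "quickly": np.array([0.1, 0.9, 0.2]),
}

def char_ngrams(word, n=3):
    # Character n-grams with boundary markers, a rough proxy for morphology.
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def guess_vector(unseen, context, vectors, alpha=0.5):
    """Blend a morphology-based estimate with a context-based estimate."""
    target = char_ngrams(unseen)
    # Morphological estimate: average known-word vectors weighted by n-gram overlap.
    weight, acc = 0.0, np.zeros(3)
    for w, v in vectors.items():
        overlap = len(target & char_ngrams(w))
        if overlap:
            acc += overlap * v
            weight += overlap
    morph = acc / weight if weight else np.zeros(3)
    # Context estimate: average vectors of the adjacent, known words.
    ctx_vecs = [vectors[w] for w in context if w in vectors]
    ctx = np.mean(ctx_vecs, axis=0) if ctx_vecs else np.zeros(3)
    return alpha * morph + (1 - alpha) * ctx

# Unseen word "runs" surrounded by known words.
print(guess_vector("runs", ["jump", "quickly"], vectors))
```

A predicted POS or semantic category could replace (or reweight) the n-gram overlap step, but the blend of word-internal cues with neighboring known words is the core of the idea.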