
Glove: Global vectors for word representation

42 points by vkhu almost 11 years ago

4 comments

teraflop almost 11 years ago
The "download word vectors" links are broken. Actual data is here: http://www-nlp.stanford.edu/data/
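For anyone grabbing the vectors from that mirror, a minimal sketch for reading one of the plain-text files into a dict, assuming the standard GloVe layout of one word per line followed by its floats (the file name in the usage comment is only illustrative):

    import numpy as np

    def load_glove(path):
        """Load plain-text GloVe vectors: each line is a word followed by its floats."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                word, values = parts[0], parts[1:]
                vectors[word] = np.asarray(values, dtype=np.float32)
        return vectors

    # Example usage (file name is illustrative):
    # vectors = load_glove("glove.6B.100d.txt")
    # print(vectors["king"][:5])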
languagehacker almost 11 years ago
This is pretty badass. I'm assuming unseen words are really what's left to work on. If you ensemble this model with one that uses the same ideas but generalizes beyond specific terms, you might be able to get there. For instance, generate a matrix that represents n-gram word sequences as each word's part of speech and semantic category. When making predictions on unseen words, you can then use those values to help guide your prediction. You could use cues in phonology and morphology to predict the unseen word's semantic category. You could build on that value with cues from morphology and word ordering to predict the word's part of speech. Once you have that, and the information for adjacent, existing words, you might be able to make a more reliable prediction even on hapax legomena.
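As a toy illustration of that morphology-cue idea (not anything from the GloVe paper; the function, suffix length, and vocabulary below are made up), one crude back-off is to give an unseen word the average vector of known words that share its suffix:

    import numpy as np

    def guess_unseen_vector(word, vectors, suffix_len=4):
        """Crude morphology back-off: average the vectors of known words
        that share the unseen word's suffix (e.g. '-ness', '-tion')."""
        suffix = word[-suffix_len:]
        matches = [v for w, v in vectors.items() if w.endswith(suffix)]
        if not matches:
            return None  # no morphological neighbours found
        return np.mean(matches, axis=0)

    # Example usage with a toy vocabulary (values are made up):
    vectors = {
        "happiness": np.array([0.1, 0.9]),
        "sadness":   np.array([0.2, 0.8]),
        "table":     np.array([0.9, 0.1]),
    }
    print(guess_unseen_vector("grumpiness", vectors))  # lands near the '-ness' words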
LisaG almost 11 years ago
So excited to see Common Crawl data be useful for such fascinating work!

I work at Common Crawl :)
heyalexej almost 11 years ago
This looks very interesting. Couldn't find anything about a licence though.