GloVe: Global Vectors for Word Representation

42 points by vkhuc almost 11 years ago

4 comments

teraflop almost 11 years ago
The "download word vectors" links are broken. The actual data is here: http://www-nlp.stanford.edu/data/
languagehacker almost 11 years ago
This is pretty badass. I'm assuming unseen words are really what's left to work on. If you ensemble this model with one that uses the same ideas but generalizes beyond specific terms, you might be able to get there. For instance, generate a matrix that represents n-gram word sequences as each word's part of speech and semantic category. When making predictions on unseen words only, you can then use those values to help guide your prediction. You could use cues from phonology and morphology to predict the unseen word's semantic category. You could build on that value with cues from morphology and word ordering to predict the word's part of speech. Once you have that, plus the information for adjacent, existing words, you might be able to make a more reliable prediction even on hapax legomena.
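To make the morphology idea above a bit more concrete, here is a minimal sketch of one very crude way to guess a vector for an unseen word: back off to the average vector of known words that share its longest suffix. This is a simple stand-in, not the part-of-speech and semantic-category model described in the comment, and not anything GloVe itself does. The toy vectors and the names `known_vectors`, `build_suffix_index`, and `guess_vector` are all made up for illustration.

```python
from collections import defaultdict

# Toy 2-d vectors; the values are invented for illustration only.
known_vectors = {
    "walking": [1.0, 0.0],
    "running": [0.8, 0.2],
    "quickly": [0.0, 1.0],
    "slowly": [0.2, 0.8],
}


def build_suffix_index(vectors, max_len=3):
    """Map each word-final suffix (1..max_len chars) to the known words ending in it."""
    index = defaultdict(list)
    for word in vectors:
        for n in range(1, min(max_len, len(word)) + 1):
            index[word[-n:]].append(word)
    return index


def guess_vector(word, vectors, index, max_len=3):
    """Return the stored vector, or the mean vector of known words sharing the longest suffix."""
    if word in vectors:
        return vectors[word]
    # Try the longest suffix first, then progressively shorter ones.
    for n in range(min(max_len, len(word)), 0, -1):
        neighbours = index.get(word[-n:])
        if neighbours:
            dim = len(vectors[neighbours[0]])
            return [
                sum(vectors[w][i] for w in neighbours) / len(neighbours)
                for i in range(dim)
            ]
    return None  # no known word shares any suffix


suffix_index = build_suffix_index(known_vectors)
print(guess_vector("jumping", known_vectors, suffix_index))  # averages the "-ing" words
print(guess_vector("softly", known_vectors, suffix_index))   # averages the "-ly" words
```

A suffix match is only a rough proxy for morphology; a real system would presumably combine it with context from neighbouring known words, as the comment suggests.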
LisaG almost 11 years ago
So excited to see Common Crawl data be useful for such fascinating work!

I work at Common Crawl :)
heyalexej almost 11 years ago
This looks very interesting. Couldn't find anything about a licence though.