
LexVec, a word embedding model written in Go that outperforms word2vec

131 points by atrudeau, almost 9 years ago

9 comments

rspeer, almost 9 years ago
As pre-built word vectors go, Conceptnet Numberbatch [1], introduced less flippantly as the ConceptNet Vector Ensemble [2], already outperforms this on all the measures evaluated in its paper: Rare Words, MEN-3000, and WordSim-353.

This fact is hard to publicize because somehow the luminaries of the field decided that they didn't care about these evaluations anymore, back when RW performance was around 0.4. I have had reviewers dismiss it as "incremental improvements" to improve Rare Words from 0.4 to 0.6 and to improve MEN-3000 to be as good as a high estimate of inter-annotator agreement.

It is possible to do much, much better than Google News skip-grams ("word2vec"), and one thing that helps get there is lexical knowledge of the kind that's in ConceptNet.

[1] https://blog.conceptnet.io/2016/05/25/conceptnet-numberbatch-a-new-name-for-the-best-word-embeddings-you-can-download/

[2] https://blog.luminoso.com/2016/04/06/an-introduction-to-the-conceptnet-vector-ensemble/
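For anyone who wants to reproduce numbers like these, here is a minimal sketch of how such word-pair benchmarks are scored with gensim. The file names (a local copy of vectors in word2vec text format and a tab-separated word-pair file) are assumptions for illustration, not something from the thread:

```python
# Sketch: score pre-built word vectors on a word-similarity benchmark.
# Assumes gensim is installed and both file paths below exist locally.
from gensim.models import KeyedVectors

# Numberbatch is distributed in word2vec-style text format (path is hypothetical).
vectors = KeyedVectors.load_word2vec_format("numberbatch-en.txt.gz", binary=False)

# WordSim-353 as tab-separated (word1, word2, human_score) rows.
pearson, spearman, oov_ratio = vectors.evaluate_word_pairs("wordsim353.tsv")
print(f"Spearman correlation: {spearman[0]:.3f}, OOV: {oov_ratio:.1f}%")
```

The reported figures on RW and MEN-3000 are Spearman correlations of this kind, so 0.4 vs. 0.6 is directly comparable across models evaluated this way.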
herrkanin, almost 9 years ago
It feels weird how word embedding names have come to refer to both the underlying model and the implementation. word2vec is the implementation of two models, Mikolov's continuous bag-of-words and skip-gram models, while LexVec implements a version of the PPMI-weighted count matrix, as referenced in the README file. But the papers also discuss implementation details of LexVec that have no bearing on the final accuracy. I feel like we should make more of an effort to keep the models and reference implementations separate.
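For readers unfamiliar with the model being referenced: a PPMI-weighted count matrix starts from word-context co-occurrence counts and replaces each count with max(0, log(p(w,c) / (p(w)p(c)))). A rough sketch (my illustration with made-up counts, not LexVec's actual Go code):

```python
# Sketch: positive pointwise mutual information (PPMI) from co-occurrence counts.
# The tiny count matrix here is made up for illustration.
import numpy as np

counts = np.array([[10.0, 2.0, 0.0],
                   [ 3.0, 8.0, 1.0],
                   [ 0.0, 1.0, 6.0]])   # rows: words, cols: contexts

total = counts.sum()
p_wc = counts / total                    # joint probability p(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)    # marginal p(w)
p_c = p_wc.sum(axis=0, keepdims=True)    # marginal p(c)

with np.errstate(divide="ignore"):       # log(0) -> -inf, clipped below
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)              # keep only positive associations
print(ppmi)
```

Embeddings are then typically obtained by factorizing this matrix (LexVec does so with a weighted sampling scheme), which is a different object from the implementation details that surround it.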
loudmax, almost 9 years ago
If anyone else is wondering what the heck "word embedding" means, it's a natural language processing technique.

Here's a nice blog post about it: http://sebastianruder.com/word-embeddings-1/

It can process something like this: king - man + woman = queen

Neat-o.
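For the curious, that analogy can be reproduced in a few lines with gensim. The pre-trained model name below is one of gensim's downloadable datasets and is my assumption, not something from the post:

```python
# Sketch: the classic "king - man + woman ~ queen" analogy with gensim.
# Assumes internet access for gensim's downloader on first run.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # pre-trained GloVe vectors

# most_similar adds the "positive" vectors and subtracts the "negative" ones,
# then returns the nearest word by cosine similarity.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected: [('queen', ...)] or similar
```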
mooneater, almost 9 years ago
Are there IP considerations? Word2vec is patented.
rpedela, almost 9 years ago
Slightly off-topic, but I thought this would be a good place to ask.

Are there any word embedding tools which take a Lucene/Solr/ES index as input and output a synonyms file which can be used to improve search recall?
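I don't know of a turnkey tool, but the second half of that pipeline is straightforward to sketch: given vectors trained on your corpus, emit a Solr-style synonyms.txt from each term's nearest neighbors above a similarity cutoff. The file names, cutoff, and vocabulary size here are illustrative assumptions:

```python
# Sketch: derive a Solr-style synonyms file from trained word vectors.
# Vector file, cutoff, and vocabulary slice are illustrative assumptions.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)
CUTOFF = 0.7  # cosine-similarity threshold; tune against your query logs

with open("synonyms.txt", "w") as out:
    for term in vectors.index_to_key[:10000]:  # most frequent terms first
        neighbors = [w for w, score in vectors.most_similar(term, topn=5)
                     if score >= CUTOFF]
        if neighbors:
            # Solr synonym-filter format: term => syn1, syn2
            out.write(f"{term} => {', '.join(neighbors)}\n")
```

The missing first half, training vectors directly from a Lucene/Solr/ES index rather than raw text, is the part I haven't seen packaged anywhere.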
IshKebab, almost 9 years ago
Has anyone done any work on handling words that have overloaded meanings? Something like 'lead' has two really distinct uses. It's really multiple words that happen to be spelt the same.
ianbertolacci, almost 9 years ago
Reminds me of Chord [1], word2vec written in Chapel.

[1] https://github.com/briangu/chord
ris, almost 9 years ago
Well done, that's probably the *least* relevant use of "written in Go" in an HN headline I've seen. And there's some stiff competition for that title.
PaulHoule, almost 9 years ago
From the viewpoint of commercial applications, I find this profoundly depressing.

When the state of the art for accuracy is 0.6 on some task, you are always going to be a bridesmaid and never a bride. But hey, you can get bragging rights because you did well on Kaggle.