As pre-built word vectors go, ConceptNet Numberbatch [1], introduced less flippantly as the ConceptNet Vector Ensemble [2], already outperforms this on all the measures evaluated in its paper: Rare Words, MEN-3000, and WordSim-353.

This fact is hard to publicize because somehow the luminaries of the field decided they didn't care about these evaluations anymore, back when Rare Words performance was around 0.4. I have had reviewers dismiss improving Rare Words from 0.4 to 0.6, and bringing MEN-3000 up to a high estimate of inter-annotator agreement, as "incremental improvements".

It is possible to do much, much better than Google News skip-grams ("word2vec"), and one thing that helps get there is lexical knowledge of the kind that's in ConceptNet.

[1] https://blog.conceptnet.io/2016/05/25/conceptnet-numberbatch-a-new-name-for-the-best-word-embeddings-you-can-download/

[2] https://blog.luminoso.com/2016/04/06/an-introduction-to-the-conceptnet-vector-ensemble/
It feels weird how word embedding models have come to refer both to the underlying model and to the implementation. word2vec is the implementation of two models, Mikolov's continuous bag-of-words and skip-gram models, while LexVec implements a version of the PPMI-weighted count matrix, as referenced in the README file. But the papers also discuss implementation details of LexVec that have no bearing on the final accuracy. I feel like we should make more of an effort to keep the models and reference implementations separate.
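To make the distinction concrete, the model half is small enough to sketch directly. Here's a minimal PPMI weighting in plain numpy, assuming you already have a word-by-context co-occurrence count matrix; the window sampling and the SGD factorization that LexVec layers on top are exactly the kind of implementation details I mean:

    import numpy as np

    def ppmi(counts):
        # counts[i, j] = how often context word j occurs near target word i
        total = counts.sum()
        p_word = counts.sum(axis=1, keepdims=True) / total   # P(w)
        p_ctx = counts.sum(axis=0, keepdims=True) / total    # P(c)
        p_joint = counts / total                             # P(w, c)
        with np.errstate(divide="ignore", invalid="ignore"):
            pmi = np.log(p_joint / (p_word * p_ctx))
        pmi[~np.isfinite(pmi)] = 0.0   # zero counts contribute nothing
        return np.maximum(pmi, 0.0)    # clip negatives: "positive" PMI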
If anyone else is wondering what the heck "word embedding" means, it's a natural language processing technique that represents each word as a dense vector of real numbers.

Here's a nice blog post about it: http://sebastianruder.com/word-embeddings-1/

It lets you do arithmetic like this: king - man + woman = queen

Neat-o.
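You can try that analogy yourself with gensim and any pre-trained vectors in word2vec format (the Google News file below is just one example):

    from gensim.models import KeyedVectors

    # Path is an example; any word2vec-format vector file works.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # king - man + woman ~= ?
    print(vectors.most_similar(positive=["king", "woman"],
                               negative=["man"], topn=1))
    # expected to print something like [('queen', 0.7...)]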
Slightly off-topic, but I thought this would be a good place to ask.

Are there any word embedding tools which take a Lucene/Solr/ES index as input and output a synonyms file which can be used to improve search recall?
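If nothing like that exists, I imagine a rough version could be scripted: dump the indexed terms, train (or load) vectors over the same corpus, and emit each term's nearest neighbors in Solr's synonyms.txt format. A sketch with gensim, where the file paths, topn=5, and the 0.7 similarity cutoff are all placeholders to tune:

    from gensim.models import KeyedVectors

    # Hypothetical inputs: vectors trained on the indexed corpus, plus a
    # plain-text dump of the indexed terms (one per line).
    vectors = KeyedVectors.load("my_corpus_vectors.kv")
    terms = {line.strip() for line in open("indexed_terms.txt")}

    # Solr synonyms.txt format: comma-separated equivalent terms, one group per line.
    with open("synonyms.txt", "w") as out:
        for term in sorted(terms):
            if term not in vectors:
                continue
            neighbors = [w for w, score in vectors.most_similar(term, topn=5)
                         if score > 0.7 and w in terms]  # cutoff is a guess
            if neighbors:
                out.write(",".join([term] + neighbors) + "\n")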
Has anyone done any work on handling words that have overloaded meanings? Something like 'lead' has two really distinct uses. It's really multiple words that happen to be spelt the same.
Reminds me of Chord [1], word2vec written in Chapel.

[1] https://github.com/briangu/chord
Well done, that's probably the *least* relevant use of "written in go" in an HN headline I've seen. And there's some stiff competition for that title.
From the viewpoint of commercial applications I find this profoundly depressing.

When the state of the art for accuracy is 0.6 on some task, you are always going to be a bridesmaid and never a bride. But hey, you can get bragging rights because you did well on Kaggle.