Tech Echo

9 comments

rspeer, almost 9 years ago

As pre-built word vectors go, ConceptNet Numberbatch [1], introduced less flippantly as the ConceptNet Vector Ensemble [2], already outperforms this on all the measures evaluated in its paper: Rare Words, MEN-3000, and WordSim-353.

This fact is hard to publicize because the luminaries of the field somehow decided that they didn't care about these evaluations anymore, back when RW performance was around 0.4. I have had reviewers dismiss it as "incremental improvements" to improve Rare Words from 0.4 to 0.6, and to improve MEN-3000 to be as good as a high estimate of inter-annotator agreement.

It is possible to do much, much better than Google News skip-grams ("word2vec"), and one thing that helps get there is lexical knowledge of the kind that's in ConceptNet.

[1] https://blog.conceptnet.io/2016/05/25/conceptnet-numberbatch-a-new-name-for-the-best-word-embeddings-you-can-download/

[2] https://blog.luminoso.com/2016/04/06/an-introduction-to-the-conceptnet-vector-ensemble/
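Rare Words, MEN-3000 and WordSim-353 are word-similarity datasets: pairs of words with human relatedness ratings. They are conventionally scored by taking the cosine similarity of each pair's vectors and reporting the Spearman rank correlation against the human ratings, which is the scale the 0.4 and 0.6 figures above are on. A minimal sketch, assuming a plain dict of vectors and a tiny made-up list of rated pairs; `evaluate` and the toy data are hypothetical, not any benchmark's official tooling:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(vectors, rated_pairs):
    """Score embeddings against (word1, word2, human_score) triples."""
    model_scores, human_scores = [], []
    for w1, w2, human in rated_pairs:
        if w1 in vectors and w2 in vectors:  # this sketch simply skips OOV pairs
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(human)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, for illustration only.
toy_vectors = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.1, 0.0, 0.9],
}
toy_pairs = [
    ("cat", "dog", 8.5),
    ("car", "truck", 8.0),
    ("cat", "car", 1.5),
    ("dog", "truck", 1.0),
]
print(evaluate(toy_vectors, toy_pairs))
```

The toy model swaps the order of the two unrelated pairs relative to the human ratings, so rho comes out at about 0.8. Published evaluations differ in how they treat out-of-vocabulary pairs, so skipping them, as here, is itself an assumption.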

herrkanin, almost 9 years ago

It feels weird how the names of word embedding models have come to refer to both the underlying model and its implementation. word2vec is an implementation of two models by Mikolov, continuous bag-of-words (CBOW) and skip-gram, while LexVec implements a version of the PPMI-weighted count matrix, as referenced in the README file. But the papers also discuss implementation details of LexVec that have no bearing on the final accuracy. I feel like we should make more effort to keep models and their reference implementations separate.
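For readers unfamiliar with the term, a PPMI-weighted count matrix replaces each raw word-context co-occurrence count with max(0, log(P(w, c) / (P(w) P(c)))), i.e. positive pointwise mutual information. The sketch below computes only that weighting from a toy token list with a symmetric window. It is not LexVec's code: the window size and the helpers `cooccurrence` and `ppmi` are hypothetical, and how LexVec trains vectors on top of such a matrix is not reproduced here.

```python
import math
from collections import Counter


def cooccurrence(tokens, window=2):
    """Count (word, context) pairs within a symmetric window around each token."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts


def ppmi(counts):
    """Map each observed (word, context) pair to its positive PMI."""
    total = sum(counts.values())
    word_totals, context_totals = Counter(), Counter()
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    weights = {}
    for (word, context), n in counts.items():
        # PMI = log(P(w, c) / (P(w) * P(c))), estimated from raw counts.
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        weights[(word, context)] = max(0.0, pmi)  # clip negative PMI to zero
    return weights


tokens = "the cat sat on the mat the dog sat on the log".split()
matrix = ppmi(cooccurrence(tokens, window=2))
print(sorted(matrix.items(), key=lambda kv: -kv[1])[:3])
```

The result is kept sparse: pairs that never co-occur are simply absent, which is equivalent to a PPMI of zero.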

loudmax, almost 9 years ago

If anyone else is wondering what the heck "word embedding" means, it's a natural language processing technique that represents words as vectors. Here's a nice blog post about it: http://sebastianruder.com/word-embeddings-1/

It can process something like this: king - man + woman = queen

Neat-o.
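The king - man + woman = queen example is usually computed by adding and subtracting the word vectors, then returning the vocabulary word whose vector has the highest cosine similarity to the result, excluding the three query words. A minimal sketch with hand-made two-dimensional vectors; the dimensions, the values and the `analogy` helper are invented for illustration, and trained embeddings have hundreds of dimensions with no such clean meaning:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(vectors, a, minus, plus):
    """Return the word nearest to vec(a) - vec(minus) + vec(plus), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[minus], vectors[plus])]
    candidates = (w for w in vectors if w not in {a, minus, plus})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical 2-D vectors: dimension 0 ~ "royalty", dimension 1 ~ "maleness".
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-0.5, 0.2],
}
print(analogy(vectors, "king", "man", "woman"))  # queen
```

Excluding the input words matters in practice: with real embeddings, the vector nearest to the result is often one of the inputs, such as "king" itself.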

mooneater, almost 9 years ago

Are there IP considerations? Word2vec is patented.

rpedela, almost 9 years ago

Slightly off-topic, but I thought this would be a good place to ask. Are there any word embedding tools which take a Lucene/Solr/ES index as input and output a synonyms file which can be used to improve search recall?
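Whatever tool is used, the last step is typically a nearest-neighbour search over the trained vectors, written out in the Solr synonyms format (one line of comma-separated equivalent terms, a format Elasticsearch's synonym filter also accepts). A minimal sketch, assuming the vectors were already trained on text exported from the index; reading the Lucene index is not shown, and `synonym_lines`, `k` and `min_sim` are hypothetical choices. Nearest neighbours in embedding space are often related words rather than true synonyms (antonyms such as "hot" and "cold" tend to sit close together), so a generated file usually needs review.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def synonym_lines(vectors, k=3, min_sim=0.8):
    """Yield 'term, neighbour1, neighbour2' lines for terms with close neighbours."""
    for word, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in vectors.items() if other != word),
            reverse=True,
        )
        # Keep at most k neighbours, and only those above the similarity cutoff.
        neighbours = [other for sim, other in scored[:k] if sim >= min_sim]
        if neighbours:
            yield ", ".join([word] + neighbours)


# Hypothetical toy vectors.
vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.99, 0.1],
    "banana": [0.0, 1.0],
}
for line in synonym_lines(vectors):
    print(line)
```

Terms without any neighbour above the cutoff ("banana" here) produce no line, which keeps the file from filling up with weak associations.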

IshKebab, almost 9 years ago

Has anyone done any work on handling words that have overloaded meanings? Something like 'lead' has two really distinct uses. It's really multiple words that happen to be spelt the same.
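One line of work on this is multi-sense ("multi-prototype") embeddings: represent each occurrence of an ambiguous word by the average vector of its surrounding words, cluster those occurrence vectors, and keep one vector per cluster. The sketch below shows only the clustering step, using plain k-means on made-up two-dimensional context vectors for 'lead'; the data, the naive initialisation and the `cluster_senses` helper are all hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_senses(contexts, k=2, iterations=10):
    """Plain k-means over occurrence-context vectors; returns (labels, centroids)."""
    centroids = [list(c) for c in contexts[:k]]  # naive init: first k occurrences
    labels = [0] * len(contexts)
    for _ in range(iterations):
        # Assign each occurrence to its nearest sense centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in contexts
        ]
        # Move each centroid to the mean of its assigned occurrences.
        for j in range(k):
            members = [x for x, label in zip(contexts, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return labels, centroids


# Hypothetical averaged context vectors for four occurrences of "lead":
# dimension 0 ~ "metal/pipe" contexts, dimension 1 ~ "team/guide" contexts.
lead_contexts = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels, centroids = cluster_senses(lead_contexts)
print(labels)  # [0, 1, 0, 1]
```

Each centroid can then serve as a separate vector for one sense of "lead"; real systems choose k and the initialisation far more carefully than this sketch does.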

ianbertolacci, almost 9 years ago

Reminds me of Chord [1], word2vec written in Chapel.

[1] https://github.com/briangu/chord

ris, almost 9 years ago

Well done, that's probably the least relevant use of "written in go" in a HN headline I've seen. And there's some stiff competition for that title.

PaulHoule, almost 9 years ago

From the viewpoint of commercial applications, I find this profoundly depressing.

When the state of the art for accuracy is 0.6 on some task, you are always going to be a bridesmaid and never a bride, but hey, you can get bragging rights 'cause you did well on Kaggle.

LexVec, a word embedding model written in Go that outperforms word2vec
