Word2Vec and bag-of-words/tf-idf are somewhat obsolete in 2018 for modeling. For classification tasks, fastText (<a href="https://github.com/facebookresearch/fastText" rel="nofollow">https://github.com/facebookresearch/fastText</a>) performs better and trains faster.<p>fastText is also available in the popular NLP Python library gensim, with good documentation: <a href="https://radimrehurek.com/gensim/models/fasttext.html" rel="nofollow">https://radimrehurek.com/gensim/models/fasttext.html</a><p>And of course, if you have a GPU, recurrent neural networks (or other deep learning architectures) are the endgame for the remaining 10% of problems (a good example is spaCy's DL implementation: <a href="https://spacy.io/" rel="nofollow">https://spacy.io/</a>). Or use those libraries with fastText for text encoding, which has worked well in my use cases.
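Part of what makes fastText robust on classification is that it represents each word as a bag of character n-grams, so rare and misspelled words still get usable vectors. A rough, dependency-free sketch of just the n-gram extraction step (the function name and defaults here are my own, not fastText's API):

```python
def char_ngrams(word, n_min=3, n_max=6):
    # fastText-style subwords: wrap the word in boundary markers,
    # then collect every character n-gram of length n_min..n_max.
    padded = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams

# For n = 3, "where" yields: <wh, whe, her, ere, re>
print(char_ngrams("where", 3, 3))
```

The word's embedding is then built from the vectors of these subwords, which is why two words sharing morphology end up close together even if one never appeared in training.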
I am not sure how many people have an issue with this, but it seems to me that computer science, just over the relatively short time I've been paying attention, is becoming more and more abstract in a lot of ways.<p>You can code something incredibly complex that works great without understanding any of the math underneath. Understanding the math arguably makes you a better engineer overall, but it isn't required to solve many of these problems.<p>I think it's pretty cool, but I'm sure a lot of people have a big issue with the "just TRUST the library!" approach.
NLP is one of the most challenging areas of research, and nothing in this article will help solve even 0.009% of those challenges<p>Example of the wisdom herein:<p>> Remove words that are not relevant, such as “@” twitter mentions or urls
The thing that jumped out at me in this article was the use of LIME to explain models - I hadn't heard of it before.<p><a href="https://github.com/marcotcr/lime" rel="nofollow">https://github.com/marcotcr/lime</a><p>For NLP tasks, it looks like what it does is selectively delete words from the input and check the classifier output. This way it determines which words have the biggest effect on the output without needing to know anything about how your model works.
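LIME proper fits a local linear model over many random perturbations, but the core "delete words and watch the score move" idea can be sketched much more crudely. Everything below (the function, the toy classifier) is made up for illustration, not LIME's actual API:

```python
def explain_by_deletion(text, classify):
    # classify: any black-box callable returning a score for the text.
    # Drop each word in turn and measure how much the score falls;
    # bigger drops mean the word mattered more to the prediction.
    words = text.split()
    base = classify(text)
    importance = {}
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        importance[w] = base - classify(perturbed)
    return sorted(importance.items(), key=lambda kv: -kv[1])

# Toy black-box classifier: fraction of "alarm" words in the text.
ALARM = {"fire", "explosion"}
def toy_classifier(text):
    toks = text.split()
    return sum(t in ALARM for t in toks) / max(len(toks), 1)

# "fire" should rank first, since deleting it hurts the score most.
print(explain_by_deletion("huge fire near the bridge", toy_classifier))
```

The nice part, as the parent comment notes, is that `classify` is a pure black box - nothing about the model's internals is needed.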
A question to the NLP experts out there -- is it possible to automatically detect various pre-defined attributes about a person by automatically analyzing relevant texts? For example, finding out whether a person is anti-capitalist by scanning his blog posts related to economics. I'm not even sure how to approach such a problem.
This was an interesting read, but when I read sentences like "The words it picked up look much more relevant!", I'm reminded of the XKCD explanation of machine learning: <a href="https://xkcd.com/1838/" rel="nofollow">https://xkcd.com/1838/</a>
I know natural language processing predates Neuro-linguistic programming, but I still can’t see ‘NLP’ without the little hairs on the back of my neck standing up.
Bad title (this is all about text classification/mining, not NLP), but a very nice introduction nonetheless. Maybe a tad optimistic - I'd never even consider applying a classifier with 80% accuracy to the Twitter firehose (unless extremely noisy performance were a non-issue - but it never is ... :-)).
I think<p><pre><code>def sanitize_characters(raw, clean):
    for line in input_file:
        out = line
        output_file.write(line)

sanitize_characters(input_file, output_file)
</code></pre>
should be<p><pre><code>def sanitize_characters(raw, clean):
    for line in raw:
        out = line
        clean.write(line)

sanitize_characters(input_file, output_file)
</code></pre>
in your notebook: <a href="https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb" rel="nofollow">https://github.com/hundredblocks/concrete_NLP_tutorial/blob/...</a><p>Or am I mistaken?