> Essentially, when dealing with natural languages hacking a solution is the suggested way of doing things, since nobody can figure out how to do it properly.

That's essentially the TL;DR I took away from the computational linguistics courses I attended.

The Pareto principle is probably at work here. Having no solution is worse than having an 80% solution that works well enough, especially when the 100% solution is much harder to achieve (and some of the problems not even humans can solve properly).
Ha, there's a whole section on clones of the summarizer from Classifier4J.

I wrote that in 2003 (I think?) based on @pg's "A Plan for Spam" essay, and then "invented" the summarization approach (I'm sure others had done similar, but I thought it up myself anyway).

Turns out it was rather well tuned. The 2003 implementation, presumably downloaded from sourceforge(!), still wins comparisons on datasets which didn't even exist when I wrote it [1].

I much prefer the Python implementation though [2], which I hadn't seen before.

Also, Textacy on top of spaCy is awesome for any kind of text work.

[1] https://dl.acm.org/citation.cfm?id=2797081

[2] https://github.com/thavelick/summarize/blob/master/summarize.py
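For anyone curious, the core trick is word statistics applied to sentence selection. A minimal sketch of that style of frequency-based extractive summarizer follows; this is not the Classifier4J or linked Python code itself, and the tokenization is deliberately naive:

```python
import re
from collections import Counter

def summarize(text, num_sentences=3):
    # Naive sentence and word tokenization (good enough for a sketch).
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(words)

    # Score each sentence by the corpus frequency of the words it contains.
    def score(sentence):
        return sum(freq[w] for w in re.findall(r'[a-z]+', sentence.lower()))

    # Keep the top-scoring sentences, returned in their original order.
    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return ' '.join(s for s in sentences if s in top)
```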
There are a few applications missing:

- Answering a question by returning a search result from a large body of text, e.g. "How do I change the background color of a page in JavaScript?"

- Improving the readability of a text. The article only mentions "understanding how difficult to read is a text".

- Establishing relationships between entities in a body of text, e.g. building a fact graph from sentences like "Burning coal increases CO2" and "CO2 increase induces global warming". Also useful in medical literature, where there are millions of pathways. (A rough sketch follows below.)

- Answering a question using a large body of facts. Like search, but returning a precise answer.

- Finding and correcting spelling/grammatical errors.
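As a rough illustration of the fact-graph idea, here is a hedged sketch that pulls crude (subject, verb, object) triples out of text using spaCy's dependency parse. The exact triples you get depend on the model and its dependency labels, so treat the example output as approximate:

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_triples(text):
    """Extract rough (subject, verb, object) triples from a text."""
    doc = nlp(text)
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    return triples

print(extract_triples("Burning coal increases CO2. CO2 increase induces global warming."))
# Roughly something like [('coal', 'increase', 'CO2'), ('increase', 'induce', 'warming')],
# depending on how the parser analyses each sentence.
```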
A lot to review, read, and learn. Thanks a lot for sharing this. Any plans to extend it, or to write another one covering even more, like natural language generation (not limited to bots; we are using it for weather forecasts) and coreference resolution?
I'm always astonished at how little mention gensim gets, considering it can be used for basically all the listed tasks, including parsing if you combine it with your favorite deep learning library (DyNet, anyone?).
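For example, a tiny topic-modelling sketch with gensim, covering one of the listed tasks out of the box (the toy corpus here is made up):

```python
from gensim import corpora, models

# Toy corpus: each document is a list of tokens (pretend it was tokenized already).
texts = [
    ["human", "machine", "interface", "computer"],
    ["survey", "user", "computer", "system", "response"],
    ["graph", "trees", "minors", "graph"],
]

dictionary = corpora.Dictionary(texts)            # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words vectors

# Train a tiny LDA topic model and print the discovered topics.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```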
Was hoping for some discussion about word vectors like word2vec. I keep reading about them, but don't really understand what they're useful for.
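Roughly, word vectors place words with similar meanings close together in a vector space, so you can ask for nearest neighbours and even do analogy arithmetic. A small sketch using gensim's downloader and pre-trained GloVe vectors (the dataset name is as I recall it, so treat it as an assumption):

```python
import gensim.downloader as api

# Downloads a small set of pre-trained GloVe vectors on first use.
vectors = api.load("glove-wiki-gigaword-50")

# Words with related meanings end up close together in vector space...
print(vectors.most_similar("coffee", topn=3))

# ...and directions in that space capture relations, the classic example being
# king - man + woman ~= queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```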
My experience with your site on mobile: https://m.imgur.com/5vLrEJH

Can't get it to go away, can't read the article.
Is there an equivalent to MNIST for NLP? I've always wanted to play around in this space, but I don't know a good, simple dataset to start with.
Your 'send me a PDF' popup has the background fade div above the form so it's impossible to fill in the form (without opening dev tools).
Quite an obnoxious website on my phone. Anyway, I came here to point to GATE as a mature FLOSS option: https://gate.ac.uk/
Recommend Dan Jurafsky and Chris Manning's online course at Stanford:

https://www.youtube.com/watch?v=nfoudtpBV68