
Ask HN: How to implement an NLP grammar parser for a new natural language?

48 points by alnitak over 8 years ago
As a novice with NLP, having tinkered before with some basic and naive models, I would like to learn it properly this time by creating a grammar parser for a language for which no model is currently publicly available. I can easily access a corpus of sentences for this language, and, speaking it myself, I am motivated enough to produce any data needed for this.

Where would you recommend starting for such a project, both in terms of the minimal theoretical and practical knowledge, and the engineering aspect of it? What open source libraries and software are available out there to speed up this process?

5 comments

amirouche over 8 years ago
Nobody mentioned SyntaxNet or Link Grammar. If you haven't read Chomsky's article about the two ways of doing AI, you should. Basically it says there are statistical methods and logic-based methods in AI. Most NLP libraries today use the statistical approach; the logic-based, rule-based approach was the more popular one before now. Anyway, that's what Link Grammar does. I recommend you start with the introduction (https://amirouche.github.io/link-grammar-website//documentation/dictionary/introduction.html) to get a feeling for what it means to assign meanings to sentences.

Also, nowadays word2vec is unrelated to the understanding of grammatical constructs in natural languages. It is, simply put, about coincidence or co-occurrence of words. A grammatical interpretation of a sentence must be seen as a general graph, whereas word2vec operates on the linear structure of sentences (one word after the other). If word2vec had to work on grammatical constructs, it would need to be able to ingest graph data. word2vec works on matrices, while the graphical representation of a sentence's grammar (POS tagging, dependencies, anaphora, probably others) is a graph, otherwise said a sparse matrix, or a matrix with a large number of dimensions. (It seems to me machine learning is always about dimension reduction with some noise.)

I am quite ignorant about the literature on machine learning applied to and from graph data structures.
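To make the linear-window point concrete, here is a toy co-occurrence counter in pure Python: a sketch of the order-based context that word2vec-style models see, not a real word2vec implementation.

```python
from collections import Counter

def cooccurrence(tokens, window=2):
    """Count word pairs appearing within `window` positions of each
    other: the linear, sequence-based context word2vec-style models use,
    as opposed to edges in a dependency graph."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            # store pairs in sorted order so (a, b) and (b, a) merge
            counts[tuple(sorted((w, tokens[j])))] += 1
    return counts

tokens = "the cat sat on the mat".split()
pairs = cooccurrence(tokens, window=2)
```

Note that "sat" and "the" co-occur twice here purely because of word order; a dependency parse of the same sentence would instead link "sat" to its subject and prepositional modifier, which is the graph structure the comment contrasts with.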
Bitcoincadre over 8 years ago
First, North African languages are called Arabic. The proper written form of Arabic is the same in every country. The Berber language never had a written form or letters, and it only confuses the matter; it is a tool used to divide the people. Can you imagine Palestinians demanding Canaanite be included as an official language? The most common modern standard Arabic would be found in Syria, Lebanon, Jordan and Palestine, with the Egyptian and Iraqi dialects also well understood. The North African dialects need a major overhaul: in Morocco, they have borrowed even German words, and the pace is so fast that half the words are mumbled. Use modern standard Arabic as your focus, and perhaps Latin letters to make it easier on non-natives, while being able to transliterate it back to Arabic letters.
web64 over 8 years ago
I haven't tried it yet, but spaCy has a guide[1] for adding a new language to their Python NLP framework. Maybe it can be of use to you.

[1] https://spacy.io/docs/usage/adding-languages
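To give a feel for what "adding a language" involves, here is a rough pure-Python sketch of the kind of language data (stop words, tokenizer exceptions) such a guide asks you to supply. The names, the French-like entries, and the tokenizer are all illustrative; this is not spaCy's actual API.

```python
# Illustrative language data for a hypothetical new language
# (French-like entries chosen only for familiarity).
STOP_WORDS = set("le la les de un une et ou".split())

TOKENIZER_EXCEPTIONS = {
    # surface form -> its component tokens
    "du": ["de", "le"],
    "au": ["a", "le"],
}

def naive_tokenize(text):
    """Whitespace-split, lowercase, then expand known contractions,
    a stand-in for the per-language tokenizer rules a framework needs."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(TOKENIZER_EXCEPTIONS.get(word, [word]))
    return tokens
```

The point of collecting this as data rather than code is that the framework's shared pipeline (tagging, parsing) can stay generic while each language plugs in its own tables.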
probably_wrong over 8 years ago
If you want to go directly into coding, the Stanford NLP Parser lists some starting instructions for parsing a new language in point 5 of its FAQ[1].

If you can deal with the math, some papers such as [2] use corpora for existing languages as a tool to parse new languages for which not many resources are available.

In both cases, you can always contact the authors. They might know how to help with your project, and/or direct you to the right people.

[1] http://nlp.stanford.edu/software/parser-faq.shtml#d

[2] https://www.aclweb.org/anthology/Q/Q16/Q16-1022.pdf
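As a taste of what a from-scratch grammar parser looks like, here is a minimal CKY recognizer over a toy grammar in Chomsky normal form. The grammar and lexicon are invented for the example; a real parser like Stanford's induces its grammar from a treebank rather than hand-writing rules.

```python
# Toy CNF grammar: each rule maps a pair of children to a parent.
GRAMMAR = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "saw": "V",
}

def cky_recognize(tokens):
    """Return True if the toy grammar derives the token sequence."""
    n = len(tokens)
    # table[i][j] holds the nonterminals spanning tokens[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(tokens):
        table[i][i].add(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        parent = GRAMMAR.get((left, right))
                        if parent:
                            table[i][j].add(parent)
    return "S" in table[0][n - 1]
```

For example, `cky_recognize("the dog saw a cat".split())` returns `True`, while `cky_recognize("saw the dog".split())` returns `False` because no rule builds an `S` from that order.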
franciscop over 8 years ago
Stanford's NLP course is a good place to start learning the theoretical background: https://youtube.com/watch?v=nfoudtpBV68

Then it highly depends on the language; for instance, tokenization (splitting a sentence into words) is really easy in English, Spanish, etc. compared to Japanese, Chinese, etc. So I would say a good starting point would be to try using an NLP parser for a *similar* language. What language is it? What kind of NLP analysis do you want to perform?
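To illustrate the tokenization gap: whitespace splitting handles English, while an unsegmented script needs dictionary lookup. The tiny dictionary and the greedy forward maximum-matching segmenter below are purely illustrative (real segmenters use much larger lexicons and statistical disambiguation).

```python
# Minimal word list for the example sentence only.
DICTIONARY = {"我", "喜欢", "猫"}

def segment_greedy(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the
    longest dictionary word; fall back to a single character."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in dictionary:
                out.append(piece)
                i += length
                break
    return out

english = "I like cats".split()            # whitespace is enough
chinese = segment_greedy("我喜欢猫", DICTIONARY)  # needs the lexicon
```

Greedy matching already shows why Chinese tokenization is harder: the output depends entirely on the lexicon, and ambiguous overlaps (where two valid segmentations compete) need statistics or context to resolve.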