科技回声

7 comments

bonoboTP, more than 5 years ago

I often wonder how much of a head start the isolating nature of English gave computing. It allowed ignoring a lot of inflectional and agglutinative complexity.

Concretely, I mean it's very easy to generate text using sentence templates. Just plug in words and it works out. "The $process_name has completed running." "Like $username's comment" "Ban $username".

Relatedly, I think focusing NLP efforts on English masks a lot of interesting phenomena, because English text already comes in a reasonably tokenized, chunked-up, pre-digested, easy-to-handle form. For example, speech recognition systems started out with closed vocabularies, with larger and larger numbers of words, and even in their toy forms you could recognize some proper English sentences. To do that in Hungarian, for example, the "upfront costs" of a "somewhat usable" system are much higher, because a closed vocabulary doesn't get you anywhere.

(Similarly, learning basic English is very easy. You can build 100% correct sentences on day 1: you learn "I", "you", "see" and "hear" and can say "I see", "You see", "I see you" and "I hear Peter", which are all 100% correct. In Hungarian these are "nézek", "nézel", "nézlek", "hallom Pétert", which requires learning several suffixes, vowel harmony and definite/indefinite conjugation. The learning curve to your first 100% correct 3-5 word sentences is just steeper.)

I don't mean it's impossible to handle agglutinative languages in NLP. I just mean the "minimum viable model" is much simpler and more attainable for English, which on the one hand was able to kickstart and propel the early research phases and on the other hand perhaps fueled a bit too much optimism.

English can seem very well structured, and it can tempt one to think of language in a very symbolic, within-the-box, rule-based way: in terms of syntax trees, sets of valid sentences, etc., instead of the "fuzzy probabilistic mess" that it really is. Surely, the syntax tree, generative grammar approach (Chomsky and others) gave us a lot of computer science, but this kind of "clean" and pure symbolic parsing doesn't seem to drive today's NLP progress.

In summary, I wonder how linguistics, and especially computational linguistics and NLP, would have evolved in a non-Anglo culture, e.g. Slavic or Hungarian.
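The template point can be made concrete with a small Python sketch. This is an illustration only, not something from the comment: the function names `english_in` and `hungarian_in` are hypothetical, and the vowel harmony rule is deliberately simplified (any back vowel in the word selects "-ban", otherwise "-ben"), so it ignores neutral-vowel exceptions and loanwords that real Hungarian morphology has to handle.

```python
from string import Template

# Hungarian back vowels; a word containing any of them takes the
# back-vowel suffix in this simplified model (an assumption, see above).
BACK_VOWELS = set("aáoóuú")


def english_in(noun: str) -> str:
    """English: the noun drops into the template unchanged."""
    return Template("in the $noun").substitute(noun=noun)


def hungarian_in(noun: str) -> str:
    """Hungarian: "in X" is a suffix whose vowel depends on the noun.

    Simplified vowel harmony: "-ban" after back-vowel words,
    "-ben" otherwise.
    """
    has_back_vowel = any(ch in BACK_VOWELS for ch in noun.lower())
    suffix = "ban" if has_back_vowel else "ben"
    return noun + suffix


if __name__ == "__main__":
    print(english_in("house"))    # plain substitution
    print(hungarian_in("ház"))    # "house" with a back vowel
    print(hungarian_in("kert"))   # "garden" with a front vowel
```

Even in this toy form, the Hungarian side needs a rule that inspects the word before it can fill the slot, whereas the English side is a plain string substitution.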

sansnomme, more than 5 years ago

Turkish is probably strict enough to be used as a programming language. The only downside is that its vocabulary is utterly alien for most speakers of Latin/Anglo-Saxon languages aside from some borrowed words from French and Arabic.

romwell, more than 5 years ago

Sumerian, an agglutinative language, is an important plot point in a famous cyberpunk novel, Snow Crash by Neal Stephenson [1] (which also popularized the word "avatar" as we use it today).

If you find the concept interesting, you will enjoy reading the novel.

[1] https://en.wikipedia.org/wiki/Snow_Crash

Bootwizard, more than 5 years ago

Can someone here explain this in an easier to understand way? This was a bit too dense for my understanding...

beefman, more than 5 years ago

More broadly, synthetic languages are like statically-typed programming languages, whereas analytic languages [1] are like dynamically-typed programming languages.

Also, intransitive verbs [2] are like thunks.

[1] https://en.wikipedia.org/wiki/Analytic_language

[2] https://en.wikipedia.org/wiki/Intransitive_verb
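For readers unfamiliar with the term, a thunk is a zero-argument function whose call is deferred. One possible reading of the analogy, sketched in Python with hypothetical names (`sleep_verb`, `see_verb`, `sentence`) that are not from the comment: an intransitive verb needs no object, so it behaves like a zero-argument callable, while a transitive verb must first be given its object.

```python
from typing import Callable


def sleep_verb() -> str:
    """Intransitive: takes no object, so it is already a thunk."""
    return "sleeps"


def see_verb(obj: str) -> str:
    """Transitive: needs an object before it can form a predicate."""
    return f"sees {obj}"


def sentence(subject: str, predicate: Callable[[], str]) -> str:
    """Build a sentence by forcing (calling) the predicate thunk."""
    return f"{subject} {predicate()}."


if __name__ == "__main__":
    print(sentence("Peter", sleep_verb))
    # A transitive verb becomes a thunk only once its object is bound.
    print(sentence("Peter", lambda: see_verb("Mary")))
```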

monkeycantype, more than 5 years ago

I was just reading this yesterday after the term came up in a Japanese grammar book.

foobar_, more than 5 years ago

Forth is probably the only agglutinative language in a way.

Agglutinative Language
