The problem is that this gets stuck on trying to find some spanning set of literal concepts that covers every possible topic in natural language. It stalls at that first level. Just pick a restricted domain of conversation for the sake of doing the linguistically interesting part first, then go back and extend the domain until it is general.<p>He is on to something. You can implement something like LZ77 compression using the FOIL rules of ordinary parentheses. Namely, you can use the distributive law to implement repeated-string elimination: instead of a back reference to the repeated string, you write [repeated string]*(substring1 + substring2 + substring3 + ...), where each substring marks an insertion point. Apply recursively as needed. Inevitably, to compress optimally you need enough tools to express any possible pattern, since any possible pattern might be present. But if any possible pattern is reachable, oops, you've got a Turing machine.<p>So my thesis statement here is that the first layer up in natural language, above the literals, is the compression/entropy-reduction/computation layer. This is the part of language that works pretty much like a programming language. The trick isn't just that it can express any literal thing, but that it can build expressions for any possible set of literal things in the domain. The layer after that is the social layer: the conversational protocol words, somewhat similar to HTTP request verbs, but without implicitly assuming a client-server relation or a pre-defined purpose for the interaction. If you want to see how natural language ticks, skip the ground floor at first.<p>Here's the pet project that led me to these ideas:
<a href="https://docs.google.com/document/d/1ws2IDqjENOBAQbOfK55XcuAl-0u0CgoDjL9OlUUFJCY" rel="nofollow">https://docs.google.com/document/d/1ws2IDqjENOBAQbOfK55XcuAl...</a>
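<p>To make the distributive-law trick above concrete, here's a minimal sketch in Python (the names `factor` and `expand` are mine, purely for illustration): factoring a shared prefix out of a set of alternatives is the compression step, and applying the distributive law re-expands it.

```python
import os


def factor(strings):
    # Compression step: factor the longest common prefix out of a list of
    # alternatives, running the distributive law in reverse:
    #   ["the cat", "the dog"] -> ("the ", ["cat", "dog"])
    prefix = os.path.commonprefix(strings)
    return prefix, [s[len(prefix):] for s in strings]


def expand(prefix, tails):
    # Decompression step: the distributive law itself,
    #   prefix*(t1 + t2 + ...) -> prefix t1 + prefix t2 + ...
    return [prefix + t for t in tails]


phrases = ["the quick brown fox", "the quick red fox", "the quick grey wolf"]
prefix, tails = factor(phrases)
# prefix is "the quick ", and expanding round-trips to the original list.
assert expand(prefix, tails) == phrases
```

Applying `factor` recursively to the tails gives the nested factorings described above; a real compressor would also factor shared suffixes and interior repeats, which is where the expressive power (and the Turing-machine problem) creeps in.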