
What's the minimum number of words you'd need to define all other words? (2012)

343 points by devilcius, about 6 years ago

48 comments

Someone, about 6 years ago
The Oxford Advanced Learner's Dictionary has a "Defining vocabulary" that they claim is used to write almost all definitions (I used the Fifth Edition, where it is Appendix 10). It's about 8½ pages, with 5 columns of about 63 lines, so about 2,700 words.

It doesn't list inflections, proper names, color adjectives such as "yellowish", or words used in an entry that derive from that entry (the dictionary mentions *blearily* and *bleary-eyed* being used in the definition of *bleary*).

They also say they *occasionally* had to use a word not in the list, but don't say how often. Those words _are_ defined in the dictionary, so it is possible that the reference graph does not have any cycles.

So, I guess 3,000 is a good first guess.
mjgeddes, about 6 years ago
The work of Anna Wierzbicka and Cliff Goddard studied "semantic primes": "the set of semantic concepts that are innately understood but cannot be expressed in simpler terms".

https://en.wikipedia.org/wiki/Semantic_primes

The combination of a set of semantic primes and the rules for combining them forms a "natural semantic metalanguage", which is the core from which all the words in a given language can be built up.

https://en.wikipedia.org/wiki/Natural_semantic_metalanguage

The currently agreed-upon number of semantic primes is 65 (see the list at the Wikipedia links above). That means any English word can be defined using a lexicon of about 65 concepts in the English natural semantic metalanguage.
superice, about 6 years ago
I am a little surprised that Toki Pona ("language of good", https://en.m.wikipedia.org/wiki/Toki_Pona) is not mentioned. It is a language of about 125 words, which aims to make you think about describing complicated subjects. To give an example: the concept "friend" could be described as either "good man" or "man good to me", depending on whether you think your friend is intrinsically good.

Admittedly, the original question is specifically about the English language, but Toki Pona is a nice experiment related to this.
gojomo, about 6 years ago
An interesting related talk, touching on the minimality and expressiveness of both natural and computer languages, is Guy Steele's 1998 talk "Growing a Language":

Video: https://www.youtube.com/watch?v=_ahvzDzKdB0

PDF: https://www.cs.virginia.edu/~evans/cs655/readings/steele.pdf

Prior HN discussions: https://news.ycombinator.com/item?id=16847691, https://news.ycombinator.com/item?id=2359174, and others.
fginionio, about 6 years ago
I think the approach I would use is as follows:

0. Get a dictionary.
1. Form a directed graph, with an edge from each word to every word that uses it in its definition.
2. Remove all words that have no outgoing edges.
3. If you removed some words, go to step 1. Otherwise, all words left in the dictionary are minimal.

EDIT: If anyone knows of a machine-readable dictionary, I'd love to actually do this.
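The pruning loop described above can be sketched in a few lines of Python. The dictionary here is a tiny invented one just for illustration; a real run would load a machine-readable dictionary (e.g. Webster's from Project Gutenberg):

```python
def minimal_words(dictionary):
    """Repeatedly drop words that appear in no remaining definition;
    what survives is the mutually-referential defining core."""
    words = dict(dictionary)  # word -> list of words in its definition
    while True:
        used = set()
        for definition in words.values():
            used.update(definition)
        removable = [w for w in words if w not in used]
        if not removable:
            return set(words)
        for w in removable:
            del words[w]

# Toy dictionary: "dog" and "animal" are definable from more basic words,
# while "big"/"small"/"not"/"thing" only reference each other.
toy = {
    "big":    ["not", "small"],
    "small":  ["not", "big"],
    "not":    ["not"],             # recursive definition: an atom
    "thing":  ["thing"],           # likewise
    "animal": ["thing"],
    "dog":    ["small", "animal"],
}
print(sorted(minimal_words(toy)))  # ['big', 'not', 'small', 'thing']
```

On the first pass "dog" is removed (no definition uses it); with "dog" gone, "animal" becomes unused on the next pass and is removed in turn, and the loop then reaches a fixed point.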
visarga, about 6 years ago
Definitions are not enough to fully capture the meaning of a word. To do that you need full language modelling, grounding of words in other sensory modalities, and the word's relation to actions taken in the various situations where it was used.

GPT-2 (of recent OpenAI fame) uses 1.5 billion parameters and, though capable of interesting results, is far from human level. It also uses just text, so it's incomplete.

https://blog.openai.com/better-language-models/

Another interesting metric is bits per character (BPC). The state of the art is around 1.06 on English Wikipedia. This measures the average achievable compression of character sequences, and doesn't include the size of the model, just the size of the compressed sequence.

https://arxiv.org/pdf/1808.04444.pdf
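To get a feel for what the BPC metric measures, here is a rough sketch using zlib as a stand-in compressor (the ~1.06 figure in the linked paper comes from a neural language model, not a general-purpose compressor; this only illustrates the metric itself, and the sample text is invented):

```python
import zlib

def bits_per_character(text: str) -> float:
    """Compressed size in bits divided by the number of characters."""
    compressed = zlib.compress(text.encode("utf-8"), level=9)
    return 8 * len(compressed) / len(text)

# Highly repetitive text compresses to far fewer than 8 bits per character.
sample = "the quick brown fox jumps over the lazy dog " * 50
print(round(bits_per_character(sample), 2))
```

As with the paper's definition, this counts only the compressed sequence, not the size of the decompressor/model.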
arooaroo, about 6 years ago
I used to work for Pearson Longman, and one of their USPs was that their defining vocabulary was significantly smaller than the main competitors', namely OUP's and CUP's. Longman's was just over 2,000 (about 2,100 IIRC), whereas OUP's was approximately 3,000.

Even then, one is rather constrained, and definitions frequently cross-referenced other words to bootstrap the definition.
chasing, about 6 years ago
Words in the English language are not the same as computer code. I'm not sure you can fully define most words in terms of other words -- hence the variety. Dictionaries generally only provide rough sketches of the meaning of a word. Even synonyms can have slightly different subtexts, connotations, and histories. Hell, individual words have wildly different meanings depending on context.
abecedarius, about 6 years ago
Besides Basic English, I've run into a neat French dictionary for children: https://www.amazon.com/Mon-premier-dictionnaire-Roger-Pillet/dp/B0007DU07S

It sticks to a basic vocabulary, has an entry for every word it uses, and goes heavy on examples and pictures in preference to formal definitions. (And it's monolingual, even though written mainly for learners in North America.)

I don't have it to check, but estimating from memory: around 2,000 to 4,000 words. I found it useful while bootstrapping up from Duolingo.
YeGoblynQueenne, about 6 years ago
It depends on what is meant by "define". If we are allowed to use existing words in a language L to create a new language L', then use expressions in L' to define each word in L, a single word w, originally in L, suffices.

The idea is to first index each word v in the lexicon of L (including w), starting at 1 and ending at n, whatever the number of distinct words in the language. Alternatively, you can index _meanings_. Then (it should be obvious where I'm going with this by now) you map a sequence S_k of repetitions of w, of length k in [1,n], to each k'th word v_k in L. So now L' is the language of n sequences S_1,...,S_n of w, each of which maps to a word (or meaning) in L. And you have "defined" L in terms of a single word, the word w.

But that's probably not at all what the reddit poster had in mind.

However, it should be noted that there's no deep reason natural language has many words; it's just convenient, and helps us create new utterances without having to create long sequences of one word, as above. The important ability in human language is that we can combine words to create new utterances, forever, which we can do with one word just as well as with a few thousand.

Finally, I suspect that if there *was* a minimal set of (more than one!) words sufficient to define all other words (meanings) in a language, all natural languages would converge to about that number of words, which I really don't think is the case.
Veedrac, about 6 years ago
I once found (plausibly via another HN commenter) a text-based adventure where (almost?) all the words used were replaced with alternative English-sounding nonsense words, but I have never rediscovered the link.

I feel this would be of interest to the thread, if anyone knows what I'm talking about or knows how to successfully Google for such a thing.
kybernetikos, about 6 years ago
I looked at this question a while back and wrote this: https://kybernetikos.com/2007/12/03/atoms-of-english/ (the blog is only up some of the time, sadly; I'll fix it eventually).

I took Webster's dictionary from the Project Gutenberg site. I started with 95,712 words. After the initial discarding of words that weren't in any definitions, I was down to 4,489 words. After expanding them, and throwing away words that weren't in the expanded definitions, I was down to 3,601 words. Setting recursive definitions as atoms and continuing got me down to 2,565 words.
feyman_r, about 6 years ago
Reminds me of Randall Munroe's Thing Explainer:

"In Thing Explainer: Complicated Stuff in Simple Words, things are explained in the style of Up Goer Five, using only drawings and a vocabulary of the 1,000 (or 'ten hundred') most common words."

https://xkcd.com/thing-explainer/
aaron695, about 6 years ago
I'd say most nouns need to be seen.

To understand "duck" you must see a duck (eat a duck, pet a duck, smell a duck, hear a duck).

Perhaps you could cheat and use pixels and coordinates, using English to draw photos and videos to explain ducks.
MrOxiMoron, about 6 years ago
This reminds me of https://youtu.be/_ahvzDzKdB0, an awesome talk!
singularity2001, about 6 years ago
Just one: "nor".

https://en.wikipedia.org/wiki/Functional_completeness

Hope you are one of the 10,000 lucky ones whose mind is blown for the first time.

Or another one: "1".

https://en.wikipedia.org/wiki/Unary_coding
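The functional-completeness claim is easy to check concretely; a small sketch building NOT, OR, and AND out of NOR alone:

```python
def nor(a: bool, b: bool) -> bool:
    return not (a or b)

# Every Boolean connective can be composed from NOR:
def not_(a):    return nor(a, a)
def or_(a, b):  return nor(nor(a, b), nor(a, b))  # NOT of NOR
def and_(a, b): return nor(nor(a, a), nor(b, b))  # NOR of the negations

# Exhaustively verify the truth tables.
for a in (False, True):
    for b in (False, True):
        assert not_(a) == (not a)
        assert or_(a, b) == (a or b)
        assert and_(a, b) == (a and b)
print("NOR alone reproduces NOT, OR and AND")
```

Since {NOT, AND, OR} is already known to express every Boolean function, this shows NOR by itself suffices; the same construction works for NAND.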
hyperpallium, about 6 years ago
The words needed to define a universal Turing machine (and a program to simulate a human brain, but that doesn't require additional words).

We could extend it to cover words not conceivable by humans, and any universe, by using a program to simulate those, but (1) I assume the question implicitly assumes *human* words, and (2) it wouldn't require more words anyway.
ggggtez, about 6 years ago
0, obviously. Babies start with no definitions of words, but here we all are.

The baby learns words via example, not by definitions.
WhitneyLand, about 6 years ago
How does this make any sense?

You could have 100 synonyms with the same "definition" but 100 different shades of meaning, implied degrees of strength, or connotations.

You don't necessarily simplify anything by making people add extra words to get across those subtleties.

Of course some are useless equivalents, but many aren't.
taternuts, about 6 years ago
Wow, I had no idea there was such a thing as simple.wikipedia.org! It apparently tries to follow "Basic English"[0], which is comprised of only 850 words. The simple version[1] of the artificial neural network article is a lot more approachable than the normal version[2]!

0: https://simple.wikipedia.org/wiki/Basic_English

1: https://simple.wikipedia.org/wiki/Artificial_neural_network

2: https://en.wikipedia.org/wiki/Artificial_neural_network
lostmsu, about 6 years ago
The answer is 2: zero and one. What you need is to describe second-order logic. Just define the "every" quantifier to be 0 0 0, NAND [1] to be 0 0 1, and all other words as other sequences of 0s and 1s that, for clarity, look like 1 *. There might need to be some trick to ensure unambiguity when splitting a "sentence" into "words", but that should be trivial.

1: https://en.wikipedia.org/wiki/Sheffer_stroke
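The "trick to ensure unambiguity" can be a prefix-free code: if no codeword is a prefix of another, a concatenated sentence splits in exactly one way. A small sketch, with an invented codebook loosely following the comment's scheme (logical primitives start with 0, ordinary words with 1; all the words are hypothetical examples):

```python
# No codeword below is a prefix of any other, so greedy decoding is exact.
CODE = {
    "every": "000",
    "nand":  "001",
    "cat":   "10",
    "mat":   "110",
    "sat":   "111",
}
INVERSE = {bits: word for word, bits in CODE.items()}

def encode(words):
    """Concatenate codewords into one unpunctuated bitstring."""
    return "".join(CODE[w] for w in words)

def decode(bits):
    """Split the bitstring back into words; unique because the code is prefix-free."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in INVERSE:
            out.append(INVERSE[buf])
            buf = ""
    if buf:
        raise ValueError("dangling bits: " + buf)
    return out

message = ["every", "cat", "sat", "nand", "mat"]
assert decode(encode(message)) == message
```

This is essentially the same construction as Huffman codes: the two "words" 0 and 1 carry an unbounded vocabulary, at the cost of longer sentences.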
emilfihlman, about 6 years ago
Taken to the logical extreme, the question is: "how many intrinsic symbols do we need to convey any meaning when presented to a fully logical being?", to which, in my opinion, the answer is 1 (or 2, really, since 1 is only "possible").

You might not have words for it, but a fully logical being can decipher any bitstream given enough interactivity.

So start from 1 and 0, form the basis of mathematics and symbols, then build up physics from the bottom.
randartie, about 6 years ago
0) Initialize set X to contain every word.
1) Y = the set of words in every definition of the words in set X.
2) X = Y - X (all words in Y that are not in X).
3) Repeat from 1 if the set of words in X has changed.

Does that reduce all words down to the actual minimal set of words required to define the other ones? Since you can build upwards from the resulting set X to get the original set of words.

Also, this reminds me of the knapsack problem a little bit (for example, what is the minimum set of coins required to be able to make $X).
rsync, about 6 years ago
Isn't the answer "two"? "One" and "none" (or on and off)?

Of course, I see the obvious bootstrapping problem where you relate the encoding starting with just those two words, but... somehow I think that's easier to overcome than it seems... as in, I think it must be possible.

If Helen Keller can write a book, surely I can relate digital encoding to a toddler over the course of a year or three, right?
bloak, about 6 years ago
I've seen a dictionary that defined 120 words using those same 120 words (morphemes, really), though some of the definitions were a bit... weak. Toki Pona also has about 120 words, but it's a very different set of words: Toki Pona's vocabulary is concrete and everyday, while the dictionary's was very abstract. So probably it's just a cute coincidence that both numbers were about 120.
_cs2017_, about 6 years ago
It depends on what context is assumed about the audience that is supposed to understand the definitions.

Do they have the experiences relevant to the word being defined? If not, what experiences do they have in common with the person providing the definition?

How intelligent are they? Can they understand complex concepts through logic, through examples, or both?

How much do they know about English (besides the few words assumed known)?
lkrubner, about 6 years ago
So, once the human race had discovered roughly this number of words (give or take a few, for whatever language existed at the time, and minus the useless words demanded by grammar), humans had a Turing-complete language? That must have been a crucial point for the evolution of human culture.
hitekker, about 6 years ago
I am reminded of https://en.wikipedia.org/wiki/Natural_semantic_metalanguage
DmitryOlshansky, about 6 years ago
I bet this heavily depends on what you consider an accurate definition.
twotwotwo, about 6 years ago
There are lots of ways it's not at all the same, but it's at least sort of interesting to compare this question to the number of dimensions needed for effective word embeddings.
novalis78, about 6 years ago
Reminds me of Toki Pona — with about 120 words it seems to work.
raldi, about 6 years ago
What's the minimum number of words you'd need to define the word "left", as in "left hand"?
gpm, about 6 years ago
a, b, c, d, e, f, g, j, k, l, m, n, p, r, s, v, x, y, (, ), and, concatenate.

Some hints:

- "backwards j"
- "a circle"
- "a cross"
- "n, but rotated ninety degrees"
- "mirror of p"
- "vv, except no gap"
- "pixel-wise union of n and l"
- "mirror of s, and make the lines straight"

Semantics are impossible anyway; I challenge you to define the word "dog".

Challenge: do better, and make sure you don't have circular dependencies.
aboutruby, about 6 years ago
You can go from one word, "entity", to every word.

The tradeoff is between density of information, understandability to readers, and conciseness.
SeanLuke, about 6 years ago
This feels intuitively like it's closely associated with some measure of the Kolmogorov complexity of a passage.
doxos, about 6 years ago
Two words: "1" and "0".
sebringj, about 6 years ago
"a" and "i": since it's binary, you could define all the others.
kazinator, about 6 years ago
Good to see this silly question off r/lisp for once. :)
keyle, about 6 years ago
I'm guessing, and I can't really explain why, but my gut feeling is 42.
vonnik, about 6 years ago
Randall Munroe of xkcd experimented with this in his book Thing Explainer:

https://xkcd.com/thing-explainer/
stretchwithme, about 6 years ago
One.
评论 #19337479 未加载
agumonkey, about 6 years ago
and "kernel" is one of them
Criper1Tookus, about 6 years ago
I've actually been wondering about this a lot myself recently, though I have been thinking of it in terms of "axiomatic English", i.e. the set of words and grammar/syntax rules from which all other meanings expressible in English can be represented, and which cannot themselves be explained except through tautology. It's a really, really interesting question, and answering it would explain a lot about how we actually think.
Gunstig2Snath, about 6 years ago
I can actually answer this question. Back in the day I was going through the Oxford Dictionary, and it mentioned that all the definitions use words from a list of about 3,000 words. The list, IIRC, was also at the back of the dictionary. It also mentioned that on rare occasions they have to use words outside of those 3,000.

Source: my memory of something I read at the British Council Library 17 years ago.
ChlorophZek, about 6 years ago
Finally, something thought-provoking! Everybody, ready your Internets; this gentleman deserves an answer!
lutorm, about 6 years ago
Don't Gödel's incompleteness theorems imply that it is impossible to define all words using words, unless you have some axiomatic words that are not defined within the system?