Rappers, Sorted by Size of Vocabulary

600 points by sinned about 11 years ago

68 comments

loso about 11 years ago
I enjoyed reading this chart, but I hope it doesn't reinforce the bias some fans have that word complexity is the only way to tell whether a rapper is good. There are several ways to judge the strengths and weaknesses of a rapper. Complexity is one of them, flow is another. Storytelling ability is another very strong indicator. The best rappers bring a mix, while some are so strong in one area that they explode even if they are really weak in others.
unfunco about 11 years ago
This is fascinating. I'm only a recent listener of hip-hop (primarily because of Earl Sweatshirt and Odd Future) and I'm in awe of the vernacular.

And similarly, as a boredom exercise a few weeks ago I did some lexical analysis of the song Timber (the monstrosity was being constantly played on the radio at the time) and here's what I came out with:

"83.1% of the words in the lyrics are five letters or less, 58.9% are four letters or less. The lexical density (the number of unique words divided by the total number of words, multiplied by one-hundred) is 29.1%. There is only one word in the song which has three or more syllables. Eleven people were involved with the writing of the song, each of them capable of producing just nine unique words each."
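A minimal sketch of the lexical-density calculation described above (unique words divided by total words, times one hundred), together with the share of short words. The tokenization rule (lowercase runs of letters and apostrophes) and the function name `lexical_stats` are assumptions for illustration; the comment does not say how the lyrics were actually tokenized, and the sample text is made up.

```python
import re


def lexical_stats(lyrics):
    """Return word count, unique count, lexical density and short-word share."""
    # Assumption: a "word" is a run of lowercase letters/apostrophes.
    words = re.findall(r"[a-z']+", lyrics.lower())
    total = len(words)
    if total == 0:
        return {"total": 0, "unique": 0,
                "lexical_density_pct": 0.0, "five_or_less_pct": 0.0}
    unique = len(set(words))
    # Apostrophes count toward word length under this simple rule.
    short = sum(1 for w in words if len(w) <= 5)
    return {
        "total": total,
        "unique": unique,
        "lexical_density_pct": 100.0 * unique / total,
        "five_or_less_pct": 100.0 * short / total,
    }


# Made-up sample text, not real lyrics.
stats = lexical_stats("the cat sat on the mat and the cat ran")
print(stats)
```

A highly repetitive chorus pulls the density down quickly, since repeated words grow the total but not the unique count.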
bretthopper about 11 years ago
Looked for Canibus near the top and wasn't surprised to find him 4th. If anyone hasn't heard of him, highly suggest listening to his older stuff such as his first Can-I-Bus, 2000 BC and Mic Club.

He raps about science and space all the time which is cool.

Here's an example of his ridiculous lyrics: http://rapgenius.com/Canibus-poet-laureate-infinity-lyrics
seizethecheese about 11 years ago
Many here seem to be interpreting vocabulary size as a signal for quality. When it comes to rap I completely disagree. Firstly, repetition is rap's main ingredient. I read an article a while ago where researchers found that listening to a spoken phrase that is looped activates the same part of the brain as music, which helps explain this phenomenon.

Personally, if I want food for thought I read. Rap is not an intellectual pursuit. I've been perusing rappers on this list, and the top artists have not been good at all to my ears. It seems that the best rappers are in the middle, and being on either extreme is a negative signal.
Aardwolf about 11 years ago
> Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words

Why does that suggest he knew over 100k words? Maybe it means he knew 28,829 and used all of them? Would he really know over 70,000 words he never used in his works? What would those 70,000 words be? Probably very obscure ones. How can you know that many obscure ones?
nmac about 11 years ago
It's a nice touch including portmanteaus and 'incorrect' ebonics on the list (like "ery'day"), since authors like Shakespeare, Joyce and others took the same liberties with language. Arguably, that's how language develops, and it's what makes it interesting to study and think about. The OP could have easily stuck to words in the OED, kudos.
krick about 11 years ago
Really interesting, but not as representative as it should be. It's not clear why some have a larger vocabulary than others. It could be from using words like "zeitgeist" (in the case of Aesop Rock), or some clever wordplay (I don't know much about hip-hop, so I can't find an example for an artist from the list right off the bat, but I remember Marilyn Manson using the word "gloominati", for instance), or pretty meaningless made-up words like "schizzle" (in the case of Snoop Dogg), or the usual derivatives like "fuckedy fuck". Moreover, in many hip-hop transcripts people write down words as they are pronounced, which can be pretty distorted for some artists (which of course ideally shouldn't count as a "new word", but that's complicated, yeah).

While Aesop Rock and DMX are clearly the extremes and not surprising at all, it's not that clear for some guys in the middle.

So, first off, every data project should provide its sources, or at least a more specific description of how the text was processed, tokenized and analyzed. Second, several more "data slices" should be provided, for instance *the 100 most used words which are unique to that artist compared to the other artists in the list*.
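The "data slice" suggested above could be sketched roughly as follows: for one artist, count only the words that no other artist in the collection uses, and list the most frequent. The function name `distinctive_words`, the `corpora` layout (artist name mapped to a list of already-tokenized words) and the sample data are all hypothetical; this is not how the original chart was produced.

```python
from collections import Counter


def distinctive_words(corpora, artist, top_n=100):
    """Most frequent words used by `artist` and by no other artist in `corpora`."""
    # Collect every word used by anyone else.
    others = set()
    for name, tokens in corpora.items():
        if name != artist:
            others.update(tokens)
    # Count only the words exclusive to the chosen artist.
    counts = Counter(t for t in corpora[artist] if t not in others)
    return counts.most_common(top_n)


# Made-up token lists for two hypothetical artists.
corpora = {
    "artist_a": ["zeitgeist", "city", "zeitgeist", "night", "lantern"],
    "artist_b": ["city", "night", "money"],
}
top = distinctive_words(corpora, "artist_a", top_n=2)
print(top)
```

Phonetic transcriptions would still show up as "unique" words here, so some normalization step would likely be needed first.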
coherentpony about 11 years ago
Maybe this is just me, but it's a little unfair to compare to literary *texts*.

Humour me for a moment.

When an artist writes a song, he (or she) has constraints. Most rappers would like to rhyme the ends of their sentences. I know sometimes they don't (like poetry), but it's certainly pleasing to the ear to have that constraint. Artists endeavour to make their songs catchy, that's highly correlated with the gross sales of the product.

When an artist writes a novel, this constraint is not weighted quite as highly. I know Shakespeare wrote poetry, too, and to call me out on this comparison is entirely fair. That said, there's also an argument to be made for eye rhymes. Shakespeare used these a lot. Eye rhymes are words that don't rhyme aurally, but *do* rhyme visually. It's the story that pleases the reader, not necessarily its aural 'catchiness'. I probably made that word up. But Shakespeare made words up too. The point is, you knew what I meant.

At the end of the day these comparisons, while certainly *interesting*, should be taken with a pinch of salt. While I'm at it, this advice can easily be extrapolated to any dataset. Always understand there may be unknown correlations.
thinkpad20 about 11 years ago
Is Del tha Funkee Homosapien on this list? I'd be curious, since he has pretty non-standard lyrics.
habosa about 11 years ago
Not surprised to see Wu Tang at the top and Drake at the bottom. Started from the bottom ... still there.
orblivion about 11 years ago
This looks at the first so many lyrics in each rapper's career. Aesop Rock came out with some weird stuff right off the bat. I wonder if some of these other rappers became more sophisticated over time. Maybe an average per song, or average uniques per word, would be better.
randomdrake about 11 years ago
For those who aren't familiar with Aesop Rock, I'd invite you to give him a listen sometime. His earlier albums, in particular, have been very influential to me in many ways. Both in my artistic and professional careers.

From comments on the conditions of the working man and the condition of feeling trapped in a "j-o-b"[1]:

    "Now we the American working population Hate the fact that eight hours a day Is wasted on chasing the dream of someone that isn't us And we may not hate our jobs But we hate jobs in general That don't have to do with fighting our own causes We the American working population Hate the nine-to-five day-in day-out When we'd rather be supporting ourselves By being paid to perfect the pastimes That we have harbored based solely on the fact That it makes us smile if it sounds dope"

To storytelling masterpieces regarding living and dreaming[2]:

    "Look, I've never had a dream in my life Because a dream is what you wanna do, but still haven't pursued I knew what I wanted and did it till it was done So I've been the dream that I wanted to be since day one!"

Aesop Rock takes language and linguistics to entirely different levels than one might expect from the single genre that is hip-hop. He even challenges himself and the listeners, playing fantastic word games, for instance re-using the letters L, S, and D in odd and rhythmical ways after a mention[3]:

    "Lazy summer days Like some decrepit landshark dumb luck squad dog lurks sicker deluded Last sturdy domino lean's secluded Don't let stupid delusions lesson super-duty labor students Dragnet lifer solutions Daddy loved sloppy dimensions like son-daughter links Such determinated lepers, successfully disheveled Little soliders developed like serpents despite life sentence ducking Lemmings Some don't like sobriety's dirty lenses Some do"

And then there are just incredible gems that stick with you like[4]:

    "I don't flick needles like my sick friend I don't march like Beetle Bailey through a quick trend I don't frequent church's steeples on my weekend And I don't comment if you formulate a weak Zen"

There's a lot to explore from Aesop Rock. Should you find this type of hip-hop interesting, a decent place to start is with the label you can find these songs on, Definitive Jux[5]. Incredible talent has been on and off that label over the years. So much good stuff.

[1] - "9-5ers Anthem" - http://rapgenius.com/Aesop-rock-9-5ers-anthem-lyrics

[2] - "No Regrets" - http://rapgenius.com/Aesop-rock-no-regrets-lyrics

[3] - "The Greatest Pac-Man Victory in History" - http://rapgenius.com/Aesop-rock-the-greatest-pac-man-victory-in-history-lyrics

[4] - "Save Yourself" - http://rapgenius.com/Aesop-rock-save-yourself-lyrics

[5] - http://en.wikipedia.org/wiki/Definitive_Jux
Ryanmf about 11 years ago
OP: Did your analysis of MF DOOM include his work alongside Madlib as Madvillain or his various other pseudonyms (King Geedorah, Viktor Vaughn, etc.)?

I find it a little hard to believe he's not at least in the Wu Tang/Canibus/KK cluster, if not #1 overall.
quux about 11 years ago
I wonder where Weird Al Yankovic would come in on this ranking.
DigitalSea about 11 years ago
Makes me very happy to see Aesop Rock in the #1 spot. He isn't as underground as many people assume: still relatively unknown in the mainstream, but well known enough to sell records and sell out shows. I wasn't a big fan of his 2012 release Skelethon, but the way he structures his lyrics and the meaning behind them means he never writes a bad lyric.

Interestingly, Eminem, whom I would have thought would rank pretty highly for his clever word bending and enunciation, is only in the middle of the scale. Still a whole lot better than some of his counterparts, but still surprising. Another interesting thing to note is Eminem being grouped in the same league as the likes of Jay-Z, Rakim and Lupe Fiasco, with only a couple of hundred unique words separating them from one another.
riggins about 11 years ago
I find it hilarious that DMX is dead last.

I've now got empirical evidence of what I always thought.

I think DMX rhymes words with themselves more than any rapper I've ever heard.
ballstothewalls about 11 years ago
This is a great graph, but I think it would be neat if a y-axis was thrown in. My first thought was album sales or some other metric of popularity that helps you find specific rappers quickly instead of going through the huge bunch of little pics.
sareon about 11 years ago
This reminds me of a PyCon talk from this year on analyzing rap lyrics with some basic NLP techniques:

http://pyvideo.org/video/2658/analyzing-rap-lyrics-with-python

The author was trying to see if rappers are considered more hateful towards women by their usage of "bitch per song". The results are quite interesting.
zopticity about 11 years ago
Lil Jon should be at the bottom with 7 words: "Yeah!", "Okay!", "Shots!" and "Turn down for what?"
rthomas6 about 11 years ago
This infographic doesn't take into account other rappers possibly copying earlier really influential artists, making the earlier influential artists rank lower. More generally, it would be cool to see this chart ranked by the amount of original words present in the first 35,000 lyrics *that were not present yet at the albums' time of publication*.
ryan1234567890 about 11 years ago
To put some perspective on this:

    ryan@3G08:~/Desktop/bleh$ pdftotext David-Foster-Wallace-Infinite-Jest-v2.0.pdf
    ryan@3G08:~/Desktop/bleh$ python dfw.py
    size of vocabulary: 30725

The man passed Shakespeare by 1,896 words with that book.

code:

    import string

    import nltk  # word_tokenize needs NLTK's tokenizer data (see nltk.download)
    from nltk.stem import PorterStemmer

    path = "/home/ryan/Desktop/bleh/David-Foster-Wallace-Infinite-Jest-v2.0.txt"
    with open(path, encoding="utf-8", errors="ignore") as f:
        raw = f.read()

    # Drop ASCII punctuation and lowercase so "Word," and "word" count as one.
    exclude = set(string.punctuation)
    raw = "".join(ch for ch in raw if ch not in exclude).lower()

    # Tokenize, reduce each token to its Porter stem, and deduplicate.
    tokens = nltk.word_tokenize(raw)
    stemmer = PorterStemmer()
    stemmed_tokens = {stemmer.stem(token) for token in tokens}

    print("size of vocabulary:", len(stemmed_tokens))
Tycho about 11 years ago
I've been wanting to do some NLP on Rap Genius's corpus for ages. This is a great analysis. What I had thought of is writing a program to detect ghostwriting. Rappers probably have some sort of lyrical 'DNA' in the construction of their verses. How often they use certain words, number of words per line, number of unique words per song, ratio of adjectives to nouns, that kind of thing. You could probably unmask some ghostwriting secrets.

Looking at the analysis here, it's interesting to see some clustering in the results. IMO the second cluster is the sweet spot: Wu Tang's excessive invention of vocabulary is cool but probably detracts from the poetic effect. Meanwhile rappers like 2Pac are just kind of boring IMO, at least going by their lyrics alone.
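As a rough illustration of the lyrical 'DNA' idea above, here is a small sketch that extracts two of the mentioned features from a verse: words per line and the ratio of unique words to total words. The function name `verse_features` and the sample verse are hypothetical. The adjective-to-noun ratio is left out because it would need a part-of-speech tagger, and turning such features into an actual ghostwriting detector is well beyond this sketch.

```python
def verse_features(verse):
    """Compute simple per-verse style features from newline-separated lines."""
    # Keep non-empty lines only, split on whitespace.
    lines = [ln.split() for ln in verse.splitlines() if ln.strip()]
    # Lowercase and strip surrounding punctuation from each word.
    words = [w.lower().strip(".,!?;:\"'") for ln in lines for w in ln]
    words = [w for w in words if w]
    if not words:
        return {"lines": 0, "words_per_line": 0.0, "unique_ratio": 0.0}
    return {
        "lines": len(lines),
        "words_per_line": len(words) / len(lines),
        "unique_ratio": len(set(words)) / len(words),
    }


# Made-up two-line verse.
features = verse_features("one two three\none two\n")
print(features)
```

Feature vectors like this, computed per song, could then be compared across artists, though how well they separate writers is an open question here.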
dmourati about 11 years ago
I'm a big fan of the project and the way it is presented. Not sure why Wu-Tang features so prominently but I guess I'm okay with that. Kool Keith should be broken down further into his constituent parts. I also would have thought the Beastie Boys would have run higher.
andybak about 11 years ago
I would have been rather surprised not to see Aesop Rock fairly high up the list. I was reading the Rap Genius pages for a few of his tracks the other week and the sheer density of wordplay was fairly overwhelming.

It is rap for geeks though ;)
danielsf about 11 years ago
Author here: hit me up with any questions you've got.
Aqueous about 11 years ago
Greatly enjoyed the analysis but while I was reading it I felt a lot like this guy:

https://www.youtube.com/watch?v=GKlDBi0cyIA
NAFV_P about 11 years ago
All the rappers listed seem to be American.

Whack this through your Bowers and Wilkins:

https://www.youtube.com/watch?v=p_SQEUZomug
S_A_P about 11 years ago
I think the only problem I see is that some rap groups are listed as rappers. For instance Beastie Boys, De La Soul and Wu Tang are listed. So there is some collective vocabulary being compared to single rappers. That said this is cool and pretty telling. From what I could see it is probably loosely coupled to the intelligence of the rappers listed. I will echo the sentiments about DMX here. Looks like some shock jock rappers definitely are low on the list (Too Short).
kenjackson about 11 years ago
This is an interesting analysis.

I love the fact that E-40 is about on par with Shakespeare. I'm sure he would take it as a compliment to be called the modern day Shakespeare.
htk about 11 years ago
"Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words"

So much for the modern Shakespeares on the list.
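To make the counting rule quoted above concrete, the sketch below compares counting surface forms (what the quote describes) with counting after a crude normalization. `crude_stem` is a toy, hypothetical suffix stripper written only for this example; it is not a real stemming algorithm and is not what the chart's author used.

```python
def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "in", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


forms = ["pimps", "pimp", "pimping", "pimpin"]
surface_count = len(set(forms))                      # every spelling is distinct
stemmed_count = len({crude_stem(w) for w in forms})  # spellings collapse to one stem
print(surface_count, stemmed_count)
```

Which rule is used changes every artist's total, so rankings from the two approaches need not agree.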
koala_advert about 11 years ago
I keep getting this error, in Firefox and Chrome:

    <Error>
      <Code>AccessDenied</Code>
      <Message>Access Denied</Message>
      <RequestId>3CB1F41D7DFDC794</RequestId>
      <HostId>wHCPzEYPDsmkMJX+YIgjU40YPrGYytHrk5B44dApi7663NkQQI0RKx9A/6EX7Iph</HostId>
    </Error>
dnautics about 11 years ago
How about a 2D visualization with a sliding 10,000-word window, with the y-axis as unique words out of 10k and the x-axis as time? Are there cultural trends that are time dependent? Did Young MC and Del use more words than contemporary artists? Did their trends as artists follow the global trend over time?
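A minimal sketch of the sliding-window idea above: count the distinct words in each consecutive window of a token stream, updating the counts incrementally instead of rebuilding a set per window. The function name `sliding_unique_counts` is hypothetical, the tokens are assumed to be already in chronological order, and the tiny window in the demo is only so the result can be checked by hand.

```python
from collections import Counter


def sliding_unique_counts(tokens, window=10000):
    """Return the number of distinct tokens in each consecutive window."""
    if window <= 0 or len(tokens) < window:
        return []
    counts = Counter(tokens[:window])
    result = [len(counts)]
    for i in range(window, len(tokens)):
        # Slide by one: drop the oldest token, add the newest.
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[tokens[i]] += 1
        result.append(len(counts))
    return result


tokens = "a b a c b d".split()
series = sliding_unique_counts(tokens, window=3)
print(series)
```

Plotting `series` against the position (or release date) of each window's last word would give the per-artist trend line the comment asks about.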
selimthegrim about 11 years ago
Maybe this will help me answer that nagging question at the back of my brain: What does DJ Khaled actually _do_?
Grue3 about 11 years ago
Would be interesting to see how they compare to rock bands like Titus Andronicus, Fucked Up or Bad Religion.
Totient about 11 years ago
I wonder where things like classic rock / broadway musicals / opera / etc. fit on this spectrum.

I really appreciate including Shakespeare and Moby Dick on the spectrum, but I'd still like some more perspective. For that matter, I wonder how many unique words *I* use every day.
tokipin about 11 years ago
Just a note, those artists don't necessarily use all their vocabulary. Eminem for example clearly holds back on his vocabulary. Rap is as much an art as anything can be so there are all sorts of factors. Be careful about what conclusions you draw here beyond curiosity.
sbierwagen about 11 years ago
Cool to see Canibus so high in the rankings.

It'd also be cool to add the members of AOTP to the analysis.
msutherl about 11 years ago
I would love to see this analysis without filters. Who is *the* rapper with the largest vocabulary? What does the distribution look like at the top? Surely Antipop Consortium or MF DOOM have larger vocabularies than Aesop for instance.
gfody about 11 years ago
I'm pretty sure E-40 scored so high because of all the made up words. He's highly regarded for being innovative and influential but you know for every piece of slang that stuck there's like ten that didn't.
ff10 about 11 years ago
Really surprised MF Doom is not ranked higher – are his side projects included?
oakaz about 11 years ago
Why is Jedi Mind Tricks not counted? He'd be the first in this list: https://www.youtube.com/watch?v=TlZgiK6FiO0
Mikeb85 about 11 years ago
Not particularly surprised at the list. Aesop Rock, the whole Wu-tang Clan, and guys like Nas, Wale, all near the top. DMX and Too Short at the bottom...<p>Definitely comes out in their music...
jarnix about 11 years ago
How many words in "fo shizzle ma nizzle"? 4 or 0?
camus2 about 11 years ago
I would love the same chart but sorted by vulgarity.
ignacioelola about 11 years ago
I would love to see the same analysis across different music styles. How do the vocabulary sizes of Madonna, Bob Dylan and Justin Bieber compare?
bladecatcher about 11 years ago
I would like to see Dälek included in the study. I'd be surprised if they didn't show up on the far right on the scale.
konceptz about 11 years ago
What I would like to see is this same comparison done against album sales, with the implication of mainstream vs. underground.
b3b0p about 11 years ago
Was it mentioned where the data was sourced from? I'm not seeing anything and I went back and checked. Did I miss it?
jomtung about 11 years ago
Killah Priest should be grouped with Wu-Tang.
zeppelinnn about 11 years ago
This is awesome. Reminds me of all the data viz they are doing on rapgenius. You forgot Atmosphere though (Slug)
m_mueller about 11 years ago
I'd be interested in how Nerdcore rappers compare to this, such as MC Frontalot or Professor Elemental.
devindotcom about 11 years ago
Couldn't find Aceyalone - I thought he'd be in the top 10, I guess he wasn't included.
thegasman about 11 years ago
No mention of MF Doom? Metalface? Doom? Victor Vaughan? (All the same gentleman from LA)
tps12 about 11 years ago
So funny comparing this to the same graph they did for pop lyricists.
snarfy about 11 years ago
I wasn't surprised to see Canibus and Outkast up there.
dnlserrano about 11 years ago
Awesome. This guy should definitely work for RapGenius.com.
shaggyfrog about 11 years ago
Incredibly, a list about rapper vocabulary is missing anyone associated with nerdcore.

I'm interested to see where the likes of MC Frontalot, Wordburglar, YTCracker, etc. rank on that scale...
jmt7les about 11 years ago
I'd love to see Immortal Technique also.
prg318 about 11 years ago
Thank you based god!!
1ris about 11 years ago
Shouldn't that be adjusted to the size of the text corpus?
moron4hire about 11 years ago
This might be the best-made infographic I've ever seen.
pinkskip about 11 years ago
Woah so awesome!
allan_ about 11 years ago
where is KRS-ONE?
benihana about 11 years ago
I'd really like to see this broken down by established vocabulary and made up vocabulary. I think that would really start to show who were the best lyricists on both ends. Rappers with a lot of made up words might be on the far left, and rappers with a lot of unique words that aren't made up would be on the far right. Both sides of the scale would show rapping talent on different dimensions. Influential rappers like E-40 who add new words to the vocabulary, and wordy rappers like Aes on the right who use a really dense and descriptive vocabulary.
skylan_q about 11 years ago
Kool Keith should be exempt from this list. He's not from any of the 4 regions listed, but from Jupiter.
thrownaway2424 about 11 years ago
Gotta wonder about the garbage-in factor of Rap Genius. From one randomly selected Aesop Rock cut:

"Please I want to donate my brain to the monstrous Panasonic profit"

I guess it could be. I always heard it as "monstrous Panasonic prophet." It would be in keeping with the previous lyric "Television, all hail grand pixelated god of fantasy."
sarreph about 11 years ago
We might all be self-confessed *hackers*, but we'll never explicitly confess our adoration for the gloriousness of the genre that is *gangster rap*.
simonster about 11 years ago
The estimate of vocabulary size here is based on the number of unique words used. This seems like it is strongly biased: if two artists have the same size vocabulary, but one has released more albums and thus used more words, that artist will probably have used more unique words. To underscore this point, the number of unique words used by Aesop Rock is half of the estimated vocabulary size of the average college student, although to be fair that estimate is the number of words that an individual can recognize, not the number of words they use. (Edit: the bias is somewhat mitigated by the fact that the same number of words is used to estimate the vocabulary for each artist, but the bias is not dependent on sample size alone but also upon the size of the artist's underlying vocabulary; see my comments below.)

The underlying problem is one of estimating the cardinality of a multinomial distribution given a fixed number of samples. In isolation this problem is ill-posed, since it is always possible that there is a word in a given lyricist's vocabulary that he uses with very low frequency and that is unlikely to appear in any sample, but with appropriate prior information it may be possible to obtain an accurate estimate.

This is not my field, but a brief Google Scholar search shows that there are several papers on estimating vocabulary size, or equivalently, estimating the number of species based on sampling. There is a somewhat dated review (http://cvcl.mit.edu/SUNSeminar/BungeFitzpatrick_1993.pdf) that details some methods of estimation (in this case, I believe we are in the domain of "infinite population, multinomial sample" with unequal class sizes). The paper notes that there is no unbiased estimator available without assumptions on the distribution of word use frequencies, but some of the proposed estimators may be more accurate than the naive estimate used here.
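To illustrate the species-estimation framing above, here is a small sketch of one well-known richness estimator, Chao1 in its bias-corrected form: S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of words seen exactly once and exactly twice. This estimator is swapped in purely as an example; the comment does not name it, and it is not necessarily one of the methods the comment or the linked review has in mind. The function name `chao1` and the sample tokens are made up.

```python
from collections import Counter


def chao1(tokens):
    """Bias-corrected Chao1 estimate of the number of distinct word types."""
    freq = Counter(tokens)
    s_obs = len(freq)                                  # observed unique words
    f1 = sum(1 for c in freq.values() if c == 1)       # words seen exactly once
    f2 = sum(1 for c in freq.values() if c == 2)       # words seen exactly twice
    # Many once-seen words suggest many more words that were never sampled.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


sample = "a a b b c d e".split()
naive = len(set(sample))
estimate = chao1(sample)
print(naive, estimate)
```

The estimate is at least the naive unique-word count and grows when the sample contains many words seen only once, which is the intuition behind correcting the raw count.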