Rappers, Sorted by Size of Vocabulary

600 points by sinned, about 11 years ago

68 comments

loso, about 11 years ago
I enjoyed reading this chart, but I hope it doesn't reinforce the bias some fans have that word complexity is the only way to tell whether a rapper is good. There are several ways to judge the strengths and weaknesses of a rapper. Complexity is one of them, flow is another. Storytelling ability is another very strong indicator. The best rappers are able to bring a mix, while some are so strong in one area that they explode even if they are really weak in other areas.
unfunco, about 11 years ago
This is fascinating. I'm only a recent listener of hip-hop (primarily because of Earl Sweatshirt and Odd Future) and I'm in awe of the vernacular.

And similarly, as a boredom exercise a few weeks ago I did some lexical analysis of the song Timber (the monstrosity was being constantly played on the radio at the time) and here's what I came out with:

"83.1% of the words in the lyrics are five letters or less, 58.9% are four letters or less. The lexical density (the number of unique words divided by the total number of words, multiplied by one-hundred) is 29.1%. There is only one word in the song which has three or more syllables. Eleven people were involved with the writing of the song, each of them capable of producing just nine unique words each."
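The statistics quoted in the comment above can be reproduced along these lines. This is only a minimal sketch: the function name `lexical_stats`, the tokenizing rule (runs of letters and apostrophes) and the sample sentence are assumptions made for illustration, not the commenter's actual script or data.

```python
import re


def lexical_stats(lyrics: str) -> dict:
    """Return simple word-level statistics for a block of lyrics."""
    # Assumption: a "word" is a run of letters and apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", lyrics.lower())
    total = len(words)
    if total == 0:
        raise ValueError("no words found")
    unique = len(set(words))
    return {
        "total_words": total,
        "unique_words": unique,
        # Unique words divided by total words, multiplied by one hundred.
        "lexical_density_pct": 100.0 * unique / total,
        "pct_five_letters_or_less": 100.0 * sum(len(w) <= 5 for w in words) / total,
        "pct_four_letters_or_less": 100.0 * sum(len(w) <= 4 for w in words) / total,
    }


# Made-up sample text, not real lyrics.
stats = lexical_stats("the cat sat on the mat and the cat ran")
print(stats)
```

A real run would need the full lyrics as input, and the result depends heavily on how contractions and ad-libs are tokenized.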
bretthopper, about 11 years ago
Looked for Canibus near the top and wasn't surprised to find him 4th. If anyone hasn't heard of him, I highly suggest listening to his older stuff, such as his first album Can-I-Bus, 2000 BC and Mic Club.

He raps about science and space all the time, which is cool.

Here's an example of his ridiculous lyrics: http://rapgenius.com/Canibus-poet-laureate-infinity-lyrics
seizethecheese, about 11 years ago
Many here seem to be interpreting vocabulary size as a signal for quality. When it comes to rap I completely disagree. Firstly, repetition is rap's main ingredient. I read an article a while ago where researchers found that listening to a looped spoken phrase activates the same part of the brain as music, which helps explain this phenomenon.

Personally, if I want food for thought I read. Rap is not an intellectual pursuit. I've been perusing rappers on this list, and the top artists have not been good at all to my ears. It seems that the best rappers are in the middle, and being on either extreme is a negative signal.
Aardwolf, about 11 years ago
> Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words

Why does that suggest he knew over 100k words? Maybe it means he knew 28,829 and used all of them? Would he really know over 70,000 words he never used in his works? What would those 70,000 words be? Probably very obscure ones. How can you know that many obscure ones?
nmac, about 11 years ago
It's a nice touch including portmanteaus and 'incorrect' ebonics on the list (like "ery'day"), since authors like Shakespeare, Joyce and others took the same liberties with language. Arguably, that's how language develops and what makes it interesting to study and think about. The OP could have easily stuck to words in the OED, kudos.
krick, about 11 years ago
Really interesting, but not as representative as it should be. It's not clear why some have a larger vocabulary than others. It could come from using words like "zeitgeist" (in the case of Aesop Rock), or some clever wordplay (I don't know much about hip-hop, so I can't find an example for an artist from the list right off the bat, but I remember Marilyn Manson using the word "gloominati", for instance), or pretty meaningless made-up words like "schizzle" (in the case of Snoop Dogg), or the usual derivatives like "fuckedy fuck". Moreover, in many hip-hop transcripts people write words down as they are pronounced, which can be pretty distorted for some artists (which of course ideally shouldn't count as a "new word", but that's complicated, yeah).

While Aesop Rock and DMX are clearly extreme and not surprising at all, it's not that clear for some guys in the middle.

So, first off, for every data project sources should be provided, or at least a more specific description of how the text was processed, tokenized, and analyzed. Second, several more "data slices" should be provided, for instance *the 100 most used words which are unique to that artist compared to the other artists in the list*.
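The "data slice" suggested at the end of the comment above could be computed roughly as follows. This is a sketch under assumptions: the corpus is taken to be a mapping from artist name to an already tokenized word list, and `distinctive_words` and the toy data are hypothetical names invented here.

```python
from collections import Counter


def distinctive_words(corpus: dict[str, list[str]], artist: str, top_n: int = 100):
    """Most frequent words used by `artist` and by no other artist in `corpus`."""
    # Collect every word that appears in any other artist's lyrics.
    others = set()
    for name, tokens in corpus.items():
        if name != artist:
            others.update(tokens)
    # Count only the words that the other artists never use.
    counts = Counter(t for t in corpus[artist] if t not in others)
    return counts.most_common(top_n)


# Toy data for illustration only.
toy = {
    "artist_a": ["zeitgeist", "zeitgeist", "day", "night"],
    "artist_b": ["day", "night", "street"],
}
print(distinctive_words(toy, "artist_a"))
```

Phonetic spellings would still count as distinct words here, which is the complication the comment points out.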
coherentpony, about 11 years ago
Maybe this is just me, but it's a little unfair to compare to literary *texts*.

Humour me for a moment.

When an artist writes a song, he (or she) has constraints. Most rappers would like to rhyme the ends of their sentences. I know sometimes they don't (like poetry), but it's certainly pleasing to the ear to have that constraint. Artists endeavour to make their songs catchy; that's highly correlated with the gross sales of the product.

When an artist writes a novel, this constraint is not weighted quite as highly. I know Shakespeare wrote poetry, too, and to call me out on this comparison is entirely fair. That said, there's also an argument to be made for eye rhymes. Shakespeare used these a lot. Eye rhymes are words that don't rhyme aurally, but *do* rhyme visually. It's the story that pleases the reader, not necessarily its aural 'catchiness'. I probably made that word up. But Shakespeare made words up too. The point is, you knew what I meant.

At the end of the day these comparisons, while certainly *interesting*, should be taken with a pinch of salt. While I'm at it, this advice can easily be extrapolated to any dataset. Always understand there may be unknown correlations.
thinkpad20, about 11 years ago
Is Del tha Funkee Homosapien on this list? I'd be curious, since he has pretty non-standard lyrics.
habosa, about 11 years ago
Not surprised to see Wu Tang at the top and Drake at the bottom. Started from the bottom ... still there.
orblivion, about 11 years ago
This looks at the first so many lyrics in each rapper's career. Aesop Rock came out with some weird stuff right off the bat. I wonder if some of these other rappers became more sophisticated over time. Maybe an average per song, or average uniques per word, would be better.
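The two measures contrasted in the comment above can be put side by side. This is a sketch only: songs are assumed to be pre-tokenized word lists in release order, and both function names are hypothetical.

```python
def unique_in_first_n(tokens: list[str], n: int) -> int:
    """Distinct words among the first `n` tokens of a career-ordered word list."""
    return len(set(tokens[:n]))


def mean_unique_ratio_per_song(songs: list[list[str]]) -> float:
    """Average, over non-empty songs, of distinct words divided by total words."""
    ratios = [len(set(song)) / len(song) for song in songs if song]
    if not ratios:
        raise ValueError("no non-empty songs")
    return sum(ratios) / len(ratios)


# Toy songs for illustration only.
songs = [["a", "b", "a", "c"], ["x", "x"]]
career = [word for song in songs for word in song]
print(unique_in_first_n(career, 3))
print(mean_unique_ratio_per_song(songs))
```

The per-song ratio favours short songs, since repetition accumulates with length, so neither number is a neutral measure.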
randomdrake, about 11 years ago
For those who aren't familiar with Aesop Rock, I'd invite you to give him a listen sometime. His earlier albums, in particular, have been very influential to me in many ways. Both in my artistic and professional careers.

From comments on the conditions of the working man and the condition of feeling trapped in a "j-o-b"[1]:

    "Now we the American working population
    Hate the fact that eight hours a day
    Is wasted on chasing the dream of someone that isn't us
    And we may not hate our jobs
    But we hate jobs in general
    That don't have to do with fighting our own causes
    We the American working population
    Hate the nine-to-five day-in day-out
    When we'd rather be supporting ourselves
    By being paid to perfect the pastimes
    That we have harbored based solely on the fact
    That it makes us smile if it sounds dope"

To storytelling masterpieces regarding living and dreaming[2]:

    "Look, I've never had a dream in my life
    Because a dream is what you wanna do, but still haven't pursued
    I knew what I wanted and did it till it was done
    So I've been the dream that I wanted to be since day one!"

Aesop Rock takes language and linguistics to entirely different levels than one might expect from the single genre that is hip-hop. He even challenges himself and the listeners, playing fantastic word games, for instance re-using the letters L, S, and D in odd and rhythmical ways after a mention[3]:

    "Lazy summer days
    Like some decrepit landshark dumb luck squad dog lurks sicker deluded
    Last sturdy domino lean's secluded
    Don't let stupid delusions lesson super-duty labor students
    Dragnet lifer solutions
    Daddy loved sloppy dimensions like son-daughter links
    Such determinated lepers, successfully disheveled
    Little soldiers developed like serpents despite life sentence ducking
    Lemmings
    Some don't like sobriety's dirty lenses
    Some do"

And then there are just incredible gems that stick with you like[4]:

    "I don't flick needles like my sick friend
    I don't march like Beetle Bailey through a quick trend
    I don't frequent church's steeples on my weekend
    And I don't comment if you formulate a weak Zen"

There's a lot to explore from Aesop Rock. Should you find this type of hip-hop interesting, a decent place to start is with the label you can find these songs on, Definitive Jux[5]. Incredible talent has been on and off that label over the years. So much good stuff.

[1] - "9-5ers Anthem" - http://rapgenius.com/Aesop-rock-9-5ers-anthem-lyrics

[2] - "No Regrets" - http://rapgenius.com/Aesop-rock-no-regrets-lyrics

[3] - "The Greatest Pac-Man Victory in History" - http://rapgenius.com/Aesop-rock-the-greatest-pac-man-victory-in-history-lyrics

[4] - "Save Yourself" - http://rapgenius.com/Aesop-rock-save-yourself-lyrics

[5] - http://en.wikipedia.org/wiki/Definitive_Jux
Ryanmf, about 11 years ago
OP: Did your analysis of MF DOOM include his work alongside Madlib as Madvillain or his various other pseudonyms (King Geedorah, Viktor Vaughn, etc.)?

I find it a little hard to believe he's not at least in the Wu Tang/Canibus/KK cluster, if not #1 overall.
quux, about 11 years ago
I wonder where Weird Al Yankovic would come in on this ranking.
DigitalSea, about 11 years ago
Makes me very happy to see Aesop Rock in the number 1 spot. He isn't as underground as many people assume: still relatively unknown in the mainstream, but well known enough to sell records and sell out shows. I wasn't a big fan of his 2012 release Skelethon, but the way he structures his lyrics and the meaning behind them means he never writes a bad lyric.

Interestingly, Eminem, whom I would have thought would rank pretty highly for his clever method of word bending and enunciation, is only in the middle of the scale. Still a whole lot better than some of his counterparts, but still surprising. Another interesting thing to note is Eminem being grouped in the same league as the likes of Jay-Z, Rakim and Lupe Fiasco, with only a couple of hundred unique words separating them from one another.
riggins, about 11 years ago
I find it hilarious that DMX is dead last.

I've now got empirical evidence of what I always thought.

I think DMX rhymes words with themselves more than any rapper I've ever heard.
ballstothewalls, about 11 years ago
This is a great graph, but I think it would be neat if a y-axis was thrown in. My first thought was album sales or some other metric of popularity that helps you find specific rappers quickly instead of going through the huge bunch of little pics.
sareon, about 11 years ago
This reminds me of a PyCon talk from this year on analyzing rap lyrics with some basic NLP techniques:

http://pyvideo.org/video/2658/analyzing-rap-lyrics-with-python

The author was trying to see if rappers are considered more hateful towards women by their usage of "bitch per song". The results are quite interesting.
zopticity, about 11 years ago
Lil Jon should be at the bottom with 7 words: "Yeah!", "Okay!", "Shots!" and "Turn down for what?"
rthomas6, about 11 years ago
This infographic doesn't take into account other rappers possibly copying earlier really influential artists, making the earlier influential artists rank lower. More generally, it would be cool to see this chart ranked by the amount of original words present in the first 35,000 lyrics *that were not present yet at the albums' time of publication*.
ryan1234567890, about 11 years ago
To put some perspective on this:

    ryan@3G08:~/Desktop/bleh$ pdftotext David-Foster-Wallace-Infinite-Jest-v2.0.pdf
    ryan@3G08:~/Desktop/bleh$ python dfw.py
    size of vocabulary: 30725

The man passed Shakespeare by 1,896 words with that book.

code:

    import string

    import nltk
    from nltk.stem import PorterStemmer

    # Read the plain text that pdftotext extracted from the PDF.
    with open("/home/ryan/Desktop/bleh/David-Foster-Wallace-Infinite-Jest-v2.0.txt") as f:
        raw = f.read()

    # Strip punctuation and lowercase so "Word," and "word" become the same token.
    exclude = set(string.punctuation)
    raw = "".join(ch for ch in raw if ch not in exclude).lower()

    # Tokenize, then stem so inflected forms collapse into a single entry.
    tokens = nltk.word_tokenize(raw)
    stemmer = PorterStemmer()
    stemmed_tokens = {stemmer.stem(token) for token in tokens}

    print("size of vocabulary:", len(stemmed_tokens))
Tycho, about 11 years ago
I've been wanting to do some NLP on Rap Genius's corpus for ages. This is a great analysis. What I had thought of is writing a program to detect ghostwriting. Rappers probably have some sort of lyrical 'DNA' in the construction of their verses: how often they use certain words, number of words per line, number of unique words per song, ratio of adjectives to nouns, that kind of thing. You could probably unmask some ghostwriting secrets.

Looking at the analysis here, it's interesting to see some clustering in the results. IMO the second cluster is the sweet spot: Wu Tang's excessive invention of vocabulary is cool but probably detracts from the poetic effect. Meanwhile rappers like 2Pac are just kind of boring IMO, at least going by their lyrics alone.
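A few of the 'DNA' features listed in the comment above are easy to extract from raw lyric text. The sketch below is illustrative only: `verse_features` is a made-up name, words are split on whitespace, and the adjective-to-noun ratio is left out because it would need a part-of-speech tagger. Turning such features into an actual ghostwriting detector would require a classifier and labelled data, neither of which is shown.

```python
def verse_features(song_text: str) -> dict:
    """Compute a few simple stylometric features from one song's lyrics."""
    # One list of words per non-blank line; splitting on whitespace is an assumption.
    lines = [line.split() for line in song_text.splitlines() if line.strip()]
    words = [word.lower() for line in lines for word in line]
    if not words:
        raise ValueError("no words found")
    return {
        "words_per_line": len(words) / len(lines),
        "unique_words": len(set(words)),
        "unique_ratio": len(set(words)) / len(words),
    }


# Made-up text for illustration only.
features = verse_features("one two three\nOne two\n\nfour")
print(features)
```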
dmourati, about 11 years ago
I'm a big fan of the project and the way it is presented. Not sure why Wu-Tang features so prominently but I guess I'm okay with that. Kool Keith should be broken down further into his constituent parts. I also would have thought the Beastie Boys would have run higher.
andybak, about 11 years ago
I would have been rather surprised not to see Aesop Rock fairly high up the list. I was reading the Rap Genius pages for a few of his tracks the other week and the sheer density of wordplay was fairly overwhelming.

It is rap for geeks though ;)
danielsf, about 11 years ago
Author here: hit me up with any questions you've got.
Aqueous, about 11 years ago
Greatly enjoyed the analysis but while I was reading it I felt a lot like this guy:

https://www.youtube.com/watch?v=GKlDBi0cyIA
NAFV_P, about 11 years ago
All the rappers listed seem to be American.

Whack this through your Bowers and Wilkins:

https://www.youtube.com/watch?v=p_SQEUZomug
S_A_P, about 11 years ago
I think the only problem I see is that some rap groups are listed as rappers. For instance Beastie Boys, De La Soul and Wu Tang are listed. So there is some collective vocabulary being compared to single rappers. That said, this is cool and pretty telling. From what I could see it is probably loosely coupled to the intelligence of the rappers listed. I will echo the sentiments about DMX here. Looks like some shock jock rappers definitely are low on the list (Too Short).
kenjackson, about 11 years ago
This is an interesting analysis.

I love the fact that E-40 is about on par with Shakespeare. I'm sure he would take it as a compliment to be called the modern day Shakespeare.
htk, about 11 years ago
"Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words"

So much for the modern Shakespeares on the list.
koala_advert, about 11 years ago
I keep getting this error, in Firefox and Chrome:

    <Error>
      <Code>AccessDenied</Code>
      <Message>Access Denied</Message>
      <RequestId>3CB1F41D7DFDC794</RequestId>
      <HostId>wHCPzEYPDsmkMJX+YIgjU40YPrGYytHrk5B44dApi7663NkQQI0RKx9A/6EX7Iph</HostId>
    </Error>
dnautics, about 11 years ago
How about a 2D visualization with a sliding 10,000-word window, with the y axis as unique words out of 10k and the x axis as time? Are there cultural trends that are time dependent? Did Young MC and Del use more words than contemporary artists? Did their trends as artists follow the global trend over time?
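The sliding-window count proposed in the comment above might look like the following sketch. It assumes an artist's lyrics are already tokenized and ordered chronologically, so that window position stands in for time; `sliding_unique_counts` and its parameters are hypothetical names.

```python
def sliding_unique_counts(tokens: list[str], window: int, step: int = 1) -> list[int]:
    """Number of distinct tokens in each `window`-sized slice, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Each slice is counted from scratch: simple, but O(window) work per position.
    return [
        len(set(tokens[start:start + window]))
        for start in range(0, len(tokens) - window + 1, step)
    ]


# Toy token stream; a real run would use window=10000 on a full discography.
toy_tokens = list("aabbcabc")
print(sliding_unique_counts(toy_tokens, window=4, step=2))
```

Plotting the returned list against the release date of the lyric at each window start would give the y-versus-time view the comment asks for.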
selimthegrim, about 11 years ago
Maybe this will help me answer that nagging question at the back of my brain: What does DJ Khaled actually _do_?
Grue3, about 11 years ago
Would be interesting to see how they compare to rock bands like Titus Andronicus, Fucked Up or Bad Religion.
Totient, about 11 years ago
I wonder where things like classic rock / Broadway musicals / opera / etc. fit on this spectrum.

I really appreciate including Shakespeare and Moby Dick on the spectrum, but I'd still like some more perspective. For that matter, I wonder how many unique words *I* use every day.
tokipin, about 11 years ago
Just a note, those artists don't necessarily use all their vocabulary. Eminem for example clearly holds back on his vocabulary. Rap is as much an art as anything can be, so there are all sorts of factors. Be careful about what you draw from this beyond curiosity.
sbierwagen, about 11 years ago
Cool to see Canibus so high in the rankings.

It'd also be cool to add the members of AOTP to the analysis.
msutherl, about 11 years ago
I would love to see this analysis without filters. Who is *the* rapper with the largest vocabulary? What does the distribution look like at the top? Surely Antipop Consortium or MF DOOM have larger vocabularies than Aesop for instance.
gfody, about 11 years ago
I'm pretty sure E-40 scored so high because of all the made-up words. He's highly regarded for being innovative and influential, but you know for every piece of slang that stuck there's like ten that didn't.
ff10, about 11 years ago
Really surprised MF Doom is not ranked higher – are his side projects included?
oakaz, about 11 years ago
Why is Jedi Mind Tricks not counted? He'd be first on this list: https://www.youtube.com/watch?v=TlZgiK6FiO0
Mikeb85, about 11 years ago
Not particularly surprised at the list. Aesop Rock, the whole Wu-Tang Clan, and guys like Nas, Wale, all near the top. DMX and Too Short at the bottom...

Definitely comes out in their music...
jarnix, about 11 years ago
How many words in "fo shizzle ma nizzle"? 4 or 0?
camus2, about 11 years ago
I would love the same chart but sorted by vulgarity.
ignacioelola, about 11 years ago
I would love to see the same analysis across different music styles. How do the vocabulary sizes of Madonna, Bob Dylan and Justin Bieber compare?
bladecatcher, about 11 years ago
I would like to see Dälek included in the study. I'd be surprised if they didn't show up on the far right of the scale.
konceptz, about 11 years ago
What I would like to see is this same comparison done against album sales, with the implication of mainstream vs. underground.
b3b0p, about 11 years ago
Was it mentioned where the data was sourced from? I'm not seeing anything and I went back and checked. Did I miss it?
jomtung, about 11 years ago
Killah Priest should be grouped with Wu-Tang.
zeppelinnn, about 11 years ago
This is awesome. Reminds me of all the data viz they are doing on rapgenius. You forgot Atmosphere though (Slug)
m_mueller, about 11 years ago
I'd be interested in how Nerdcore rappers compare to this, such as MC Frontalot or Professor Elemental.
devindotcom, about 11 years ago
Couldn't find Aceyalone. I thought he'd be in the top 10; I guess he wasn't included.
thegasman, about 11 years ago
No mention of MF Doom? Metalface? Doom? Victor Vaughan? (All the same gentleman from LA)
tps12, about 11 years ago
So funny comparing this to the same graph they did for pop lyricists.
snarfy, about 11 years ago
I wasn't surprised to see Canibus and Outkast up there.
dnlserrano, about 11 years ago
Awesome. This guy should definitely work for RapGenius.com.
shaggyfrog, about 11 years ago
Incredibly, a list about rapper vocabulary is missing anyone associated with nerdcore.

I'm interested to see where the likes of MC Frontalot, Wordburglar, YTCracker, etc. rank on that scale...
jmt7les, about 11 years ago
I'd love to see Immortal Technique also.
prg318, about 11 years ago
Thank you based god!!
1ris, about 11 years ago
Shouldn't that be adjusted to the size of the text corpus?
moron4hire, about 11 years ago
This might be the best-made infographic I've ever seen.
pinkskip, about 11 years ago
Woah so awesome!
allan_, about 11 years ago
where is KRS-ONE?
benihana, about 11 years ago
I'd really like to see this broken down by established vocabulary and made-up vocabulary. I think that would really start to show who were the best lyricists on both ends. Rappers with a lot of made-up words might be on the far left, and rappers with a lot of unique words that aren't made up would be on the far right. Both sides of the scale would show rapping talent on different dimensions: influential rappers like E-40, who add new words to the vocabulary, on the left, and wordy rappers like Aes, who use a really dense and descriptive vocabulary, on the right.
skylan_q, about 11 years ago
Kool Keith should be exempt from this list. He's not from any of the 4 regions listed, but from Jupiter.
thrownaway2424, about 11 years ago
Gotta wonder about the garbage-in factor of Rap Genius. From one randomly selected Aesop Rock cut:

"Please I want to donate my brain to the monstrous Panasonic profit"

I guess it could be. I always heard it as "monstrous Panasonic prophet." It would be in keeping with the previous lyric "Television, all hail grand pixelated god of fantasy."
sarreph, about 11 years ago
We might all be self-confessed *hackers*, but we'll never explicitly confess our adoration for the gloriousness of the genre that is *gangster rap*.
simonster, about 11 years ago
The estimate of vocabulary size here is based on the number of unique words used. This seems like it is strongly biased: if two artists have the same size vocabulary, but one has released more albums and thus used more words, that artist will probably have used more unique words. To underscore this point, the number of unique words used by Aesop Rock is half of the estimated vocabulary size of the average college student, although to be fair that estimate is the number of words that an individual can recognize, not the number of words they use. (Edit: the bias is somewhat mitigated by the fact that the same number of words is used to estimate the vocabulary for each artist, but the bias is not dependent on sample size alone but also upon the size of the artist's underlying vocabulary; see my comments below.)

The underlying problem is one of estimating the cardinality of a multinomial distribution given a fixed number of samples. In isolation this problem is ill-posed, since it is always possible that there is a word in a given lyricist's vocabulary that he uses with very low frequency and that is unlikely to appear in any sample, but with appropriate prior information it may be possible to obtain an accurate estimate.

This is not my field, but a brief Google Scholar search shows that there are several papers on estimating vocabulary size, or equivalently, estimating the number of species based on sampling. There is a somewhat dated review (http://cvcl.mit.edu/SUNSeminar/BungeFitzpatrick_1993.pdf) that details some methods of estimation (in this case, I believe we are in the domain of "infinite population, multinomial sample" with unequal class sizes). The paper notes that there is no unbiased estimator available without assumptions on the distribution of word use frequencies, but some of the proposed estimators may be more accurate than the naive estimate used here.
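One well-known estimator from the species-richness literature is Chao1, which extrapolates from how many words were seen exactly once and exactly twice. The sketch below uses its bias-corrected form purely to illustrate the idea raised in the comment above; it is not claimed to be the estimator the commenter or the linked review would recommend, and the function name and toy tokens are made up.

```python
from collections import Counter


def chao1(tokens: list[str]) -> float:
    """Bias-corrected Chao1 estimate of the total number of distinct word types."""
    freq = Counter(tokens)
    s_obs = len(freq)  # types actually observed (the naive estimate)
    f1 = sum(1 for count in freq.values() if count == 1)  # seen exactly once
    f2 = sum(1 for count in freq.values() if count == 2)  # seen exactly twice
    # Many singletons relative to doubletons suggests many unseen types.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy sample: 5 observed types, 3 singletons, 2 doubletons.
toy_sample = ["a", "a", "b", "b", "c", "d", "e"]
print(chao1(toy_sample))
```

Chao1 is generally treated as a lower bound on richness, so it narrows the gap to the true vocabulary size without closing it.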