Really interesting, but not as representative as it should be. It's not clear why some artists have a larger vocabulary than others. It could come from words like "zeitgeist" (in the case of Aesop Rock), from clever wordplay (I don't know much about hip-hop, so I can't give an example for an artist from the list right off the bat, but I remember Marilyn Manson using the word "gloominati", for instance), from pretty meaningless made-up words like "schizzle" (in the case of Snoop Dogg), or from ordinary derivatives like "fuckedy fuck". Moreover, many hip-hop transcripts write words down as they are pronounced, which can be heavily distorted for some artists (ideally those shouldn't count as "new words", but that's complicated, yeah).<p>While Aesop Rock and DMX are clearly the extremes and not surprising at all, it's much less clear for the guys in the middle.<p>So, first off, every data project should provide its sources, or at least a more specific description of how the text was processed, tokenized, and analyzed. Second, several more "data slices" should be provided, for instance <i>the 100 most used words that are unique to each artist compared to the other artists in the list</i>.
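<p>For what it's worth, that last slice is cheap to compute. Here is a rough sketch, not the original project's method: the names `tokenize` and `unique_top_words` are made up for illustration, the input is assumed to be a plain dict mapping artist name to their concatenated lyrics, and the tokenizer is deliberately naive (lowercase, runs of letters and apostrophes), so phonetic spellings still count as separate words.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive assumption: a "word" is a lowercase run of letters/apostrophes.
    # Phonetic spellings ("gonna", "finna") are NOT normalized here.
    return re.findall(r"[a-z']+", text.lower())


def unique_top_words(lyrics_by_artist, n=100):
    """For each artist, return the n most frequent words that no other
    artist in the input uses at all, as (word, count) pairs."""
    counts = {
        artist: Counter(tokenize(text))
        for artist, text in lyrics_by_artist.items()
    }
    result = {}
    for artist, counter in counts.items():
        # Vocabulary of everyone except the current artist.
        others = set()
        for other, other_counter in counts.items():
            if other != artist:
                others.update(other_counter)
        exclusive = Counter(
            {word: c for word, c in counter.items() if word not in others}
        )
        result[artist] = exclusive.most_common(n)
    return result


if __name__ == "__main__":
    # Toy input, invented purely for the demo.
    demo = {
        "artist_a": "zeitgeist zeitgeist money street",
        "artist_b": "money street shizzle",
    }
    print(unique_top_words(demo, n=100))
```

"Unique" here is strict: a word used even once by anyone else is dropped. A softer variant could rank words by how much more often one artist uses them than the rest, but that is a different slice than the one proposed above.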