<i>Way back in 2004, I ran a little experiment with Google -- over a period of a week, I searched for an entire dictionary of ~110k individual English words and recorded how many hits Google returned for each.</i><p>Of course, a word can appear on a page multiple times. That's why, I think, folks used to ignore stopwords: they added noise when you were trying to get at the content words. Now, with span constraints, you can bring them back into the analysis. So "a matrix" and "the matrix" return very different results, even without quotes.
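<p>To make the difference concrete, here is a rough sketch. It is a toy and not how any real search engine works; the stopword list, the function names and the two sample documents are all made up for illustration. The first function drops stopwords the old way, and the second treats the whole query, stopwords included, as an ordered span of adjacent words.

```python
# Toy illustration only: STOPWORDS, the function names and the sample
# documents are assumptions, not taken from any real search engine.
STOPWORDS = {"a", "an", "the", "of", "in"}


def content_terms(query):
    """Old-style processing: drop stopwords, keep only content words."""
    return [w for w in query.lower().split() if w not in STOPWORDS]


def span_match(query, text):
    """Span-style matching: all query words, stopwords included,
    must appear adjacent and in order somewhere in the text."""
    q = query.lower().split()
    t = text.lower().split()
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


docs = [
    "the matrix is a film from 1999",
    "multiply a matrix by a vector",
]

# With stopwords removed, both queries collapse to the same terms.
same_terms = content_terms("a matrix") == content_terms("the matrix")

# With span matching, each query picks out a different document.
hits_a = [d for d in docs if span_match("a matrix", d)]
hits_the = [d for d in docs if span_match("the matrix", d)]
```

<p>In this toy setup the stopword-free queries are indistinguishable, while the span version sends "a matrix" to the linear algebra sentence and "the matrix" to the film one.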