<p><pre><code> I've pored over (ok, grepped) ~500GB of Chronicling America data to find lines that meet my low standard for nonsense, basically ones that match egrep "[^a-zA-Z0-9 ]{3,}"
</code></pre>
I'm super curious to know how fast this was. grep is generally very fast, and this should be doable on a normal computer, though it might take a little while.
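
For anyone who wants to try the filter on a small sample before pointing grep at the whole corpus, here is a rough Python sketch of the same pattern with a timer around it. The names `nonsense_lines` and `timed_count` and the sample lines are made up for illustration, not taken from the original run. Python's `re` will not behave exactly like egrep (bracket ranges in grep can depend on locale, for one), so treat it as an approximation.

```python
import re
import time

# Same character class as the quoted egrep pattern: three or more
# consecutive characters that are not ASCII letters, digits, or spaces.
NONSENSE = re.compile(r"[^a-zA-Z0-9 ]{3,}")


def nonsense_lines(lines):
    """Yield the lines that contain a run matching NONSENSE."""
    for line in lines:
        # grep never sees the trailing newline; strip it so a line such as
        # "abc!!\n" is not counted just because "\n" extends the run to 3.
        if NONSENSE.search(line.rstrip("\n")):
            yield line


def timed_count(lines):
    """Return (number of matching lines, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in nonsense_lines(lines))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sample lines, invented for this sketch.
    sample = [
        "The quick brown fox",
        "t#e q^^&* br@wn",
        "price: $5.00 ...",
        "ok -- fine",
    ]
    count, seconds = timed_count(sample)
    print(f"{count} matching lines in {seconds:.6f}s")
```

Timing this on a file of known size gives a per-gigabyte figure you can scale up, though grep itself will most likely be a good deal faster than a Python loop.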