Obviously this is just for fun, but I am a bit disturbed by similar projects that try to draw meaningful insights from vast public data sets without paying the slightest attention to the quality of the data used. It doesn’t matter how much processing you do or how clever your algorithms are if the underlying data is inaccurate, out of date, inconsistent, non-normalized, incomplete, etc. No dataset is perfect, but at least take some time to address its flaws.