I recently built a fairly elaborate system that systematically flags suspected fraudulent loans in a giant 8.4 GB CSV dump of PPP loan data, using a range of Python data science techniques. The entire thing is open source, and you can easily replicate the findings, which are depressing.<p>If you just want to see the complete final output of the analysis step, which covers the most suspicious-looking loans (after scoring them with a model that weighs many indicators of fraud), it is here:<p><a href="https://raw.githubusercontent.com/Dicklesworthstone/ppp_loan_fraud_analysis/refs/heads/main/final_output_of_analysis_step_in_ppp_loan_fraud_analysis.txt" rel="nofollow">https://raw.githubusercontent.com/Dicklesworthstone/ppp_loan...</a><p>I did all of this work over the last couple of days, mostly using Grok 3, which was a great way to get familiar with this new and very powerful model. I was impressed with how well it worked, both for coming up with ideas for the system and for implementing them.<p>I also wrote a blog post about it with more details (although the README in the repo is probably more informative, albeit technical):<p><a href="https://fixmydocuments.com/blog/02_ppp_loan_fraud_analysis" rel="nofollow">https://fixmydocuments.com/blog/02_ppp_loan_fraud_analysis</a>
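<p>To give a rough flavor of what "scoring loans on indicators of fraud" can look like, here is a minimal sketch. It is not the code from the repo: the column names (BorrowerName, BorrowerAddress, CurrentApprovalAmount, JobsReported), the two red flags, the point weights, and the thresholds are all assumptions picked for illustration. It streams the CSV row by row instead of loading it all at once, which matters for a multi-gigabyte file.

```python
import csv
import heapq
import io
from collections import Counter

# Hypothetical column names; adjust to the headers of the actual CSV.
NAME_COL = "BorrowerName"
ADDRESS_COL = "BorrowerAddress"
AMOUNT_COL = "CurrentApprovalAmount"
JOBS_COL = "JobsReported"

# Illustrative thresholds and weights, not the ones used in the repo.
MAX_AMOUNT_PER_JOB = 20_000
SHARED_ADDRESS_MIN = 3


def normalize(address):
    """Lowercase and trim so trivially different spellings match."""
    return address.strip().lower()


def count_addresses(rows):
    """First pass: count how many loans share each normalized address."""
    return Counter(normalize(row[ADDRESS_COL]) for row in rows)


def score_loan(row, address_counts):
    """Second pass: add up red-flag points for a single loan."""
    amount = float(row[AMOUNT_COL] or 0)
    jobs = float(row[JOBS_COL] or 0)
    score = 0
    if jobs > 0 and amount / jobs > MAX_AMOUNT_PER_JOB:
        score += 2  # unusually large loan per reported job
    if address_counts[normalize(row[ADDRESS_COL])] >= SHARED_ADDRESS_MIN:
        score += 3  # many loans registered at the same address
    return score


def top_suspicious(open_csv, limit=10):
    """Read the file twice, keeping only address counts and the top
    `limit` (score, name) pairs in memory. `open_csv` is any callable
    that returns a fresh file-like object."""
    with open_csv() as f:
        address_counts = count_addresses(csv.DictReader(f))
    with open_csv() as f:
        scored = (
            (score_loan(row, address_counts), row[NAME_COL])
            for row in csv.DictReader(f)
        )
        return heapq.nlargest(limit, scored)


# Tiny made-up sample standing in for the real multi-gigabyte file.
SAMPLE = """BorrowerName,BorrowerAddress,CurrentApprovalAmount,JobsReported
Acme LLC,1 Main St,20000,10
Shell One,9 Oak Ave,20833,1
Shell Two,9 oak ave,20833,1
Shell Three,9 Oak Ave ,20833,1
"""

if __name__ == "__main__":
    for score, name in top_suspicious(lambda: io.StringIO(SAMPLE), limit=3):
        print(score, name)
```

<p>For a real file you would pass something like lambda: open(path, newline="") as open_csv. A serious version needs many more signals than two hand-picked rules, but the two-pass shape (aggregate first, then score each row against the aggregates) scales to large files.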