This is really cool. I've spent some time thinking about a similar idea in the past [0].<p>My idea was to parse the CVE database for bugs in open source code, then identify the patches used to fix the bugs. From the patch data, you can get an efficient diff of what the "vulnerable" code looks like and what the "fix" for it looks like. You can then convert the code to an abstract syntax tree or feed it to a static analysis engine to use as "signals" in training a machine learning algorithm. Then you can apply the machine learning algorithm to open source databases and identify possibly vulnerable code paths.<p>Looks like this paper had success doing something similar. Awesome!<p>[0] <a href="https://news.ycombinator.com/item?id=11573547" rel="nofollow">https://news.ycombinator.com/item?id=11573547</a>
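<p>A rough sketch of the patch step described above, assuming the fixes are available as git-style unified diffs. It only splits each hunk into its "vulnerable" (before) and "fixed" (after) versions; the AST conversion, static analysis, and training steps are left out. The names `Hunk`, `split_patch`, and `EXAMPLE_PATCH` are hypothetical, and the sample patch is made up for illustration, not taken from a real CVE.

```python
from dataclasses import dataclass, field


@dataclass
class Hunk:
    """One diff hunk, reconstructed as before/after code snippets."""
    vulnerable: list[str] = field(default_factory=list)  # code before the fix
    fixed: list[str] = field(default_factory=list)       # code after the fix


def split_patch(patch_text: str) -> list[Hunk]:
    """Split a unified diff into per-hunk (vulnerable, fixed) line lists.

    Assumption: git-style output, where each file section starts with a
    "diff " line, so file headers are never mistaken for changed lines.
    """
    hunks: list[Hunk] = []
    current = None
    for line in patch_text.splitlines():
        if line.startswith("diff "):
            current = None              # new file section: skip its headers
        elif line.startswith("@@"):
            current = Hunk()            # hunk header starts a new snippet pair
            hunks.append(current)
        elif current is None or line.startswith("\\"):
            continue                    # file headers, "\ No newline at end of file"
        elif line.startswith("-"):
            current.vulnerable.append(line[1:])   # removed by the fix
        elif line.startswith("+"):
            current.fixed.append(line[1:])        # added by the fix
        else:
            # Context line: present in both versions.
            text = line[1:] if line.startswith(" ") else line
            current.vulnerable.append(text)
            current.fixed.append(text)
    return hunks


# Made-up patch, for illustration only.
EXAMPLE_PATCH = """\
--- a/copy.c
+++ b/copy.c
@@ -1,4 +1,4 @@
 void copy(char *dst, const char *src) {
-    strcpy(dst, src);
+    strncpy(dst, src, DST_SIZE - 1);
     dst[DST_SIZE - 1] = 0;
 }
"""

if __name__ == "__main__":
    for hunk in split_patch(EXAMPLE_PATCH):
        print("vulnerable:", *hunk.vulnerable, sep="\n")
        print("fixed:", *hunk.fixed, sep="\n")
```

Each resulting pair could then be handed to a parser or static analyzer to produce the training "signals"; which features to extract is left open here.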