It's good to see discussions of static analysis, but I often feel that these blog posts do a disservice to the techniques. The post leads by mentioning applications like bug finding and security vulnerability detection, but the examples here are barely above local syntactic checks. This is the common scenario in the majority of blog posts I see about static analysis, probably because it is just much easier to put together a quick write-up on AST linting. Heck, this article has a diagram that directly states that an AST is the input to a static analysis module, but that is true only for some kinds of analyses!<p>AST-level analysis is certainly useful. Everybody should be using some sort of style checker. But AST pattern matching is such a <i>completely</i> different technique from the stuff used to do bug finding that I worry these blog posts will give the wrong impression about what static analysis can and can't do.<p>I'd love to see blog posts about interprocedural pointer analysis, for example.
The kinds of analyses mentioned here are typically grouped under "linting"; more advanced static analysis tools will typically do things like dataflow analysis.
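To make the linting vs. dataflow distinction concrete, here is a rough sketch of a forward taint propagation over straight-line code, written against Python's standard `ast` module. The function name `tainted_sink_calls` and the choice of `input` as the taint source and `eval` as the sink are assumptions made for illustration. It only handles top-level assignments to plain names, with no branches, loops, functions, or aliasing; real tools run this kind of analysis over a control-flow graph until a fixpoint is reached.

```python
import ast


def tainted_sink_calls(source: str,
                       sources=frozenset({"input"}),
                       sinks=frozenset({"eval"})) -> list[int]:
    """Return line numbers where a sink is called with a tainted argument.

    Simplified sketch: statements are processed in order, and only
    top-level `name = expr` assignments update the taint state.
    """
    tainted: set[str] = set()   # variables currently holding tainted data
    hits: list[int] = []

    def is_tainted(expr: ast.AST) -> bool:
        # An expression is tainted if it reads a tainted variable
        # or directly calls one of the taint sources.
        for node in ast.walk(expr):
            if isinstance(node, ast.Name) and node.id in tainted:
                return True
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in sources):
                return True
        return False

    for stmt in ast.parse(source).body:
        # Check sink calls using the taint state *before* this statement.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in sinks
                    and any(is_tainted(arg) for arg in node.args)):
                hits.append(node.lineno)
        # Transfer function for assignments: taint or clear each target.
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return hits
```

A purely syntactic rule such as "flag every `eval` call with a variable argument" cannot tell `eval(c)` on a constant from `eval(b)` on user input. Tracking state across statements is what separates the two cases, including the case where `b` is later overwritten with a safe value.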
Going to drop a top-level comment and say that while this is interesting (sincerely!), if people are interested in deeper tools and techniques, the book Practical Binary Analysis is excellent. It ends with taint checking and symbolic execution techniques, and uses Pin. <a href="https://practicalbinaryanalysis.com/" rel="nofollow">https://practicalbinaryanalysis.com/</a><p>Also worth checking out is BAP, the Binary Analysis Platform, which is the successor project to BitBlaze and, for my money, one of the most fascinating binary analysis frameworks out there. It was the only one of the DARPA CGC entries that ran on real binaries, not the much less complicated ones developed specifically for the challenge.<p><a href="https://github.com/BinaryAnalysisPlatform/bap" rel="nofollow">https://github.com/BinaryAnalysisPlatform/bap</a>
Slightly tangential to what the article is about, but at least in the C/C++ world, the most important change to make static analysis popular for "the rest of us" was probably Xcode's decision to integrate clang analyzer right into the Xcode UI under a menu item (Xcode doesn't do many things right, but this is definitely one of the very good features).<p>This way, analyzing the code is a simple "button press" and works out of the box on every Xcode project.<p>Soon after, Microsoft followed suit in Visual Studio (even though in my experience, the MS analyzer doesn't catch quite as many things as the clang analyzer).<p>Before that, static analyzers were those no doubt useful but obscure "magic tools" which were very hard to integrate into an existing build process.<p>Even the most useful tool will be ignored when it is hard to use.
Thanks for this article, dolftax! I followed all the examples on my machine with no problem, and I learned some new stuff.<p>I have a question: how difficult is it to implement the AST? It seems like that is the bulk of the work for this kind of static code analysis.
For "Detecting unused imports", why not record the line numbers on the first pass as well? Then we don't need to traverse the tree again.
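A minimal sketch of that single-pass idea, assuming Python's standard `ast` module. The function name `find_unused_imports` is made up for illustration and this is not the article's code. Each imported name is stored with its line number when it is first seen, so reporting needs no second traversal.

```python
import ast


def find_unused_imports(source: str) -> list[tuple[str, int]]:
    """Return (name, line) pairs for imports never referenced, in one walk."""
    imported: dict[str, int] = {}   # bound name -> line of the import statement
    used: set[str] = set()          # every identifier read or written anywhere

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":   # star imports bind unknown names
                    imported[alias.asname or alias.name] = node.lineno
        elif isinstance(node, ast.Name):
            # `os.getcwd()` contains a Name node with id "os".
            used.add(node.id)

    unused = [(name, line) for name, line in imported.items()
              if name not in used]
    return sorted(unused, key=lambda pair: pair[1])
```

This is still just a name-based lint: it ignores scoping, `__all__`, and names referenced only inside strings, so it can miss cases or report false positives on real code.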