Makes me wonder how many other lurking type confusion bugs there are. KASAN only found this one because it happened to produce an out-of-bounds access (I guess pci_dev is larger than whatever struct the pointer really pointed to); it can’t detect the type confusion itself.
I identified with the author’s reluctance to launch a debugger versus rebuilding with a quick n dirty print statement. But (as the author also seems to accept) this is short term gain for long term pain. The extra time to launch a debugger and mentally reset is easily exceeded by the time that would be spent not doing this and endlessly hacking away adding in more logging.
Shouldn't bus_register_notifier() take a notifier_fn_t (and int priority) directly and create its own struct notifier_block? Why does it expose its linked-list node through its interface?<p>I guess maybe that would make un-registration less ergonomic (assuming you have to handle duplicate entries in the list). But the peril of the bug described here seems worse.
I think this is a great example where modern languages are better.<p>- It shows that declaring statics is unsafe.<p>- It shows that taking a reference to a static is inherently unsafe.<p>- It shows that it actually helps if we distinguish passing something as mutable or non-mutable, and keep said mutability exclusive.<p>I find this code VERY opaque to read in terms of what it is that we are actually passing.<p>I'm also confused why the creation of the structs wasn't and still isn't wrapped in an `ifdef`. Now if I only have a VIO device there is still a lingering `fail_iommu_pci_bus_notifier` in the code at top level. I'm not familiar enough with how this is compiled, but it looks like it's part of the public interface, and as such would not be able to be compiled away?
Isn't the problem that both CONFIG_IBMVIO and CONFIG_PCI are defined?<p>I had to see what IBMVIO was, it looks like the IBM Virtual I/O Server. Although you actually fixed the bug, the other bug might be that IBMVIO is defined when it's "The VIOS is part of the PowerVM® Editions hardware feature"<p>Were you on an IBM power box?<p><a href="https://www.ibm.com/docs/en/power8?topic=server-virtual-io-overview" rel="nofollow">https://www.ibm.com/docs/en/power8?topic=server-virtual-io-o...</a><p>I'm just curious, because the IFDEFs assume one-or-the-other as written. Is having both busses a valid configuration?
What the author doesn't explain, but I find myself wondering, is how did this go unnoticed for so long? What conditions expose it? Obviously it's easy enough to trigger in QEMU, but is real hardware in that condition so rare?<p>I feel like I missed a bit of context for what launched this whole adventure. The adventure itself was interesting, though!