Cool use of SR-IOV, I like it.
We've done a few (academic) experiments with SR-IOV for flow bifurcation, and we've wondered why no one seems to use it like this. The performance was quite good: a negligible difference between the PF and a single VF, and only a 5-10% difference when running 8 or more VFs (probably cache contention somewhere in our specific setup).<p>You seem to be running this on X540 NICs; aren't you running into limitations for the VFs? Mostly the number of queues, which I believe is limited to 2 per VF in the ixgbe family.
I wonder whether the AF_XDP DPDK driver could be used instead if SR-IOV isn't available or feasible for some reason.<p>A more detailed look at performance would have been cool. I might try it myself if I find some time (or a student) :)