I find the premise that different groups should expect the same percentage of interventions highly suspect. Imagine a program that distributes seeing-eye dogs. This toolkit would discover that sighted persons have a 0% chance of getting a dog while blind persons have a 50% chance. Oh, the injustice!
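For concreteness, a demographic-parity check boils down to comparing selection rates between groups, so the dog program fails it by construction. A minimal sketch of that arithmetic (the data and function names are hypothetical, not from any particular toolkit):

    # Toy demographic-parity check: 1 = received a guide dog, 0 = did not.
    def selection_rate(outcomes):
        """Fraction of a group that received the intervention."""
        return sum(outcomes) / len(outcomes)

    sighted = [0, 0, 0, 0, 0, 0, 0, 0]   # 0% selection rate
    blind   = [1, 0, 1, 0, 1, 0, 1, 0]   # 50% selection rate

    # Statistical parity difference: a parity-style metric flags
    # any value far from 0 as "unfair", regardless of why the
    # groups differ.
    disparity = selection_rate(blind) - selection_rate(sighted)
    print(f"statistical parity difference: {disparity:.2f}")  # 0.50

The point is that the metric has no notion of a legitimate, data-derived reason for the gap; it only sees the gap.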
FairML <a href="https://github.com/adebayoj/fairml" rel="nofollow">https://github.com/adebayoj/fairml</a> and BlackBoxAuditing <a href="https://github.com/algofairness/BlackBoxAuditing" rel="nofollow">https://github.com/algofairness/BlackBoxAuditing</a> are earlier projects in the same space.
If the goal is equity of outcome above all else, ignoring any differences derived from the data, then why are we investing so much time, money, and effort in this area?