Used TLA+ for security/availability modeling at a previous venture. When it came to implementation, I didn't find it added more value than other formal methods with narrower scope and ambition, but I had very limited resources and we were greenfield, so the experience probably isn't generally applicable.
<p>IMHO (from an implementation perspective):
<p>(1) If state is a pain to model, that may be a symptom that you aren't placing interfaces in the right places (i.e. the scope is too broad).
<p>(2) If your system's behavior departs from known models (i.e. expected behavior under rigorously analyzed, easy-to-model consensus algorithms), that may be a symptom of larger issues such as unrecognized state or edge cases, which are better tackled first.
<p>(3) A practical alternative to the post's theoretical model of 'safe' machines would be an implementation model with a functional approach, i.e. if in doubt, reset to a known state (see the first sketch below).
<p>(4) To model "bad" actors practically, you have to define them. To do so within any meaningfully complex (black-box) multi-agent (networked) system, you need a good model of known "good" behavior for any given node. Building that is IMHO more useful than pie-in-the-sky theoretical modeling, because it enables the efficient (~fully automatic) deployment of practical countermeasures (anomaly detection on resource utilization, communication, etc.; see the second sketch below). Doing this for near-arbitrary systems requires a tight CI/CD pipeline with above-average full-stack security insight, for-purpose infrastructure, and management-level commitment to the additional resources it will cost. <i>Only</i> once all of that is done should you model concurrently compromised nodes at this level, because by then you've surpassed the more immediate/pragmatic requirements.
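<p>As a rough illustration of (3), here's a minimal sketch of "if in doubt, reset to known state", assuming a node whose state can be checked against a few trusted invariants. All names here (Node, KNOWN_GOOD, apply_event, the invariants themselves) are illustrative, not anything from the post:
<p>    # Sketch only: a node that falls back to a known-good state whenever an
    # event would push it into a state it can't validate.
    from dataclasses import dataclass, field

    KNOWN_GOOD = {"replicas": 3, "leader": None, "log_index": 0}  # assumed baseline

    @dataclass
    class Node:
        state: dict = field(default_factory=lambda: dict(KNOWN_GOOD))

        def valid(self, state: dict) -> bool:
            # Whatever invariants you actually trust; kept trivial here.
            return state.get("replicas", 0) >= 1 and state.get("log_index", -1) >= 0

        def apply_event(self, event: dict) -> None:
            candidate = {**self.state, **event}
            if self.valid(candidate):
                self.state = candidate
            else:
                # "If in doubt, reset to known state": drop the doubtful
                # transition entirely rather than limping along with it.
                self.state = dict(KNOWN_GOOD)

    node = Node()
    node.apply_event({"log_index": 5})   # accepted
    node.apply_event({"replicas": -2})   # rejected -> reset to KNOWN_GOOD
    print(node.state)                    # back to the known-good baseline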
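<p>And for (4), a toy sketch of flagging departures from a node's own "known good" baseline, assuming per-node resource metrics are already being collected. The rolling z-score and its threshold are arbitrary stand-ins for whatever behavior model you actually build, and the countermeasure itself is left as a stub:
<p>    # Sketch only: flag a node whose resource/communication metrics drift far
    # from its own established baseline; threshold and window are illustrative.
    from collections import deque
    from statistics import mean, pstdev

    class NodeBaseline:
        def __init__(self, window: int = 100, z_threshold: float = 4.0):
            self.samples = deque(maxlen=window)
            self.z_threshold = z_threshold

        def observe(self, value: float) -> bool:
            """Return True if this sample looks anomalous against the baseline."""
            anomalous = False
            if len(self.samples) >= 10:  # need some history before judging
                mu, sigma = mean(self.samples), pstdev(self.samples)
                if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                    anomalous = True  # candidate for an automatic countermeasure
            self.samples.append(value)
            return anomalous

    cpu = NodeBaseline()
    for v in [0.21, 0.19, 0.22, 0.20, 0.18, 0.21, 0.20, 0.19, 0.22, 0.20, 0.95]:
        if cpu.observe(v):
            print("anomaly:", v)  # e.g. quarantine the node or page someone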