Locking down this network of services is a massive security improvement and they've used some very neat ways of achieving it. Overall, I really appreciate them writing this up.<p>However, 1500 services? That really feels like they're separating things at too granular a level. Does every one of those things really need to sit behind a network call? Couldn't some of that re-use be via code libraries? I wonder what the service-to-developer ratio is.
Great post. I really appreciate engineering blogs written in this storytime format. I don't have time to dive into the implementation of Calico or <insert one of the 1,261 Kubernetes projects here>*, but I learn a lot from reading the process a team goes through in figuring out and iterating on a solution.<p>* <a href="https://landscape.cncf.io/" rel="nofollow">https://landscape.cncf.io/</a>
Another potential solution is to use a constraint solver like Microsoft's Z3 or, if you want a nicer syntax and more flexibility, Prolog.<p>E.g. <a href="https://medium.com/@ahelwer/checking-firewall-equivalence-with-z3-c2efe5051c8f" rel="nofollow">https://medium.com/@ahelwer/checking-firewall-equivalence-wi...</a><p>This is much more scalable in the long run.
If the authors are reading this, I was wondering two things:<p>1. Why was static analysis of the code chosen over observing the system during runtime and integration testing?<p>2. Why was the CNI layer chosen for the implementation of this over the service mesh layer?<p>Something that really interests me about Bazel/Buck/Pants/Please is that they automate #1 entirely with dep queries.
Network filtering is a nice extra layer, but it should not be the only layer. Services should still require authorization, as if each one were an open API.
> attempt to find code that looked like it was making a request to another service.<p>> We generally fixed those cases by adding a special comment in the code that told rpcmap about the link<p>Why not require that all endpoints/URLs be defined in a config file and sidestep this? Scanning code for URLs or constructed URLs is overkill and brittle.
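A rough sketch of what the config-driven idea could look like, assuming each service declares the upstreams it calls. The service names and the dict layout are made up for illustration and are not Monzo's format. The declared outbound calls are inverted into a per-callee list of allowed callers:

```python
# Hypothetical per-service config: each caller declares the services it calls.
DECLARED_UPSTREAMS = {
    "service.payments": ["service.ledger", "service.account"],
    "service.account": ["service.ledger"],
    "service.ledger": [],
}


def build_ingress_allowlist(declared):
    """Invert caller -> callees into callee -> sorted list of allowed callers."""
    allowed = {}
    for caller, callees in declared.items():
        # Make sure every declared service appears, even if nothing calls it.
        allowed.setdefault(caller, set())
        for callee in callees:
            allowed.setdefault(callee, set()).add(caller)
    return {service: sorted(callers) for service, callers in allowed.items()}


ALLOWLIST = build_ingress_allowlist(DECLARED_UPSTREAMS)
```

The trade-off is that the config can drift from what the code really does, which is presumably why the post analysed the code instead.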
Strikes me that some services ideally need to expose multiple interfaces, and that isolation should be on a per-service-interface basis.<p>E.g. the monitoring service should only be able to access the metrics part of each service.
Nice write-up! That's the beauty of scale: explain a part in detail, then go with the 30,000-foot view.<p>IMM, the security orchestration may actually become the "app" as speeds continue to increase, compute costs go even lower and losses incurred from compromised data/networks increase.<p>A true zero trust platform that keeps all of the doors closed, or "instances/VMs" offline, until the milliseconds they're needed is the security symphony we might see on the horizon.<p>Data silos and walled gardens may never go out of style; they'll just take on new acronyms.
Impressive achievement. It still sounds like callees have more knowledge of callers than is justified. Is it a security property or a component functionality property? How do those interact?<p>A centralised graph representation of the security/functionality properties would be a better way to represent this information, so it can catch the addition of interfaces which should be forbidden. It could also be configuration-managed as sets of microservices.<p>If you have a connectivity graph, it would be good to do taint analysis to see how far bad information can propagate.
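To make the taint-analysis idea concrete, here is a minimal sketch that treats taint as plain reachability over the allowed-call graph. The graph and service names are hypothetical, and a real analysis would also need to model which calls actually carry data:

```python
from collections import deque

# Hypothetical call graph: caller -> services it is allowed to call.
CALL_GRAPH = {
    "edge-proxy": ["api"],
    "api": ["account", "payments"],
    "payments": ["ledger"],
    "account": [],
    "ledger": [],
    "metrics": [],
}


def tainted_by(graph, source):
    """Return every service reachable from `source`, excluding `source` itself."""
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    seen.discard(source)
    return seen
```

For example, `tainted_by(CALL_GRAPH, "api")` lists everything a compromised `api` service could push bad data into, directly or transitively.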
Curious if you looked at using OAuth with the client credentials grant for each service?<p>Also, I didn't see any mention of prior art like <a href="https://cloud.google.com/beyondcorp/" rel="nofollow">https://cloud.google.com/beyondcorp/</a>.<p>Thanks for the great writeup!
Nice work. If you define your policies based on a tagging taxonomy, you could centrally manage these inbound/outbound service relationships. Every new instance or container would assume the same network policies based on its tags.
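A small sketch of the tag-based approach, assuming rules are written as (caller tag, callee tag) pairs. The tags, service names, and the `is_allowed` helper are all invented for illustration:

```python
# Hypothetical tag taxonomy: rules are written against tags, not services.
TAG_RULES = [
    ("frontend", "backend"),   # anything tagged frontend may call anything tagged backend
    ("backend", "datastore"),  # anything tagged backend may call anything tagged datastore
]

SERVICE_TAGS = {
    "web": {"frontend"},
    "orders": {"backend"},
    "orders-db": {"datastore"},
}


def is_allowed(caller, callee, service_tags=SERVICE_TAGS, rules=TAG_RULES):
    """Allow the call if any rule matches a caller tag and a callee tag."""
    caller_tags = service_tags.get(caller, set())
    callee_tags = service_tags.get(callee, set())
    return any(src in caller_tags and dst in callee_tags for src, dst in rules)
```

A new service only needs tags to inherit the existing rules: adding `"invoices": {"backend"}` to the tag map makes it callable from `web` with no rule change.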
> This would read all the Go code in our platform, and attempt to find code that looked like it was making a request to another service.<p>Is there a link about how much Go Monzo uses?