Back when I led Network Engineering at Square, we had a global network of production datacenters and offices. Nothing on the scale of Cloudflare, but we still had many hundreds of devices. They had all been built manually with copied-and-pasted configs. The trouble was that this made changes risky and terrifying, because there could be subtle inconsistencies between sites, or huge differences. So it was always very challenging to reason about the impact of a given change.<p>Thanks to a ton of grit from the team, and the insistence of one engineer in particular, we built a config management system and started tracking the percentage of our global network config that it managed.<p>That metric was regularly presented at the VP level to hold us accountable for getting the percentage to 100.<p>It was months and months of boring work to remove inconsistencies and templatize configs. But in the end, I believe it resulted in a much more reliable and ultimately safer network to operate. I'm also happy that my management chain saw the value in this work.<p>I'm quite proud of the work the team did.<p>One side benefit was that once we started going through audits like SOC2, we had a really good story to tell about how we reviewed and pushed changes to production.
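A minimal sketch of how a coverage metric like the one described above could be computed. It assumes coverage is counted per config line, which is only an assumption: the comment does not say how the percentage was actually defined. `DeviceConfig`, `managed_lines`, and `managed_percent` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DeviceConfig:
    """Hypothetical per-device summary; field names are illustrative."""
    name: str
    total_lines: int    # all lines in the device's running config
    managed_lines: int  # lines rendered by the config management system


def managed_percent(devices: Iterable[DeviceConfig]) -> float:
    """Return the share of config lines under management, as a percentage.

    Assumption: coverage is weighted by line count across the whole fleet,
    so large configs count for more than small ones.
    """
    devices = list(devices)
    total = sum(d.total_lines for d in devices)
    if total == 0:
        return 0.0  # no config at all; avoid dividing by zero
    managed = sum(d.managed_lines for d in devices)
    return 100.0 * managed / total
```

Weighting by lines rather than by device means a half-templated core router moves the number more than a fully templated access switch. Counting devices instead would be an equally reasonable choice.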
I really like the way Avaya and Cisco DNA do automation. It’s all centrally managed, and it uses VXLAN, so no VLANs have to be present. Just say you want VLAN 20 here and here, and it takes care of the rest.