Nice writeup, but it leaves me curious about the root cause:<p>For some reason, our switches were unable to learn a significant percentage of our MAC addresses and this aggregate traffic was enough to saturate all of the links between the access and aggregation switches, causing the poor performance we saw throughout the day.<p>Did you work with your vendor to understand what caused this? Did the switches run out of MAC table entries?<p>That problem aside, I wonder why you still run a layer 2 network in a tree-like configuration. Such networks are known not to scale well beyond a small LAN. An appropriately designed layer 3 network (with multipath routing) would avoid this kind of flooding and let you use all the precious capacity in your switches!
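<p>To illustrate why I ask about table size, here is a toy sketch of a learning switch with a bounded MAC table. It only models one possible cause (a full table); the writeup does not say what actually happened. The names `LearningSwitch` and `flood_fraction` are made up for this example, and real switches also age entries out and often use hashed tables, none of which is modeled here.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy layer 2 switch with a fixed-size MAC table (hypothetical model)."""

    def __init__(self, num_ports, table_capacity):
        self.num_ports = num_ports
        self.table_capacity = table_capacity
        self.mac_table = {}  # MAC address -> port it was last seen on

    def handle_frame(self, src, dst, in_port):
        """Return (out_ports, flooded) for one incoming frame."""
        # Learn the source only if it is already known or there is room left.
        if src in self.mac_table or len(self.mac_table) < self.table_capacity:
            self.mac_table[src] = in_port

        out_port = self.mac_table.get(dst)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            ports = [p for p in range(self.num_ports) if p != in_port]
            return ports, True
        # Known destination: forward, or drop if it lives on the ingress port.
        return ([] if out_port == in_port else [out_port]), False


def flood_fraction(table_capacity, num_hosts, num_ports):
    """Fraction of unicast frames that get flooded in an all-to-all exchange."""
    switch = LearningSwitch(num_ports, table_capacity)
    port_of = {host: host % num_ports for host in range(num_hosts)}

    # Each host announces itself once so the switch has a chance to learn it.
    for host in range(num_hosts):
        switch.handle_frame(host, BROADCAST, port_of[host])

    total = flooded = 0
    for src in range(num_hosts):
        for dst in range(num_hosts):
            if src == dst:
                continue
            _, was_flooded = switch.handle_frame(src, dst, port_of[src])
            total += 1
            flooded += was_flooded
    return flooded / total


if __name__ == "__main__":
    print(flood_fraction(table_capacity=8, num_hosts=8, num_ports=4))
    print(flood_fraction(table_capacity=4, num_hosts=8, num_ports=4))
```

<p>In this model, once the table can hold only half of the hosts, every frame addressed to an unlearned host is copied to all other ports, which is the kind of aggregate traffic that could saturate uplinks.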