Fault Localization in Large-Scale Network Policy Deployment
The recent advances in network management automation and Software-Defined Networking (SDN) are easing network policy management tasks. At the same time, these new technologies create a new mode of failure in the management cycle itself. Network policies are presented in an abstract model at a centralized controller and deployed as low-level rules across network devices. Thus, any software and hardware element in that cycle can be a potential cause of underlying network problems. In this paper, we present and solve a network policy fault localization problem that arises in operating policy management frameworks for a production network. We formulate our problem via risk modeling and propose a greedy algorithm that quickly localizes faulty policy objects in the network policy. We then design and develop SCOUT---a fully-automated system that produces faulty policy objects and further pinpoints physical-level failures which made the objects faulty. Evaluation results using a real testbed and extensive simulations demonstrate that SCOUT detects faulty objects with small false positives and false negatives.
READ FULL TEXT