Building a Network Health Monitoring System: Optics Monitoring Use-Case
Jeremy Schulman, Major League Baseball
As a member of the networking team, I need to monitor network health metrics in a way that allows me to take action. As a use-case, consider interface transceiver Digital Optical Monitoring (DOM) measurements, which include receive power, transmit power, and temperature. It is not enough to collect and report these optic values. We need to know whether each metric is within its normal operating range, within a warning threshold, or within an alarm threshold, so that we can take action on the interfaces that have a problem. We also want to associate context data with these metrics, for example the device hostname, the device role, the interface name, and the interface description. We use this context data to help assess the severity and impact of any anomalies. For example, an alarm condition on a core WAN interface may be higher priority than the same condition on another type of interface.
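To make the idea of a derived, actionable status concrete, here is a minimal Python sketch. It is not taken from the presented solution. The names `Thresholds`, `classify`, and `derive_status`, the threshold values, and the sample hostname and interface are all hypothetical and chosen only for illustration. Real thresholds would normally come from the transceiver or from operator policy.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    ALARM = "alarm"


@dataclass(frozen=True)
class Thresholds:
    """Low/high bounds for one metric (hypothetical values, not vendor data)."""
    low_alarm: float
    low_warn: float
    high_warn: float
    high_alarm: float


def classify(value: float, t: Thresholds) -> Status:
    """Map a raw measurement onto normal / warning / alarm."""
    # Check the outer (alarm) bounds first, then the inner (warning) bounds.
    if value <= t.low_alarm or value >= t.high_alarm:
        return Status.ALARM
    if value <= t.low_warn or value >= t.high_warn:
        return Status.WARNING
    return Status.NORMAL


def derive_status(reading: dict, thresholds: Thresholds) -> dict:
    """Return a copy of the reading with a derived 'status' field added."""
    status = classify(reading["value"], thresholds)
    return {**reading, "status": status.value}


# Assumed receive-power thresholds in dBm, for illustration only.
RX_POWER_DBM = Thresholds(low_alarm=-14.0, low_warn=-12.0,
                          high_warn=1.0, high_alarm=2.0)

# A raw measurement together with its context data (hypothetical device).
reading = {
    "hostname": "core-rtr-01",
    "role": "core",
    "interface": "Ethernet1/1",
    "description": "WAN uplink",
    "metric": "rx_power_dbm",
    "value": -13.1,
}

result = derive_status(reading, RX_POWER_DBM)
print(result["hostname"], result["interface"], result["status"])
```

Because the context fields travel with the derived status, a dashboard or alert rule could filter on, say, role and status together, rather than on the raw dBm value.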
This presentation will showcase an optics monitoring solution that supports multiple vendors and network operating systems, built using Grafana, InfluxDB, and other open-source tools. It will discuss the motivation for health monitoring systems that support user-defined derived metrics, which turn raw measurements into actionable status. The audience will learn what it takes to build these types of solutions, including the solution architecture, deployment considerations, and scalability considerations.