Cloudprober, which we open sourced last year, answers questions like these and more. It’s black-box monitoring software that "probes" your systems and services and generates metrics based on probe results. This kind of monitoring strategy doesn’t make assumptions about how your service is implemented and it works at the same layer as your service’s users. You can make changes to your service’s implementation with peace of mind, knowing you’ll notice if a change prevents users from accessing the service.
A probe can be anything: a ping, an HTTP request, or even a custom program that mimics how your services are consumed (for example, creating and accessing a blog post). Cloudprober builds and exports standard metrics, and makes it easy to integrate them with your existing monitoring stack, such as Prometheus-Grafana, Stackdriver and soon InfluxDB. Cloudprober is written in Go and works on all major platforms: Linux, macOS, and Windows. It's released as a static binary as well as a Docker image.
Here’s an example probe config that runs an HTTP probe against your forwarding rules and exports data to Stackdriver and Prometheus:
    probe {
      name: "internal-web"
      type: HTTP
      # Probe all forwarding rules that contain web-fr in their name.
      targets {
        gce_targets {
          forwarding_rules {}
        }
        regex: "web-fr-.*"
      }
      interval_msec: 5000
      timeout_msec: 1000
      http_probe {
        port: 8080
      }
    }

    # Export data to Stackdriver
    surfacer {
      type: STACKDRIVER
    }

    # Prometheus exporter
    surfacer {
      type: PROMETHEUS
    }
To run Cloudprober with this config, pass the config file on the command line:
./cloudprober --config_file $HOME/cloudprober/cloudprober.cfg
This example probe config highlights two major features of Cloudprober: automatic, continuous discovery of cloud targets, and data export over multiple channels (Stackdriver and Prometheus in this case). Cloud deployments are dynamic and change frequently. Because Cloudprober rediscovers targets continuously, you have one less thing to worry about when making minor infrastructure changes. Exporting data in multiple formats helps it integrate well with your existing monitoring setup.
Other features include:
- Configuration based on Go text templates, which adds programming capabilities to configs, such as "for" loops and conditionals
- Fast and efficient implementation of core probe types
- Custom probes through the "external" probe type
- The ability to read config through metadata
- Cloud (Stackdriver) logging
Cloudprober was built by the Cloud Networking Site Reliability Engineering (SRE) team at Google to monitor network availability and associated features. Today, it's used by several other Google Cloud SRE teams as well.
We’re excited to share Cloudprober with the wider DevOps community! You can find more examples in the GitHub repository and more information on the project website.
By Manu Garg, Cloud Networking Team