opensource.google.com

Cloudprober: open source black-box monitoring software

Friday, March 23, 2018

Ever wonder whether users can actually access your microservices? Do you see timeouts in your applications but can't tell whether the network is at fault or your servers are too busy? Curious about the 99th-percentile network latency between your on-premises data center and services running in the cloud?

Cloudprober, which we open sourced last year, answers questions like these and more. It’s black-box monitoring software that "probes" your systems and services and generates metrics based on probe results. This kind of monitoring strategy doesn’t make assumptions about how your service is implemented and it works at the same layer as your service’s users. You can make changes to your service’s implementation with peace of mind, knowing you’ll notice if a change prevents users from accessing the service.

A probe can be anything: a ping, an HTTP request, or even a custom program that mimics how your services are consumed (for example, creating and then accessing a blog post). Cloudprober builds and exports standard metrics and makes it easy to integrate them with your existing monitoring stack, such as Prometheus/Grafana, Stackdriver and, soon, InfluxDB. Cloudprober is written in Go and runs on all major platforms: Linux, macOS, and Windows. It's released both as a static binary and as a Docker image.

Here’s an example probe config that runs an HTTP probe against your GCE forwarding rules and exports data to Stackdriver and Prometheus:
probe {
  name: "internal-web"
  type: HTTP
  # Probe all forwarding rules that contain web-fr in their name.
  targets {
    gce_targets {
      forwarding_rules {}
    }
    regex: "web-fr-.*"
  }
  interval_msec: 5000
  timeout_msec: 1000
  http_probe {
    port: 8080
  }
}

# Export data to Stackdriver
surfacer {
  type: STACKDRIVER
}

# Prometheus exporter
surfacer {
  type: PROMETHEUS
}

To run Cloudprober with this config, pass it on the command line:
./cloudprober --config_file $HOME/cloudprober/cloudprober.cfg

This example config highlights two major features of Cloudprober: automatic, continuous discovery of cloud targets, and data export over multiple channels (Stackdriver and Prometheus in this case). Cloud deployments are dynamic and change constantly; because Cloudprober keeps discovering targets, you have one less thing to worry about when making minor infrastructure changes. Exporting data in several formats helps it integrate well with your existing monitoring setup.
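Exporting to several backends amounts to fanning each probe result out to every configured surfacer. The following Go sketch illustrates that pattern only; the Metric type, the Surfacer interface and the text format written by lineSurfacer are all hypothetical and do not reflect Cloudprober's internal API or the real Stackdriver and Prometheus formats.

```go
package main

import "fmt"

// Metric is a hypothetical, simplified probe result record.
type Metric struct {
	Probe, Target, Name string
	Value               float64
}

// Surfacer is a hypothetical interface for a monitoring backend.
type Surfacer interface {
	Write(m Metric)
}

// lineSurfacer renders each metric as a text line. It stands in for a real
// backend such as Stackdriver or Prometheus; the format is illustrative only.
type lineSurfacer struct {
	backend string
	lines   []string
}

func (s *lineSurfacer) Write(m Metric) {
	s.lines = append(s.lines, fmt.Sprintf("%s %s{probe=%q,target=%q} %g",
		s.backend, m.Name, m.Probe, m.Target, m.Value))
}

// fanOut hands every metric to every configured surfacer.
func fanOut(metrics []Metric, surfacers []Surfacer) {
	for _, m := range metrics {
		for _, s := range surfacers {
			s.Write(m)
		}
	}
}

func main() {
	sd := &lineSurfacer{backend: "stackdriver"}
	prom := &lineSurfacer{backend: "prometheus"}
	metrics := []Metric{
		{Probe: "internal-web", Target: "web-fr-us-east", Name: "total", Value: 2},
		{Probe: "internal-web", Target: "web-fr-us-east", Name: "success", Value: 2},
	}
	fanOut(metrics, []Surfacer{sd, prom})
	for _, s := range []*lineSurfacer{sd, prom} {
		for _, l := range s.lines {
			fmt.Println(l)
		}
	}
}
```

Because probes only hand results to the fan-out step, adding or removing a backend in this design would not require touching probe code.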

Other features include:
  • Configuration based on Go text templates, which adds programming capabilities such as "for" loops and conditionals to configs
  • Fast, efficient implementations of the core probe types
  • Custom probes through the "external" probe type
  • The ability to read config from metadata
  • Cloud (Stackdriver) logging

Though most of the cloud support is specific to Google Cloud Platform (GCP), it’s easy to add support for other providers. Cloudprober has an extensible architecture so you can add new types of targets, probes and monitoring backends.
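One common way to build this kind of extensibility is a registry that maps a type name to a constructor, so a new probe type can be plugged in without changing the code that looks types up. The Go sketch below shows that general pattern; Prober, register, newProber and tcpStub are hypothetical names, and this is not necessarily how Cloudprober organizes its own code.

```go
package main

import (
	"errors"
	"fmt"
)

// Prober is a hypothetical interface that a new probe type would implement.
type Prober interface {
	Probe(target string) error
}

// registry maps a probe type name, as it might appear in a config's "type"
// field, to a constructor. Both are illustrative assumptions.
var registry = map[string]func() Prober{}

// register adds a constructor for a probe type name.
func register(name string, ctor func() Prober) { registry[name] = ctor }

// newProber builds a probe of the named type, or reports an unknown type.
func newProber(name string) (Prober, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown probe type %q", name)
	}
	return ctor(), nil
}

// tcpStub is a toy probe type added without modifying newProber.
type tcpStub struct{}

func (tcpStub) Probe(target string) error {
	if target == "" {
		return errors.New("empty target")
	}
	return nil
}

func main() {
	register("TCP_STUB", func() Prober { return tcpStub{} })
	p, err := newProber("TCP_STUB")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Probe("10.0.0.1:80"))
	_, err = newProber("UDP")
	fmt.Println(err)
}
```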

Cloudprober was built by the Cloud Networking Site Reliability Engineering (SRE) team at Google to monitor network availability and associated features. Today, it's used by several other Google Cloud SRE teams as well.

We’re excited to share Cloudprober with the wider DevOps community! You can find more examples in the GitHub repository and more information on the project website.

By Manu Garg, Cloud Networking Team