Get ready to Rocksteady

Thursday, September 23, 2010

Rocksteady is an effort to use Esper Complex Event Processing (CEP) to analyze user-defined metrics. It parses your data and turns it into events that Esper can query, so you can respond to those events in real time.
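To make "turn your data into events" concrete, here is a minimal sketch. This post does not describe Rocksteady's actual input format or metric convention, so the sketch assumes a Graphite-style plaintext line (`<dotted.name> <value> <unix_timestamp>`) and a hypothetical `colo.env.host.metric` naming scheme. `MetricEvent` and `parse_line` are illustrative names, not part of Rocksteady or Esper. In a real pipeline the event would be sent to Esper for querying rather than printed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricEvent:
    # Hypothetical event shape; field names are illustrative only.
    colo: str
    env: str
    host: str
    metric: str
    value: float
    timestamp: int


def parse_line(line: str) -> MetricEvent:
    """Parse '<colo.env.host.metric> <value> <timestamp>' into an event."""
    name, value, timestamp = line.split()
    # Assumed naming convention: the first three dotted fields say where the
    # metric came from; everything after them (dots included) is the metric.
    colo, env, host, metric = name.split(".", 3)
    return MetricEvent(colo, env, host, metric, float(value), int(timestamp))


event = parse_line("sf.prod.web01.rps 1234.5 1285200000")
print(event.metric, event.value)  # rps 1234.5
```

Splitting the name with a maximum of three splits keeps multi-part metric names such as `cpu.user` intact.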

Too often, metrics and graphs are only useful for analyzing what happened after things have gone wrong. Staring at a dozen graphs on a TV wall isn't monitoring; it's a waste of time. The goal of Rocksteady is to determine the root cause of breakage from metrics in real time. Metric analysis is only part of the picture, though, so we also present solutions for metric conventions, metric sending, load balancing, and graphing.

Rocksteady can be used in many different environments. Here on the AdMob operations team, we use it to determine the cause of events such as latency. We monitor requests per second (rps) and a slew of other metrics such as CPU and network traffic, then feed them into a prediction algorithm such as Holt-Winters, which predicts a confidence band for the next arriving value. Whenever a metric falls outside its band more than a certain number of times in a row, we record an event. This is what we call auto threshold establishment.

Now, if there is an SLA we really care about, such as response time, we can set a hard threshold, say 250ms. When response time rises beyond 250ms, Rocksteady tells us whether rps, CPU, or network crossed their respective thresholds. Instead of just knowing there is a latency problem, we can quickly pinpoint the potential cause.
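The sketch below illustrates the approach described above under several assumptions. It uses additive Holt-Winters forecasting; the confidence band is built from Brutlag's smoothed absolute deviation, which is one common way to get a Holt-Winters band but is our choice here, not something stated about Rocksteady. The class and function names (`HoltWintersBand`, `AutoThreshold`, `likely_causes`), the smoothing parameters, the one-season warm-up, and the three-in-a-row rule are all hypothetical. Only the 250ms hard threshold comes from the example in the text.

```python
class HoltWintersBand:
    """Additive Holt-Winters forecaster with a Brutlag-style confidence band.

    The default parameters are illustrative, not tuned values.
    """

    def __init__(self, season_len, alpha=0.5, beta=0.1, gamma=0.1, delta=3.0):
        self.m = season_len
        self.alpha, self.beta, self.gamma, self.delta = alpha, beta, gamma, delta
        self.level = None
        self.trend = 0.0
        self.season = [0.0] * season_len  # additive seasonal offsets
        self.dev = [0.0] * season_len     # smoothed absolute forecast errors
        self.t = 0

    def update(self, y):
        """Check y against the predicted band, then fold y into the model.

        Returns True if y fell outside the band. Nothing is flagged during
        the first season, while the seasonal terms are still being learned.
        """
        if self.level is None:
            self.level = float(y)  # naive initialization from the first value
            self.t = 1
            return False
        i = self.t % self.m
        forecast = self.level + self.trend + self.season[i]
        width = self.delta * self.dev[i]
        outside = self.t > self.m and abs(y - forecast) > width
        prev_level = self.level
        self.level = (self.alpha * (y - self.season[i])
                      + (1 - self.alpha) * (self.level + self.trend))
        self.trend = (self.beta * (self.level - prev_level)
                      + (1 - self.beta) * self.trend)
        self.season[i] = (self.gamma * (y - self.level)
                          + (1 - self.gamma) * self.season[i])
        self.dev[i] = (self.gamma * abs(y - forecast)
                       + (1 - self.gamma) * self.dev[i])
        self.t += 1
        return outside


class AutoThreshold:
    """Marks a metric as breached after N consecutive out-of-band values."""

    def __init__(self, model, max_consecutive=3):
        self.model = model
        self.max_consecutive = max_consecutive
        self.run = 0
        self.breached = False

    def observe(self, y):
        self.run = self.run + 1 if self.model.update(y) else 0
        self.breached = self.run >= self.max_consecutive
        return self.breached


def likely_causes(response_ms, detectors, sla_ms=250.0):
    """On a hard SLA breach, list metrics currently past their auto threshold."""
    if response_ms <= sla_ms:
        return []
    return sorted(name for name, d in detectors.items() if d.breached)


detectors = {name: AutoThreshold(HoltWintersBand(season_len=24))
             for name in ("cpu", "network", "rps")}
for _ in range(48):                      # two quiet "days" of steady values
    for d in detectors.values():
        d.observe(100.0)
for _ in range(3):                       # rps jumps three times in a row
    detectors["rps"].observe(1000.0)
    detectors["cpu"].observe(100.0)
    detectors["network"].observe(100.0)
print(likely_causes(310.0, detectors))   # ['rps']
```

Keeping the hard SLA check separate from the learned bands mirrors the split in the text: the bands adapt to each metric's normal rhythm, while the SLA threshold stays fixed and only decides when to ask which metrics are misbehaving.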

Rocksteady was briefly mentioned in Ignite talks at the 2010 Velocity Conference and Devops Day, and now it’s finally ready to be open sourced. Let us know if you have any questions, and enjoy!

By Mark Lin, Operations Engineering Team