A paper a day keeps the doctor away: Gorilla: A Fast, Scalable, In-Memory Time Series Database

Operating large scale Internet services today is a challenge, and making sure that the services run well with minimal customer disruptions is doubly so. The reason is that both require good visibility into how the individual service components are performing, which necessitates gathering and analyzing a large number of performance measurements. The measurements vary from metrics annotated with labels or dimensions that can be used to filter and group the results at query time, to exception stacks, log lines, and trace events. Collecting and analyzing such a large volume of metrics is the realm of time series databases, and the paper "Gorilla: A Fast, Scalable, In-Memory Time Series Database" presents such a system, which is in use at Facebook to handle monitoring and alerting for its vast infrastructure.

In the paper the authors start by articulating the design principles for Gorilla: they wanted a system that is always available for writes; that tolerates the loss of a small amount of data; and that is geo-replicated and partition tolerant in the face of network cuts, software updates, and configuration changes.

From their prior experience with monitoring systems at Facebook, the authors made a couple of insightful observations. First, they noticed that users of monitoring systems are mostly interested in recent data points rather than historical data, with almost 85% of the diagnostic queries being for data collected in the last 26 hours. Second, they observed that users are interested in aggregate data rather than individual data points for diagnostic scenarios. Third, the authors noticed a big disparity between write rates, which exceeded millions of data points per second, and read rates, which are orders of magnitude lower and come mostly from dashboards and alerting systems.

These observations steered the authors toward an in-memory system that ingests and computes over the last 26 hours of data, backed by an HBase solution for the historical data. At Facebook's scale this presented a challenge, and the authors shared some of the performance requirements in the paper:

  • 700 million data points (timestamp and value) added per minute
  • Data stored for 26 hours
  • More than 40,000 queries per second at peak
  • Reads succeed in under one millisecond
  • Support time series with 15-second granularity (4 points per minute per time series)
  • Two in-memory, not co-located replicas (for disaster recovery capacity)
  • Always serve reads even when a single server crashes
  • Ability to quickly scan over all in-memory data
  • Support at least 2x growth per year


The authors came up with brilliant compression schemes that shrink the time series data to roughly 1.37 bytes per point, a 12x reduction in size, which allowed them to store 26 hours' worth of data in 20 hosts with 1.3TB of memory.

Each time series consists of monitoring data, modeled as a tuple: a string that serves as a key to the time series, a 64-bit timestamp, and a double-precision float value. The time series are sharded by key, and each key goes to a Gorilla host. Every write goes to two different geo locations, and on host failure, all reads fail over to the other region.
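
To make the data model concrete, here is a minimal sketch in Python; the class name, the CRC32 hash, and the shard count are illustrative assumptions on my part, not details from the paper.

    from dataclasses import dataclass
    from zlib import crc32

    @dataclass
    class DataPoint:
        key: str        # time series name, e.g. "web.requests.p99" (made-up example)
        timestamp: int  # 64-bit timestamp in seconds
        value: float    # double-precision measurement

    NUM_SHARDS = 64  # assumed value; the paper only says series are sharded by key

    def shard_for(key: str) -> int:
        """All points of a series hash to the same shard, and thus the same Gorilla host."""
        return crc32(key.encode()) % NUM_SHARDS

    p = DataPoint("web.requests.p99", 1_427_151_662, 12.5)
    print(shard_for(p.key))  # each write is also duplicated to a second region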

The authors compress the time series data with two delta-based schemes. The first compresses the timestamp of every monitoring tuple. A series starts with an uncompressed 64-bit timestamp, and subsequent timestamps are compressed using delta-of-deltas. For example, if the deltas are 60, 60, 59, and 61, the delta of deltas are 0, -1, and 2. Each delta of deltas is then encoded with a variable-length code depending on the range it falls into: a zero delta of deltas is stored as a single '0' bit; for -63 to 64 the prefix is '10' followed by 7 bits; for -255 to 256 the prefix is '110' followed by 9 bits; for -2047 to 2048 the prefix is '1110' followed by 12 bits; otherwise the prefix is '1111' followed by 32 bits.
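
As a rough illustration of the timestamp scheme, here is a small Python sketch; it reports which prefix each point would use rather than packing an actual bit stream, and the function name and the choice to treat the first delta-of-delta as a delta against zero are my own simplifications.

    def encode_timestamps(timestamps):
        """Return (prefix, delta_of_delta) pairs showing how each timestamp would be coded."""
        encoded = [("raw", timestamps[0])]     # the series starts with an uncompressed timestamp
        prev_ts, prev_delta = timestamps[0], 0
        for ts in timestamps[1:]:
            delta = ts - prev_ts
            dod = delta - prev_delta           # delta of deltas
            if dod == 0:
                encoded.append(("0", dod))     # a single '0' bit
            elif -63 <= dod <= 64:
                encoded.append(("10", dod))    # '10' prefix + 7-bit value
            elif -255 <= dod <= 256:
                encoded.append(("110", dod))   # '110' prefix + 9-bit value
            elif -2047 <= dod <= 2048:
                encoded.append(("1110", dod))  # '1110' prefix + 12-bit value
            else:
                encoded.append(("1111", dod))  # '1111' prefix + 32-bit value
            prev_ts, prev_delta = ts, delta
        return encoded

    # Deltas of 60, 60, 59, 61 yield delta-of-deltas 0, -1, 2 after the first point:
    print(encode_timestamps([1000, 1060, 1120, 1179, 1240]))

Because most metrics arrive at a fixed interval, the vast majority of points hit the cheap cases, which is where the savings come from.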

For monitored values, the floating point numbers are compressed by XOR'ing each value with the prior one and encoding the result. The first monitored value is stored without compression, and each subsequent value is XOR'ed with the previous value. If the result is 0, a single '0' bit is stored. Otherwise the authors calculate the number of leading and trailing zeros in the XOR'ed value, and store a control bit '1' followed by:
  • a '0' bit if the block of meaningful bits falls within the block of the previous meaningful bits, followed by just those meaningful bits
  • otherwise a '1' bit, the length of the leading zeros, the length of the meaningful bits in the XOR'ed value, and the meaningful bits themselves
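
The value scheme is easier to follow in code, so below is a rough Python sketch; it reports the control bits and an approximate bit count per value instead of producing a real bit stream, and the helper names are mine.

    import struct

    def float_to_bits(x: float) -> int:
        """Reinterpret a double as its 64-bit integer representation."""
        return struct.unpack(">Q", struct.pack(">d", x))[0]

    def xor_encode(values):
        out = [("raw", 64)]                              # first value stored uncompressed
        prev_bits = float_to_bits(values[0])
        prev_lead = prev_trail = None                    # previous window of meaningful bits
        for v in values[1:]:
            bits = float_to_bits(v)
            xored = bits ^ prev_bits
            prev_bits = bits
            if xored == 0:
                out.append(("0", 1))                     # identical value: a single '0' bit
                continue
            lead = 64 - xored.bit_length()               # leading zeros of the XOR'ed value
            trail = (xored & -xored).bit_length() - 1    # trailing zeros of the XOR'ed value
            if prev_lead is not None and lead >= prev_lead and trail >= prev_trail:
                # control bits '10': the meaningful bits fit inside the previous window
                out.append(("10", 2 + (64 - prev_lead - prev_trail)))
            else:
                # control bits '11': 5 bits of leading-zero count, 6 bits of length, then the bits
                meaningful = 64 - lead - trail
                out.append(("11", 2 + 5 + 6 + meaningful))
                prev_lead, prev_trail = lead, trail
        return out

    # Neighboring samples of a slowly changing gauge XOR to values with long zero runs:
    print(xor_encode([12.0, 12.0, 24.0, 15.0, 12.0]))

Since consecutive samples rarely differ by much, the XOR usually has long runs of leading and trailing zeros, which is why reusing the previous window pays off.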

For Facebook's monitored data, the authors found that roughly 51% of the values compress to a single bit, about 30% use the '10' control bits, and 19% use the '11' control bits.

The authors use in-memory data structures to store the time series data, with a lookup map from time series name to its data for quick access. When time series are deleted, they are tombstoned.
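
As a hypothetical sketch of that lookup structure (the class names, locking, and tombstone flag below are my guess at the shape, not the paper's actual implementation):

    import threading

    class TimeSeries:
        def __init__(self, name):
            self.name = name
            self.blocks = []       # in Gorilla this would hold the compressed data blocks
            self.deleted = False   # tombstone flag; memory is reclaimed later

    class TimeSeriesMap:
        """Map from time series name to its in-memory data, guarded by a lock."""
        def __init__(self):
            self._lock = threading.RLock()
            self._by_name = {}     # series name -> TimeSeries

        def get_or_create(self, name):
            with self._lock:
                ts = self._by_name.get(name)
                if ts is None or ts.deleted:
                    ts = TimeSeries(name)
                    self._by_name[name] = ts
                return ts

        def delete(self, name):
            with self._lock:
                ts = self._by_name.get(name)
                if ts is not None:
                    ts.deleted = True   # tombstone rather than remove the entry

    tsmap = TimeSeriesMap()
    tsmap.get_or_create("web.requests.p99")
    tsmap.delete("web.requests.p99")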

The in-memory data structures are backed by disk, and the data is stored in GlusterFS with 3x replication. Each host owns multiple shards of data, stored in a single directory per shard. The directory contains key lists, append-only logs, complete block files, and checkpoints. The shard manager uses Paxos to assign shards to nodes.
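
Purely as an illustration of that per-shard layout (the directory and file names below are hypothetical; the summary only lists the four kinds of files):

    from pathlib import Path

    def shard_layout(base, shard_id):
        """Hypothetical per-shard directory layout on GlusterFS."""
        shard = Path(base) / f"shard-{shard_id}"
        return {
            "key_list":    shard / "keys",        # maps series names to identifiers
            "append_log":  shard / "log",         # append-only log of incoming points
            "block_files": shard / "blocks",      # completed, compressed block files
            "checkpoints": shard / "checkpoint",  # marks which blocks have been fully flushed
        }

    print(shard_layout("/gluster/gorilla", 12))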

Data is streamed between Gorilla instances without guaranteeing consistency between the geo locations. If one region fails, query traffic is sent to the other region, and is only shifted back after the failed region has been healthy for 26 hours.

The authors then describe the tools that use Gorilla for monitoring and alerting. There is a low-latency query processing tool, a correlation engine that compares a time series to millions of others to figure out what happened around the time a service broke, a charting service to visually find where the outliers are, and an aggregation service.

The authors end with their experiences operating Gorilla over the prior 6 months, including fault tolerance incidents and how it helped in diagnostic scenarios, and close with the lessons they learned, namely:
  • Prioritize recent data over historical data
  • Read latency matters
  • High availability trumps resource efficiency
