Most of the evil in this world is done by people with good intentions.T.S. Eliot
I decided to start a series where I talk about how good design choices can result to bad experiences. I do my best to give an objective look into various architectural decisions and engineering problems, and how to avoid some of the most common pitfalls.
At work, we heavily rely on Kong for our API Gateway. This piece of infrastructure handles all of the ingress and egress for our primary production Kubernetes cluster. It serves multiple functions, and handles about 6,000 to 10,000 requests per second. We use Cassandra for the data storage, as it is the recommended data store for Kong at high volume. And in 2 years, our Cassandra cluster has grown significantly: from a 6 node, single data center cluster, to 2 data centers with 24 nodes each.
At the time of writing this, we are using Kong v2.0.1. Based on the changelog, there have been enhancements, so your experience may vary.
We encountered an issue where one of our services, when it hits 150 requests per second, would throw Kong into an unstable state. This particular service uses OAuth2 tokens, but the tokens are not generated by Kong. It is generated by another system, and saved by our service to Kong via the official OAuth 2.0 plugin’s migrate token API. Succeeding requests use the token to access other services protected by Kong.
Saving a token has an average latency of 5 seconds. For a regular user, this is imperceptible (since the process is asynchronous), but on days with heavy traffic, this 5 second latency can extend to as much as 30 seconds, and can already hamper the experience of our users (especially the savvy ones).
Once we hit the 150 requests per second mark on the said service, we started seeing a progression of problems:
- Our service endpoints that pass thru Kong starts throwing 404’s.
- We would see an increase in invalid JSON responses due to a null value.
- Finally, we would see timeouts coming from the
/oauth2_tokensendpoint. In the Kong logs, we would see that connections to our Cassandra nodes are timing out.
- Multiple service outage (ouch)
And on bad days, it would overload some of our Cassandra nodes, which will trigger a restart of the node. At this point, we had no choice but to issue a rolling restart on all Kong nodes.
Other symptoms include increasing CPU and memory utilization of Kong nodes over time, to the point where we threw 16 cores per node just so we won’t max out on CPU utilization. Our Cassandra data centers were running on 8 core machines with 64GB of RAM, and uses NVME’s for maximum performance. However, the CPU utilization of all nodes were at 90-95%, with a typical 15 minute load average of 40 per node.
Despite the capacity upgrades on both Kong and Cassandra, we were seeing very little to no incremental improvement. At some point, it got worse. Due to the constant failures of our Cassandra nodes, we tried to improve overall reliability by increasing the replication factor from 3 to 7, and increased the total number of nodes to 40. With a consistency level of
QUORUM, we should survive 3 node failures. Instead of getting better results, we still encountered issues at the same RPS.
Something was fundamentally wrong. We needed to go deeper. And so we cloned the source code of Kong to see what was going on.
And we found our culprit…
As it turns out, Kong uses a data access layer that maps queries to either Cassandra or PostgreSQL in a one-to-one fashion. While the intention is good, this became its fundamental flaw. A Good Design Gone Bad.
To understand why, we need to look deeper.
The very first public version of Kong was built with Cassandra as its data store. Support for PostgreSQL came in version 0.8.0, more than a year after it was first launched. To achieve this, a data access layer was implemented.
From a design standpoint, having a data access layer allows an application to have its core logic and data store logic to be loosely coupled. This gives the flexibility to use different data storage solutions while retaining a common data model. Most modern web frameworks will have this concept baked in for a developer to follow. A good example is Hibernate, an ORM framework designed to allow working with different SQL databases using the same data model. Hibernate is used on top of other frameworks, like Spring. Hibernate works because it allows developers to work with objects without caring too much about the underlying SQL, and the necessary optimizations for a specific database like MySQL or PostgreSQL.
Kong’s data access layer, on the surface, follow this same principle. It allows developers to work with the same data model regardless of whether they are using Cassandra or PostgreSQL. While similar in concept to Hibernate, they are fundamentally different due to one thing: the data store type they manage. Hibernate’s focus is to abstract Relational Databases only (while taking into account the various intricacies of each database), while Kong’s focus is to simply do a 1:1 mapping between a Relational and NoSQL database.
Directly mapping between a SQL and NoSQL data store is a no-no. The two data store types have different use cases, and performance characteristics. Regardless of how similar they are.
Looks like a duck. Quacks like a duck. IT’S A BUS!
Cassandra, being a wide column store, shares some structural similarities to an SQL database. You can practically get an existing table from almost any Relational Database, with all of its indices (minus the foreign keys), and represent it in Cassandra in an almost 1:1 manner. The reverse applies.
In fact, you could even say that PostgreSQL and Cassandra are distant relatives, due to the similarities in supported data types that you would not normally see in other Relational Databases (e.g. both support JSON and arrays as column types).
Unfortunately, the similarities end here.
Our issues with Kong and Cassandra stem from how the data was modeled, and how the indices are used today.
You see, when you declare an index in a Relational Database, there is really no difference whether you index a primary key, a regular column, or a unique key. All of those will use the indexing algorithm you specify. In Cassandra, an index on a non-primary column is called a secondary index. Functionally, they act almost the same as a Relational Database index. However, when you query by a secondary index (without specifying the primary key), it will query all nodes to look for the data. This is not a problem for indexes with low cardinality, but for a table with high cardinality (e.g. the
oauth2_tokens table), trying to look for an access token without using the primary key puts unnecessary strain on the entire cluster.
There have been several write ups about this. I recommend reading thru Marcus Cavalcanti’s horror story.
The Root Cause and Fix
In the end, because Kong performs an almost 1:1 query mapping between Cassandra and PostgreSQL, the generated queries are suitable and highly performant for a Relational Database, but is not suited for Cassandra.
Once we figured this out, we had 3 options:
- Denormalize the Kong tables – Cassandra works best with a denormalized schema, and this would allow us to structure the data in such a way that we avoid the secondary index problem altogether. However, we would need to perform a significant overhaul of the common data structures. This means we have to fork the project.
- Write a custom OAuth2 plugin optimized for Cassandra – this would allow us to isolate the needed changes to OAuth2 only. However, simply writing or forking of the existing OAuth2 plugin will require us to duplicate core components and data structures, which will increase overall complexity and testing (and time is not on our side).
- Switch to PostgreSQL – Since the schema for Cassandra and PostgreSQL are 99% match (minus a few fields), we can migrate the data using a custom script, and have it fully functional within a few minutes.
I wanted to go with option 1. But the clock was ticking for all of us. Option 3 became the obvious choice.
My team built a tool for migrating data from one Cassandra cluster to another. We modified that to migrate data to PostgreSQL. In just a couple of hours, we were able to migrate the data (except the tokens), and reconfigure Kong to use our PostgreSQL cluster without any fuss.
The results speak for themselves.
Best of all, most of the problems went away. We are still seeing issues at P99, which usually translates to 1 out of 10000 requests. We have been able to reduce this further to almost 0, but that is another article for another day.
The move to PostgreSQL meant that we can reduce infrastructure costs significantly. From 40 servers down to 2, with far better performance and reliability.
There are two key items to note here.
First, when designing systems, using abstraction layers to hide the complexities of dealing with the lower layers can be a good thing, provided that it takes into account the complexities to begin with. Hibernate, for example, deals with the complexities of the underlying database by requiring the use of an appropriate dialect. Simply building an abstraction for the sake of it would often times result in unforeseen consequences.
Second, when building products, simplicity is king. If you don’t need to do it, don’t. If there is peer or market pressure, but it doesn’t make sense, don’t. The case presented here could have been avoided if Kong stuck to its guns and optimized itself purely for Cassandra. Yes, running Cassandra is no walk in the park. But with today’s advancements, and a good selection of managed services like Instaclustr, Datastax, or AWS, that concern becomes moot.