YugaByte review: Planet-scale Cassandra and Redis
YugaByte DB combines distributed ACID transactions, multiregion deployment, and support for Cassandra and Redis APIs, with PostgreSQL on the way
-
YugaByte DB 1.0
During my decades as a database application developer, I never imagined in my wildest dreams that I would ever have access to a transactional, planet-scale, distributed database, much less that I would be comparing many of them. But with Google Cloud Spanner, CockroachDB, Azure Cosmos DB, Neo4j Enterprise, and most recently YugaByte DB all available in production, that one-time pipe dream is now quite real.
In broad terms, Google Cloud Spanner offers a scalable, distributed, strongly consistent SQL database as a service that can handle around 2,000 writes per second and 10,000 reads per second, per node, with a median latency of about five milliseconds. To speed up reads that don’t need absolutely up-to-date data, you can ask Spanner for stale reads, since it supports time-travel queries. Spanner uses a Google dialect of SQL and runs only on the Google Cloud Platform.
CockroachDB is a Spanner-like, open-source SQL database that supports the PostgreSQL wire protocol and PostgreSQL SQL dialect. CockroachDB is built on top of RocksDB, an open-source transactional and consistent key-value store. Like Spanner, it supports time-travel queries. CockroachDB can run on any cloud, in Docker containers with or without orchestration, or on Linux servers or VMs. The enterprise version of CockroachDB adds geo-partitioning, role-based access control, and support.
Azure Cosmos DB is a globally distributed, horizontally partitioned, multimodel database as a service. It offers four data models (key-value, column family, document, and graph) and five tunable consistency levels (strong, bounded staleness, session, consistent prefix, and eventual). It offers five API sets: SQL (dialect), MongoDB-compatible, Azure Table-compatible, graph (Gremlin), and Apache Cassandra-compatible. It only runs on the Microsoft Azure cloud.
Neo4j is a scalable and survivable graph database that uses the Cypher query language. You can install its open-source, non-clustered version on Windows, MacOS, and Linux, in Docker containers, and in VMs. Neo4j Enterprise supports high availability and causal clusters; causal clusters allow for asynchronously updated clusters of read replicas, to allow high performance for geographically distributed deployments.
Enter Yugabyte DB
YugaByte DB, the subject of this review, is an open-source, transactional, high-performance database for planet-scale applications that supports three API sets: YCQL, compatible with Apache Cassandra Query Language (CQL); YEDIS, compatible with Redis; and PostgreSQL (currently incomplete and in beta). YugaWare is the orchestration layer for YugaByte DB Enterprise Edition. YugaWare makes quick work of spinning up and tearing down distributed clusters on Amazon Web Services, Google Cloud Platform, and (due Q4 2018) Microsoft Azure. YugaByte DB implements multiversion concurrency control (MVCC), but doesn’t yet support time-travel queries.
YugaByte DB is built on top of an enhanced fork of the RocksDB key-value store. YugaByte DB 1.0 shipped in May 2018.
Two of the key technologies used to make distributed transactional databases consistent and fast are cluster consensus algorithms and node clock synchronization. Google Cloud Spanner and Azure Cosmos DB both use the Paxos consensus algorithm proposed by Leslie Lamport. CockroachDB and YugaByte DB use the Raft consensus algorithm proposed by Diego Ongaro and John Ousterhout.
Google Cloud Spanner uses Google’s proprietary TrueTime API, based on GPS and atomic clocks. Azure Cosmos DB, CockroachDB, and YugaByte DB use hybrid logical clock (HLC) timestamps and Network Time Protocol (NTP) clock synchronization.
YugaByte design goals
The founders of YugaByte—Kannan Muthukkaruppan, Karthik Ranganathan, and Mikhail Bautin—were Apache HBase committers, early engineers on Apache Cassandra, and the builders of Facebook’s NoSQL platform (powered by Apache HBase). Their goal for YugaByte DB was a distributed database server philosophically in-between Azure Cosmos DB and Google Cloud Spanner; that is, they wanted to combine the multimodel and high-performance attributes of Cosmos DB with the ACID transactions and global consistency of Spanner. Another way of describing their goal is that they wanted YugaByte DB to be transactional, high-performance, and planet-scale, all at once.
They broke the process into five steps, each of which took about six months to build. The first step was to create a strongly consistent version of RocksDB, a high-performance key-value store written in C++, by adding the Raft consensus protocol, sharding, and load balancing, and removing transaction logging, point-in-time backups, and recovery, which needed to be implemented in a higher layer.
The next step was to build a log-structured, key-to-document storage engine, adding non-primitive and nested types, such as rows, maps, collections, and JSON. Then they added a pluggable API layer, like Azure Cosmos DB, implementing Cassandra-compatible and Redis-compatible APIs, and deferring a PostgreSQL-compatible SQL API to a later stage. Then came extended query languages.
YugaByte Cloud Query Language (YCQL) extends the Cassandra API with support for distributed transactions, strongly consistent secondary indexes, and JSON. YugaByte Dictionary Service (YEDIS) is a Redis-compatible API with additions of built-in persistence, auto-sharding, and linear scalability. YEDIS optionally allows for timeline-consistent, low-latency reads from the nearest data center, while strong write operations maintain global consistency. YEDIS also includes a new time series data type.
Finally, with version 1.0, YugaByte DB Enterprise adds a layer to orchestrate, secure, and monitor production-grade deployments across multiple regions and multiple clouds, and stores distributed backups to a configurable endpoint such as Amazon S3. PostgreSQL support remains incomplete and at a beta-test level.
Distributed ACID transactions
At the risk of completely oversimplifying the process, let me try to summarize the way YugaByte performs distributed ACID transactions. ACID (which stands for atomicity, consistency, isolation, and durability ) used to be considered a property confined to SQL databases.
Suppose you submit a YCQL query containing updates inside a transaction, for example a paired debit and credit that both must aborted if one fails in order to maintain the consistency of a financial database. YugaByte DB accepts the transaction in a stateless transaction manager, one of which runs on every node in the cluster. The transaction manager then tries to schedule the transaction on the tablet server that owns most of the data accessed by the transaction, for performance purposes.
The transaction manager adds a transaction entry with a unique ID into the transaction status table. Then it writes provisional records to all the tablets responsible for the keys the transaction is trying to modify. If there are conflicts, one of the conflicting transactions is rolled back.
Once all of the provisional records have been written successfully, the transaction manager asks the transaction status tablet to replace all provisional records with regular records using the timestamp of the “transaction committed” entry in its Raft log. Finally, the transaction status tablet sends cleanup requests to each of the tablets that participated in the transaction.
To improve performance, YugaByte aggressively caches information for transactions in progress, implements fine-grained locks, and uses hybrid time leader leases to prevent clients from reading stale values from old leaders. Single-row ACID transactions are optimized to have low latencies when there is no conflicting operation. Distributed ACID transactions preserve correctness at the expense of higher latencies.
YCQL, YEDIS, and PostgreSQL
YugaByte includes a nearly complete implementation of CQL, plus some extensions. One huge improvement over Cassandra is that YugaByte is strongly consistent, while Cassandra is eventually consistent. The other enhancements are for distributed transactions, strongly consistent secondary indexes, and JSON. YugaByte outperforms Cassandra for every operation except for short range scans, at least partially because of its strong consistency, which allows for a single read instead of the quorum read needed in Cassandra.
Cassandra supports four primitive data types not yet supported in YugaByte: date, time, tuple, and varint. YugaByte also has some restrictions on expressions.
YugaByte’s implementation of Redis lacks the list data type, but adds a time series data type. It adds built-in persistence, auto-sharding, and linear scalability as well as the ability to read from the nearest data center for low latency.
YugaByte’s PostgreSQL implementation is not very far along. Right now it lacks UPDATE and DELETE statements, expressions, and the SELECT statement lacks a join clause.
YugaByte installation and testing
You can install the open-source YugaByte DB from source code, from tarballs on MacOS, Centos 7, and Ubuntu 16.04 or later, and from Docker images on Docker or Kubernetes. You can then create clusters and test the three query APIs and some sample workload generators.
I chose to install YugaByte DB Enterprise on the Google Cloud Platform. While there were more manual steps to take than I would have liked, I was able to go through my installation and tests in a single afternoon after I had my Enterprise Edition license key.
Once the YugaWare instance was running on a four-CPU instance in Google Cloud, I configured Google Cloud Platform as the cloud provider for my database cluster.
Then I created a three-node cluster of eight-CPU instances in the US-East region.
I ran load tests using both the CQL and Redis APIs.
I was able to query both the CQL and Redis data from the command line.
I also created a three-node cluster in different regions spread around the world (below). This took longer to create (about 45 minutes) and had much higher write latency, as expected. You can’t get around the speed of light, unfortunately.
YugaByte costs
The price of a three-node YugaByte DB Enterprise Edition license starts at $40K per year. In addition to that, you need to factor in the cost of the servers. For a three-node cluster on Google Cloud Platform using eight-CPU VM instances, that cost is in the range of $800 to $900 a month plus network traffic, perhaps $11K a year.
My own costs for an afternoon of testing were $0.38 for the instances, and $0.01 for inter-zone egress. Deleting database clusters from the YugaByte DB Enterprise interface was easy, and once I stopped the VM instance running the administration and orchestration interface it no longer accrued significant charges.
Faster, better, distributed
Overall, YugaByte DB performed as advertised. At this point in its development it is useful as a faster, better, distributed Redis and Cassandra. It should eventually also be a better PostgreSQL, although in my experience that takes a long time (years rather than months), especially when you get to the point of trying to tune relational joins.
YugaByte DB doesn’t yet compete with Google Cloud Spanner, CockroachDB, or the SQL interface to Azure Cosmos DB for lack of a fleshed-out SQL interface. It doesn’t yet compete with Neo4j or the graph interface to Cosmos DB for lack of graph database support. It does compete with Redis, Cassandra, and the Cassandra-compatible interface to Cosmos DB.
Should you try out YugaByte DB yourself? If you need a distributed version of Redis or Cassandra, or you need to replace MongoDB for a globally distributed scenario, then yes. YugaByte DB could also be used to standardize on a single database for multiple purposes, such as combining a Cassandra database with Redis caching, as YugaByte customer Narvar has done. YugaByte DB also adds high-performance secondary indexes and a JSON type to Cassandra, increasing its utility as a transactional database.
Whether you want the open-source or enterprise version of YugaByte DB depends on your budget. By and large, if you’re a startup, you probably want the open source version. If you’re an established global company with many transactional database applications, especially if you need to scale clusters up and down often, you might benefit from the additional features in the enterprise version.
Copyright © 2018 IDG Communications, Inc.