The best NoSQL databases

Highly flexible and hugely scalable, NoSQL databases offer a range of data models and consistency options to suit your application


The Bigtable paper inspired several open source NoSQL databases including Apache HBase, Apache Cassandra, and Apache Accumulo. Bigtable uses a highly scalable, sparsely populated table structure, where each table is a sorted key-value map. A Bigtable row describes a single entity and is indexed by a single row key; a column contains individual values for each row. Column families group related columns. Each row/column intersection can contain multiple cells at different timestamps, and cells without data take no space.
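To make that data model concrete, here is a toy in-memory sketch in Python. The class `SparseTable` and its `put`, `get`, and `scan` methods are hypothetical names chosen for illustration. They are not the Cloud Bigtable client API, and the sketch ignores persistence and distribution entirely. It shows only the ideas described above: rows sorted by key, columns grouped into declared families, multiple timestamped versions per cell, and empty cells that occupy no storage.

```python
from bisect import bisect_left, insort


class SparseTable:
    """Toy Bigtable-style table: row key -> "family:qualifier" -> timestamped cells."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self._rows = {}                # row key -> {column: [(ts, value), ...] newest first}
        self._row_keys = []            # row keys kept in sorted order

    def put(self, row, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._row_keys, row)
        cells = self._rows[row].setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest version first

    def get(self, row, column):
        # Unwritten cells are simply absent from the dicts, so they take no space.
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start, end):
        # Because row keys are sorted, a key-range scan walks a contiguous slice.
        i = bisect_left(self._row_keys, start)
        while i < len(self._row_keys) and self._row_keys[i] < end:
            key = self._row_keys[i]
            yield key, self._rows[key]
            i += 1
```

Writing `"profile:name"` twice for the same row keeps both versions, and `get` returns the newest one; a row that never had `"stats:logins"` written simply has no entry for it.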

Cloud Bigtable delivers very high performance under high load, even compared to other NoSQL services. Part of that flows from the inherently efficient design, and part of that comes from the fast, scalable infrastructure. Along with high performance, Bigtable exhibits very low latency.

Read my review of Google Cloud Bigtable.

MongoDB

MongoDB is a highly scalable, operational document database available in both open source and commercial enterprise versions. It can run on-premises or as a managed cloud service, called MongoDB Atlas.

MongoDB is far and away the most popular of the NoSQL databases. Its document data model gives developers great flexibility, while its distributed architecture allows for great scalability. As a result, MongoDB is often chosen for applications that must manage large volumes of data, that benefit from horizontal scalability, and that handle data structures that don’t fit the relational model.

MongoDB is a document-based store that also has a graph-based store implemented on top of it. MongoDB doesn’t actually store JSON: it stores BSON (Binary JSON), which extends the text-based JSON representation with additional types such as int, long, date, floating point, decimal128, and geospatial coordinates.
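The difference between text JSON and BSON is easiest to see at the byte level. The following is a minimal sketch of a BSON encoder for a handful of types (boolean, int32, int64, double, UTF-8 string, UTC datetime, embedded document), written by hand against the public BSON specification. The function name `encode_document` is hypothetical; this is not MongoDB's own code, and real applications would use the `bson` package that ships with PyMongo, which also handles types omitted here such as decimal128 and ObjectId.

```python
import struct
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _cstring(s: str) -> bytes:
    # BSON field names are NUL-terminated UTF-8 strings.
    return s.encode("utf-8") + b"\x00"


def _element(key: str, value) -> bytes:
    name = _cstring(key)
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return b"\x08" + name + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + name + struct.pack("<i", value)  # int32
        return b"\x12" + name + struct.pack("<q", value)      # int64
    if isinstance(value, float):
        return b"\x01" + name + struct.pack("<d", value)      # 64-bit double
    if isinstance(value, str):
        data = value.encode("utf-8")
        # String length prefix counts the trailing NUL byte.
        return b"\x02" + name + struct.pack("<i", len(data) + 1) + data + b"\x00"
    if isinstance(value, datetime):
        if value.tzinfo is None:  # assumption: naive datetimes are treated as UTC
            value = value.replace(tzinfo=timezone.utc)
        millis = (value - _EPOCH) // timedelta(milliseconds=1)
        return b"\x09" + name + struct.pack("<q", millis)     # UTC datetime, ms since epoch
    if isinstance(value, dict):
        return b"\x03" + name + encode_document(value)        # embedded document
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    body = b"".join(_element(k, v) for k, v in doc.items())
    # Total length covers the 4-byte prefix, the elements, and the trailing NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"
```

For example, `{"a": 1}` encodes to 12 bytes: a length prefix, a type byte of `0x10` marking a 32-bit integer, the field name, the value, and a terminator. A date, which plain JSON can only carry as a string or number by convention, gets its own type byte (`0x09`).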

MongoDB can generate multi-modal graph, geospatial, B-tree, and full text indexes on a single copy of the data, using the type of the data to generate the correct type of index. MongoDB lets you create indexes on any document field. Starting with MongoDB 4.0, multi-document transactions are supported, which means that you can still get ACID guarantees even if you have to normalize your data design.

By default, MongoDB uses dynamic schemas, sometimes described as schema-less. The documents in a single collection do not need to have the same set of fields, and the data type for a field can differ across documents within a collection. With dynamic schemas, you can change the structure of your documents at any time.

Schema governance is available, however. Starting in MongoDB 3.6, MongoDB supports JSON Schema validation, which you turn on by using the $jsonSchema operator in a collection’s validator expression.
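Here is a sketch of how dynamic schemas and validation interact. The `validator` dictionary uses MongoDB's `$jsonSchema` syntax (`bsonType`, `required`, `properties`, `minimum`); with PyMongo it could be passed as, for example, `db.create_collection("users", validator=validator)`. The collection name, field names, and the `conforms` function are hypothetical. `conforms` is a tiny local stand-in that understands only `required`, `bsonType`, and `minimum`, so you can see which documents a server-side validator like this would reject. It is not how MongoDB evaluates schemas, and it maps BSON `int` loosely onto Python `int`.

```python
# A $jsonSchema validator expression (hypothetical "users" collection).
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# Without validation, all three documents could live in one collection:
# they have different fields, and "age" has a different type in the last one.
docs = [
    {"name": "Ada", "email": "ada@example.com", "age": 36},
    {"name": "Linus", "email": "linus@example.com", "languages": ["C"]},
    {"name": "Grace", "age": "unknown"},
]

_BSON_TYPES = {"string": str, "int": int, "object": dict}


def conforms(doc, schema):
    """Hypothetical local check covering only 'required', 'bsonType', and 'minimum'."""
    if any(field not in doc for field in schema.get("required", [])):
        return False
    for field, rules in schema.get("properties", {}).items():
        if field not in doc:
            continue  # optional field that is absent: allowed
        value = doc[field]
        expected = _BSON_TYPES[rules["bsonType"]]
        if not isinstance(value, expected) or isinstance(value, bool):
            return False
        if "minimum" in rules and value < rules["minimum"]:
            return False
    return True


results = [conforms(d, validator["$jsonSchema"]) for d in docs]
```

The extra `languages` field in the second document is still accepted, because this schema does not forbid additional properties; only the third document, which lacks `email`, fails.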

Read my review of MongoDB.

Redis

Redis is an open source, in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence. Redis provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
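Sorted sets are the structure behind many Redis leaderboards and time-window queries. The sketch below mimics the semantics of `ZADD` and `ZRANGEBYSCORE` in plain Python: each member is unique, re-adding a member updates its score, and range queries return members ordered by score and then by member. Redis itself pairs a hash table with a skip list; this sketch swaps in a sorted Python list for brevity, so inserts are O(n). The `SortedSet` class is hypothetical; against a real server you would call the corresponding commands through a client such as redis-py.

```python
import bisect


class SortedSet:
    """Minimal sketch of Redis sorted-set semantics (ZADD / ZRANGEBYSCORE)."""

    def __init__(self):
        self._scores = {}  # member -> score, for O(1) lookup of a member's score
        self._index = []   # sorted list of (score, member) pairs

    def zadd(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            # Remove the stale (score, member) entry before re-inserting.
            i = bisect.bisect_left(self._index, (old, member))
            del self._index[i]
        self._scores[member] = score
        bisect.insort(self._index, (score, member))

    def zrangebyscore(self, lo, hi):
        """Members with lo <= score <= hi, ordered by score, then member."""
        out = []
        i = bisect.bisect_left(self._index, (lo, ""))  # "" sorts before any member
        while i < len(self._index) and self._index[i][0] <= hi:
            out.append(self._index[i][1])
            i += 1
        return out


leaderboard = SortedSet()
leaderboard.zadd("alice", 120)
leaderboard.zadd("bob", 95)
leaderboard.zadd("carol", 120)
leaderboard.zadd("bob", 130)  # updates bob's score rather than adding a duplicate
```

After these calls, a range query for scores 100 to 125 returns `["alice", "carol"]`; the tie at 120 is broken by member name.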

Redis Enterprise is a fully durable multi-model database. It supports key-value, document, graph and time series data, probabilistic data structures, comprehensive search, stream processing, and serving deep learning and AI models.

Yandex ClickHouse

Yandex ClickHouse is an open-source, column-oriented OLAP database management system that can reliably manage extremely large volumes of data, including non-aggregated data, and can generate custom reports in real time. The system scales linearly and can store and process trillions of rows and petabytes of data.
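The advantage of a column-oriented layout for analytics is that an aggregate query only has to read the columns it mentions. The following sketch is an illustration of that idea in plain Python, not of ClickHouse's storage format: it pivots row-oriented records into one list per column, then computes the equivalent of `SELECT country, SUM(hits) ... GROUP BY country` by touching only the `country` and `hits` columns. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def to_columns(rows):
    """Pivot row-oriented records into one list per column (columnar layout).

    Assumes every row has the same fields, so the lists stay aligned by position.
    """
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def sum_by(columns, key, measure):
    """GROUP BY key, SUM(measure): reads only the two columns it needs."""
    totals = defaultdict(int)
    for k, v in zip(columns[key], columns[measure]):
        totals[k] += v
    return dict(totals)


rows = [
    {"date": "2019-06-01", "country": "DE", "hits": 120},
    {"date": "2019-06-01", "country": "US", "hits": 300},
    {"date": "2019-06-02", "country": "DE", "hits": 80},
    {"date": "2019-06-02", "country": "FR", "hits": 50},
]
columns = to_columns(rows)
```

Here `sum_by(columns, "country", "hits")` never looks at the `date` column. On a wide table with hundreds of columns, skipping unused columns (and compressing each column's similar values together) is where much of the I/O saving comes from.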

ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available. (By contrast, SAP HANA can only work in RAM.) ClickHouse does parallel processing on multiple cores.

In ClickHouse, data can reside on different shards. Each shard can be a group of replicas that are used for fault tolerance. A query is processed on all the shards in parallel.
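The scatter-gather pattern behind "processed on all the shards in parallel" can be sketched in a few lines: rows are assigned to shards by hashing a key, each shard computes a partial aggregate independently, and the partial results are merged. This is a simplified illustration under stated assumptions, not ClickHouse's implementation. Threads stand in for separate servers, replicas are omitted, and the names `shard_for`, `partition`, `partial_count`, and `distributed_count` are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_for(key, n_shards):
    # Deterministic hash, so the same key always lands on the same shard.
    return zlib.crc32(key.encode("utf-8")) % n_shards


def partition(events, n_shards):
    shards = [[] for _ in range(n_shards)]
    for event in events:
        shards[shard_for(event["user"], n_shards)].append(event)
    return shards


def partial_count(shard):
    # Each shard aggregates only its own rows (the "scatter" step).
    return Counter(event["page"] for event in shard)


def distributed_count(shards):
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_count, shards))
    # Merge the partial aggregates into the final result (the "gather" step).
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Counting is a convenient example because partial counts can simply be added together; aggregates such as averages need the shards to return intermediate state (for instance, a sum and a count) rather than a final value.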

ClickHouse supports a declarative query language based on SQL that in many cases is identical to standard SQL. Supported queries include GROUP BY, ORDER BY, subqueries in FROM, IN, and JOIN clauses, and scalar subqueries. At the time of writing, dependent (correlated) subqueries and window functions were not supported.

Although ClickHouse supports inserts and mutations, it was not designed for OLTP. Yandex recommends inserting data in batches of at least 1,000 rows, or making no more than one insert request per second. No locks are taken when new data is ingested.
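On the client side, following that recommendation usually means buffering rows and sending them in bulk. The sketch below is a hypothetical helper, not part of any ClickHouse client library: `BatchingWriter` collects rows and calls a user-supplied `flush` callback once 1,000 rows are buffered, or once at least a second has passed since the last flush. The callback is where a real application would issue its bulk INSERT. The clock is injectable so the behavior can be tested without sleeping.

```python
import time


class BatchingWriter:
    """Buffers rows and hands them to `flush` in large batches (hypothetical helper)."""

    def __init__(self, flush, max_rows=1000, max_interval=1.0, clock=time.monotonic):
        self._flush = flush              # callback that performs one bulk insert
        self.max_rows = max_rows
        self.max_interval = max_interval  # seconds between time-triggered flushes
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, row):
        self._buffer.append(row)
        full = len(self._buffer) >= self.max_rows
        stale = self._clock() - self._last_flush >= self.max_interval
        # Note: the time check only runs when a row arrives; a production version
        # would also need a background timer to flush an idle, non-empty buffer.
        if full or stale:
            self.flush()

    def flush(self):
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []  # start a fresh list; the flushed one is not reused
        self._last_flush = self._clock()
```

With a steady stream, each request carries 1,000 rows; with a trickle of data, rows accumulate and go out at most about once per second.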

ClickHouse uses asynchronous multi-master replication. After being written to any available replica, data is distributed to all the remaining replicas in the background.
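A toy model makes the consequence of asynchronous multi-master replication visible: a write is accepted by whichever replica receives it and is copied to the others only later, so for a while the replicas disagree. The classes below are hypothetical and deliberately ignore everything a real system must handle, including failures, ordering, deduplication, and how replicas coordinate; `replicate()` stands in for the background process.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.rows = []


class ReplicatedTable:
    """Toy model of asynchronous multi-master replication."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        self._pending = []  # (origin replica, row) pairs not yet propagated

    def insert(self, replica_name, row):
        # Any replica may accept the write (multi-master), and it is
        # acknowledged as soon as that one replica has stored it.
        self.replicas[replica_name].rows.append(row)
        self._pending.append((replica_name, row))

    def replicate(self):
        # Later, in the background, each row is copied to every other replica.
        for origin, row in self._pending:
            for name, replica in self.replicas.items():
                if name != origin:
                    replica.rows.append(row)
        self._pending.clear()
```

Between an `insert` and the next `replicate()`, a query sent to a different replica will not see the new row; this is the trade-off that lets writes proceed without waiting on every replica.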

ClickHouse was developed to support Yandex.Metrica, the second largest web analytics platform in the world. At the time of writing, Yandex.Metrica ran on 394 servers in six geographically distributed data centers, storing more than 13 trillion records and handling more than 20 billion events daily.

YugaByte

YugaByte DB is an open-source, transactional, high-performance database for planet-scale applications that supports three API sets: YCQL, compatible with Apache Cassandra Query Language (CQL); YEDIS, compatible with Redis; and PostgreSQL.

YugaWare is the orchestration layer for YugaByte DB Enterprise Edition. YugaWare makes quick work of spinning up and tearing down distributed clusters on Amazon Web Services, Google Cloud Platform, and Microsoft Azure. YugaByte DB implements multi-version concurrency control (MVCC), which it uses for non-locking reads.
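The idea behind MVCC's non-locking reads is that writers never overwrite data in place: each write adds a new version stamped with a commit timestamp, and a reader picks a snapshot timestamp and sees the newest version at or before it. The sketch below illustrates only that principle. `MVCCStore` is a hypothetical class, a single in-process counter stands in for whatever timestamps the real database uses, and garbage collection of old versions and write conflicts are ignored.

```python
class MVCCStore:
    """Minimal multi-version store: writes append versions, reads use a snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), in ascending ts order
        self._clock = 0      # logical commit timestamp (stand-in for a real clock)

    def write(self, key, value):
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        # A reader just records "now"; it takes no lock on any key.
        return self._clock

    def read(self, key, snapshot_ts):
        """Newest version committed at or before snapshot_ts, or None."""
        result = None
        for ts, value in self._versions.get(key, []):
            if ts > snapshot_ts:
                break  # versions are ascending, so later ones are all too new
            result = value
        return result
```

A reader that took its snapshot before a later write keeps seeing the old value, so a long-running read gets a consistent view without blocking, or being blocked by, concurrent writers.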

YugaByte Enterprise supports read replicas, multi-cloud clusters, and comprehensive monitoring and alerting without any configuration. It also features in-flight and at-rest encryption, one-click distributed backups and restores for clusters of any size, and auto-tiering of cold data to cheaper storage.

Read my review of YugaByte DB.

Copyright © 2019 IDG Communications, Inc.
