At the heart of Azure is a set of foundational services. They’re the technologies Microsoft uses to build its platform, giving it the tools to deliver reliable, scalable applications, and they’re the first to be deployed in any new Azure data center or region. Many of them never get to leave the background, but those that do are powerful, cloud-native, distributed-computing tools that can help you build and run massive applications at global scale, spanning regions and serving millions of users and petabytes of data.
One of these key services is Azure Cosmos DB, what Microsoft calls its “planet-scale” database. Built to support multiple database APIs and to offer more practical consistency models that sit between the extremes of eventual and strong consistency, it’s a powerful tool that can work as a NoSQL document database, a store queried with SQL, or a graph database. Recent changes have made it more affordable, too, offering a free tier for relatively simple applications with low demand, and a serverless option that lets you pay for resources as you use them. Although the free option limits your geographic reach, there’s no difference in capabilities between the different Cosmos DB options, allowing you to move from tier to tier as your data and scaling needs change.
Cosmos DB adds MongoDB 4.0 APIs for multidocument transactions
Microsoft continues to evolve Cosmos DB to support other aspects of its Azure platform, as well as new releases of the various APIs used to access the core service. At the March 2021 Ignite, it’s announcing a series of major upgrades to Cosmos DB, improving support for its MongoDB APIs. At the same time, it’s using Cosmos DB’s internal change feed to introduce tools for real-time analytics with Azure Synapse, adding integrated continuous backups, and tightening security with role-based access controls.
MongoDB remains a popular NoSQL document database, and Cosmos DB offers a compatible set of APIs that track MongoDB’s development. Until now it has supported the MongoDB 3.2 and 3.6 APIs, allowing you to quickly port existing applications from on-premises or self-hosted MongoDB instances to Cosmos DB. Data can be imported into Cosmos DB, and applications can use Cosmos DB endpoints without significant changes. Cosmos DB only replicates the MongoDB wire protocol and doesn’t host any MongoDB instances, so you will need to recreate any internal procedures that ran inside your MongoDB servers.
MongoDB updated its API to Version 4.0 in 2018, adding support for multidocument transactions. Microsoft has now updated Cosmos DB to work with the 4.0 API, adding the same multidocument features. Earlier versions of the API focused on working with a single JSON document at a time, with each operation a single atomic transaction. That approach is fine for simple applications, but in practice, larger-scale applications need to update or create multiple documents at the same time, much like a relational database working across multiple tables.
NoSQL databases are fast, and using JSON documents to, say, store product or customer data or host a shopping cart can speed up e-commerce applications. Any user transactions will need to span many documents, updating customer records, stock levels, and more. You could write code to wrap multiple operations, but that adds a bottleneck to your application, waiting for each transaction to complete before moving on. There’s an additional issue if you’re using Cosmos DB and taking this route to data operations, as each transaction will consume request units, adding costs in pay-as-you-go instances or reducing your pool of available request units for prepaid instances.
With support for MongoDB’s 4.0 API, Cosmos DB developers can now deliver those writes and updates in a single operation, simplifying the code you need to write without changing the underlying structure of your database. As most Cosmos DB developers keep common documents in a single shard, there’s little or no impact from working with multiple documents in a single transaction. You do need to be careful when working across shards, but that’s no different from any cross-shard operation where consistency can be an issue. Good Cosmos DB design practices will keep any risk to a minimum and should help deliver reliable and fast multidocument operations.
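As a rough illustration of what a multidocument transaction looks like from application code, here is a minimal sketch written against the session and transaction calls in the pymongo driver. The `shop` database, the `inventory` and `orders` collections, the field names, and the `place_order` helper are all hypothetical, and the sketch assumes both collections live where a single transaction can reach them.

```python
def place_order(client, customer_id, sku, quantity):
    """Reserve stock and record an order in one multidocument transaction.

    `client` is expected to behave like a pymongo MongoClient. The database,
    collection, and field names below are hypothetical.
    """
    db = client["shop"]
    with client.start_session() as session:
        # The transaction commits when this block exits normally and is
        # aborted if an exception escapes, so both writes land or neither does.
        with session.start_transaction():
            reserved = db["inventory"].update_one(
                {"_id": sku, "qty": {"$gte": quantity}},  # only if stock allows
                {"$inc": {"qty": -quantity}},
                session=session,
            )
            if reserved.modified_count != 1:
                raise ValueError(f"insufficient stock for {sku}")
            db["orders"].insert_one(
                {"customer_id": customer_id, "sku": sku, "qty": quantity},
                session=session,
            )
```

In a real application you would pass in a `MongoClient` built from your account’s connection string. Passing `session=session` to each call is what ties the individual writes to the same transaction.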
It’ll be interesting to see how Microsoft evolves its Cosmos DB MongoDB APIs in the future, as the 4.2 release added support for cross-shard distributed transactions, which should enable support for multidocument transactions in very large databases.
Adding Cosmos DB indexes to Azure Synapse
Modern databases, Cosmos DB included, are dependent on their internal logs. These provide the tools for replaying transactions or recreating a database in the event of errors. Cosmos DB’s internal change feed is more than a tool for managing database history; it’s key to supporting the many different consistency models that the database uses, giving each shard and each instance a common, time-stamped history of what has happened where and when.
The change feed is at the heart of two major new features in Cosmos DB, powering both Synapse Link and Continuous Backup. With Azure’s increasing focus on analytics through its Synapse platform, adding support for Cosmos DB is a logical move. Microsoft has been working to make Synapse a way of delivering data lake–scale analytics without needing complex ETL pipelines that slow down imports. Instead, by providing links directly into data stores, it can help provide real-time analytics that can be displayed on Power BI dashboards. (Large Cosmos DB installations may need to take advantage of Power BI’s new Premium G2 instances to process Azure Synapse data.)
Although Cosmos DB uses APIs to provide different personalities to different endpoints, it keeps the same internal document database model. This approach allows Azure Synapse to access Cosmos DB’s change feed to generate its own internal index. Keeping an index in Synapse speeds up queries, ensuring that your analytics get access to the latest data without requiring expensive and slow replication from Cosmos DB into a Synapse data lake. Using the change feed to create the index keeps the impact on Cosmos DB to a minimum, working in the background to export the feed and maintain it in columnar form in Synapse.
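To see why a columnar copy suits analytics, consider this purely conceptual sketch, which says nothing about how Synapse Link is actually implemented. It pivots a handful of row-oriented documents, standing in for records read from a change feed, into one list per field. The `to_columns` helper and the sample documents are hypothetical.

```python
def to_columns(documents):
    """Pivot row-oriented documents into one list of values per field."""
    fields = []
    for doc in documents:
        for key in doc:
            if key not in fields:
                fields.append(key)  # keep first-seen field order
    # Documents that lack a field contribute None to that column.
    return {field: [doc.get(field) for doc in documents] for field in fields}


# Hypothetical documents standing in for records read from a change feed.
changes = [
    {"id": "1", "sku": "A", "qty": 2},
    {"id": "2", "sku": "B", "qty": 5, "region": "EU"},
]

columns = to_columns(changes)
total_qty = sum(columns["qty"])  # an aggregate reads one column only
```

An aggregate such as `total_qty` touches a single column rather than every document, which is the property that makes column stores attractive for analytical queries.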
Using Synapse in this way logically separates your analytical and operational stores. Developers can continue to work with Cosmos DB APIs as they always have, while business analysts and data scientists can use tools such as Synapse Studio and Azure Data Explorer to build and test analytical queries before exporting them to visualization tools. There’s no need for anyone to give up their tools or learn new ways of working.
Using the change feed for continuous backup
Cosmos DB’s change feed is key to another new feature: continuous backup and point-in-time restore. With a log of every change to your database, you now have a way to recreate the database in the event of failure, restoring it to its state at a specific point in time if you’ve identified a transaction that caused a failure or introduced malicious data. Other new security features, including Azure Active Directory–based role-based access, should reduce unauthorized access to data. At the same time, Microsoft is supporting Cosmos DB in its Azure Purview data control tools.
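The idea behind point-in-time restore can be sketched in a few lines: replay a time-stamped log of changes up to the chosen moment and ignore everything after it. This is a conceptual illustration only, not Cosmos DB’s backup mechanism. The `restore_to` helper, the log layout, and the use of `None` to mark a deletion are assumptions made for the example.

```python
def restore_to(change_log, point_in_time):
    """Rebuild document state by replaying changes up to point_in_time.

    Each entry is (timestamp, document_id, document); a document of None
    marks a deletion. This layout is an assumption made for the sketch.
    """
    state = {}
    for timestamp, doc_id, document in sorted(change_log, key=lambda c: c[0]):
        if timestamp > point_in_time:
            break  # later changes are not part of the restored state
        if document is None:
            state.pop(doc_id, None)
        else:
            state[doc_id] = document
    return state


# Hypothetical log: the write at timestamp 3 is the one we want to undo.
log = [
    (1, "cart-1", {"items": ["book"]}),
    (2, "cart-2", {"items": ["pen"]}),
    (3, "cart-1", {"items": ["book", "lamp"]}),
    (4, "cart-2", None),
]

before_bad_write = restore_to(log, 2)
latest = restore_to(log, 4)
```

Restoring to timestamp 2 yields both carts as they were before the unwanted write, while replaying the full log reproduces the current state.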
Microsoft continues to add new features to Cosmos DB, underscoring its importance to Azure. Tracking MongoDB API features ensures applications can move from on-premises to the cloud easily, scaling as needed. Support for links to Azure Synapse breaks down the barriers between operational and analytical data, ensuring that both Cosmos DB’s developer and data science audiences get the most from their data.