Azure Cosmos DB: Your partitioning strategy can save you money!
This article is part of a series on the topic of Cloud Cost Optimization.
While Cloud Cost Optimization requires robust monitoring, there exist some best practices that we will share with you throughout this series.
As you certainly know, the billing unit for Azure Cosmos DB is the Request Unit (RU), and throughput is expressed in Request Units per second (RU/s). In other words, the more RUs our operations consume, the more throughput we have to pay for. Hence, to optimize costs, you will want to perform the required operations on the database while consuming as few RUs as possible.
While the approach described in this article is not a substitute for other optimization mechanisms (e.g., caching, compression), it aims to reduce the cost of executing your queries on Azure Cosmos DB.
When it comes to the number of RUs consumed, we need to be aware of two things:
- Each type of operation (Read, Insert, Upsert, Delete and Query) consumes a different number of RUs.
- A key factor that impacts the number of RUs is the complexity of your query. For example, if retrieving your data requires fanning out across many partitions, or combining results from multiple containers (collections, tables, graphs, and so on), then you can expect it to be expensive in terms of the RUs it consumes.
We can rely on the Azure Cosmos DB query statistics, which are accessible through the Azure portal, to evaluate how many RUs our queries consume. If we find that our most-used queries are too expensive, we will need to redesign our data schema and partitioning strategy.
Here is what the query statistics window looks like:
Fig 1. Azure Cosmos DB Query statistics
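Besides the portal, Cosmos DB returns the charge of each operation in the `x-ms-request-charge` response header, so the same measurement can be collected from application code. The following is a minimal sketch of how such charges could be aggregated per query. The `ChargeLog` helper, the query names and the header values are hypothetical, and the header dictionaries are hard-coded so that the example runs without an Azure account. With the Python SDK, the headers of the last call are usually read from `container.client_connection.last_response_headers`; treat that attribute as an assumption to check against your SDK version.

```python
from collections import defaultdict

# Header in which Cosmos DB reports the RU cost of a single operation.
REQUEST_CHARGE_HEADER = "x-ms-request-charge"


def request_charge(headers):
    """Return the RU charge found in a response-headers mapping (0.0 if absent)."""
    return float(headers.get(REQUEST_CHARGE_HEADER, 0.0))


class ChargeLog:
    """Hypothetical helper: accumulates RU charges per named query."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, query_name, headers):
        self._totals[query_name] += request_charge(headers)
        self._counts[query_name] += 1

    def average(self, query_name):
        count = self._counts[query_name]
        return self._totals[query_name] / count if count else 0.0

    def most_expensive(self):
        """Query names sorted by total RUs consumed, highest first."""
        return sorted(self._totals, key=self._totals.get, reverse=True)


# Made-up header values, for illustration only.
log = ChargeLog()
log.record("cart_by_user", {"x-ms-request-charge": "2.5"})
log.record("cart_by_user", {"x-ms-request-charge": "3.5"})
log.record("orders_by_date", {"x-ms-request-charge": "40.0"})
print(log.most_expensive())
```

Sorting by total rather than by average charge is a deliberate choice here: a cheap query that runs very often can cost more overall than an expensive one that runs rarely.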
Even though designing the right partitioning strategy really depends on your needs and use cases, there are some common examples:
- You are building a shopping cart; the user ID could be an appropriate partition key.
- You are building an online game; the game ID could be an appropriate partition key.
- You are building an IoT system; the device ID could be an appropriate partition key.
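To see why the partition key should follow the access pattern, here is a small, simplified model of the shopping cart case. It is only a sketch: the items, the field names (`userId`, `category`, `sku`) and the helper functions are hypothetical, and it counts the logical partitions that hold matching data, whereas the real service routes requests to physical partitions. The idea it illustrates is that "give me this user's cart" stays within one partition when the key is the user ID, and spreads out otherwise.

```python
from collections import defaultdict

# Hypothetical cart items; field names are illustrative only.
ITEMS = [
    {"id": "1", "userId": "alice", "category": "books", "sku": "B-100"},
    {"id": "2", "userId": "alice", "category": "games", "sku": "G-200"},
    {"id": "3", "userId": "bob", "category": "books", "sku": "B-100"},
    {"id": "4", "userId": "carol", "category": "music", "sku": "M-300"},
]


def group_by_partition(items, partition_key):
    """Group items into logical partitions by the chosen partition key."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[partition_key]].append(item)
    return partitions


def partitions_touched(items, partition_key, predicate):
    """Count the logical partitions holding at least one matching item."""
    partitions = group_by_partition(items, partition_key)
    return sum(
        1 for docs in partitions.values() if any(predicate(doc) for doc in docs)
    )


def is_alices(item):
    """The access pattern under study: fetch one user's cart."""
    return item["userId"] == "alice"


by_user = partitions_touched(ITEMS, "userId", is_alices)
by_category = partitions_touched(ITEMS, "category", is_alices)
print(by_user, by_category)
```

With these four items, the user-keyed layout keeps the cart in a single partition, while the category-keyed layout scatters it across two.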
In more general terms: do not design your data schema (and your partitioning strategy) based on the relationships between your data, as you would in a relational database system. Instead, design them around how the data is used.
The most expensive design (in terms of RUs consumed) is one that forces you to perform some kind of join across multiple containers to retrieve the data you want. Since a query runs against a single container, such a join has to be done in your application, with at least one request per container.
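The difference between a relational-style design and a usage-driven one can be sketched with an in-memory stand-in for a container. Everything here is hypothetical (the `FakeContainer` class, the document shapes, the IDs), and the sketch counts requests rather than RUs. Each request carries its own charge, so fewer requests for the same page is the effect we are after.

```python
class FakeContainer:
    """In-memory stand-in for a container; counts the requests it serves."""

    def __init__(self, docs):
        self._docs = {doc["id"]: doc for doc in docs}
        self.requests = 0

    def read(self, doc_id):
        self.requests += 1
        return self._docs[doc_id]


# Relational-style design: cart and products live in separate containers.
carts = FakeContainer([{"id": "cart-alice", "productIds": ["p1", "p2"]}])
products = FakeContainer([
    {"id": "p1", "name": "Keyboard"},
    {"id": "p2", "name": "Mouse"},
])


def cart_names_joined(cart_id):
    """Client-side join: one read for the cart, then one per product."""
    cart = carts.read(cart_id)
    return [products.read(pid)["name"] for pid in cart["productIds"]]


# Usage-driven design: the cart embeds what the cart page needs.
embedded_carts = FakeContainer([
    {
        "id": "cart-alice",
        "items": [
            {"productId": "p1", "name": "Keyboard"},
            {"productId": "p2", "name": "Mouse"},
        ],
    }
])


def cart_names_embedded(cart_id):
    """Single read: the names are already inside the cart document."""
    return [item["name"] for item in embedded_carts.read(cart_id)["items"]]


joined = cart_names_joined("cart-alice")
embedded = cart_names_embedded("cart-alice")
print(joined == embedded)
print(carts.requests + products.requests, "requests vs", embedded_carts.requests)
```

The trade-off is that embedded copies of the product name have to be kept in sync when a product changes, so this design pays off mostly for data that is read far more often than it is updated.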
And there you have it! You can now add a proper partitioning strategy to your cost optimization strategy. Just keep in mind that to define an appropriate partitioning strategy for Azure Cosmos DB, you must not think in a relational manner.