DEV Community

Tatsuro Shibamura

Generate sequence numbers in Cosmos DB

To achieve very high availability and global distribution, Azure Cosmos DB does not provide a built-in way to generate sequence numbers, unlike an RDB such as SQL Server.

Because an RDB can always achieve strong consistency, sequence numbers are easy to generate there, as described in the following document, but they can become a bottleneck when scaling.

Data modeling is the most important part of using Cosmos DB. The key to getting the most out of Cosmos DB is to design the system so that it does not need sequence numbers in the first place, that is, a design that allows data to be highly distributed.

However, there are rare cases in which sequence numbers are required even at the expense of availability.

In an RDB, sequence numbers could always be generated safely. In Cosmos DB, doing so safely was surprisingly difficult until Partial Update became available.

Now that Cosmos DB supports partial updates, which include an increment operation on arbitrary properties, generating sequence numbers has become surprisingly easy.

The following is sample code. The data model stored in Cosmos DB is very simple, as shown below. The value property holds the current sequence number.

using Newtonsoft.Json;

// One document per sequence; "value" holds the current sequence number
public class Sequence
{
    [JsonProperty("id")]
    public string Id { get; set; }

    [JsonProperty("value")]
    public long Value { get; set; }
}

The following JSON is stored in Cosmos DB in advance. Because a partial update operates on an existing document, the document itself has to be created separately.

{
    "id": "sample",
    "value": 0
}

(Image: initial item)
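The initial document can also be created from code instead of by hand. The following is a minimal sketch, not the author's method: the helper name EnsureSequenceAsync is hypothetical, it relies on the Sequence class shown earlier, and it assumes the container is partitioned on /id, so the document id doubles as the partition key value.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class SequenceSetup
{
    // Hypothetical helper: creates the sequence document with value 0 if it does not exist yet.
    // Assumes the container is partitioned on /id and that Sequence is the class shown earlier.
    public static async Task EnsureSequenceAsync(Container container, string sequenceId)
    {
        try
        {
            await container.CreateItemAsync(
                new Sequence { Id = sequenceId, Value = 0 },
                new PartitionKey(sequenceId));
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.Conflict)
        {
            // The document already exists; keep its current value instead of resetting it.
        }
    }
}
```

Catching the 409 Conflict rather than using an upsert is deliberate in this sketch: an upsert would overwrite an existing counter with 0.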

As you have probably figured out by now, all that is left is to use the Cosmos DB SDK to perform a +1 operation on the value property with a partial update.

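using Microsoft.Azure.Cosmos;

var connectionString = "<connection_string>";

var cosmosClient = new CosmosClient(connectionString);

var container = cosmosClient.GetContainer("my-database", "my-sequence");

// Ask the server to add 1 to the "value" property
var operations = new[]
{
    PatchOperation.Increment("/value", 1)
};

// The item id is also passed as the partition key value, which implies a container partitioned on /id
var response = await container.PatchItemAsync<Sequence>("sample", new PartitionKey("sample"), operations);

// The response contains the item as it is after the increment
Console.WriteLine($"Seq = {response.Resource.Value}");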

When you run this code, you can see that Cosmos DB performs the property increment on the server side.

(Image: generate single)

(Image: updated item)

The hard part of sequence numbers is whether they can be generated correctly, without duplicates, when a large number of them are requested at the same time.

The sample code is extended as shown below to try parallel generation.

// Start 10 increment requests without awaiting each one, so they run concurrently
var tasks = new List<Task<ItemResponse<Sequence>>>();

for (var i = 0; i < 10; i++)
{
    tasks.Add(container.PatchItemAsync<Sequence>("sample", new PartitionKey("sample"), operations));
}

var responses = await Task.WhenAll(tasks);

foreach (var itemResponse in responses)
{
    // RequestCharge is the RU cost of this single request, not a per-second rate
    Console.WriteLine($"Seq : {itemResponse.Resource.Value}, Consumed RU = {itemResponse.RequestCharge}");
}

Console.WriteLine($"Next Seq : {responses.Max(x => x.Resource.Value) + 1}");

When I run the extended code, it generates sequence numbers without duplicates. Although the requests are sent at the same time, the results confirm that the increment is processed atomically on the server side.

(Image: generate multiple)
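Rather than checking the output by eye, the absence of duplicates can be verified in code. This is a small sketch with a hypothetical helper name, AllDistinct; it uses no Cosmos DB APIs and only checks a list of numbers.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SequenceCheck
{
    // Returns true when every value in the list is unique.
    // HashSet<long>.Add returns false for a value that was already added,
    // so All stops at the first duplicate.
    public static bool AllDistinct(IEnumerable<long> values)
    {
        var seen = new HashSet<long>();
        return values.All(seen.Add);
    }
}
```

To use it with the parallel sample, pass the Resource.Value of each item response returned by Task.WhenAll.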

Previously, the read and the update had to be executed as separate requests, which was complicated in Cosmos DB because transactions are not available in the way they are in an RDB. With partial updates, everything, including conflict resolution, is handled on the server side, so there is no need to worry about this.
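For comparison, the separate read and update could be made safe with optimistic concurrency on the document's ETag. The sketch below is one common pattern, not necessarily what the author used before: the helper name NextAsync is hypothetical, it relies on the Sequence class shown earlier, and it assumes a container partitioned on /id.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class EtagSequence
{
    // Hypothetical helper showing read-then-replace with optimistic concurrency.
    // Assumes a container partitioned on /id and the Sequence class shown earlier.
    public static async Task<long> NextAsync(Container container, string sequenceId)
    {
        var partitionKey = new PartitionKey(sequenceId);

        while (true)
        {
            // 1. Read the current value together with its ETag
            var read = await container.ReadItemAsync<Sequence>(sequenceId, partitionKey);
            var sequence = read.Resource;
            sequence.Value++;

            try
            {
                // 2. Replace only if nobody else has changed the document since the read
                var options = new ItemRequestOptions { IfMatchEtag = read.ETag };
                var replaced = await container.ReplaceItemAsync(sequence, sequenceId, partitionKey, options);
                return replaced.Resource.Value;
            }
            catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
            {
                // 3. Another writer won the race; loop to read again and retry
            }
        }
    }
}
```

Each number costs at least two requests, and under contention the loop retries (this sketch has no retry limit). The single PatchItemAsync call avoids both.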

A container used only for generating sequence numbers can reduce RU consumption slightly by turning off indexing in its indexing policy.

{
    "indexingMode": "none",
    "automatic": false,
    "includedPaths": [],
    "excludedPaths": []
}

If you know that you do not need indexes at all, you can reduce the cost of Cosmos DB by explicitly turning them off.
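The same indexing policy can be applied when the container is created through the .NET SDK. This is a minimal sketch under stated assumptions: the helper name CreateAsync is hypothetical, and the partition key path /id is an assumption that matches the samples in this article.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class SequenceContainer
{
    // Hypothetical helper: creates the sequence container with indexing turned off.
    // The partition key path /id is an assumption.
    public static async Task<Container> CreateAsync(Database database, string containerId)
    {
        var properties = new ContainerProperties(containerId, "/id")
        {
            IndexingPolicy = new IndexingPolicy
            {
                // Equivalent to "indexingMode": "none" and "automatic": false in the JSON policy
                IndexingMode = IndexingMode.None,
                Automatic = false
            }
        };

        ContainerResponse response = await database.CreateContainerIfNotExistsAsync(properties);
        return response.Container;
    }
}
```

Note that CreateContainerIfNotExistsAsync leaves an existing container untouched, so this only sets the policy for a container that does not exist yet.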
