Keeping track of the order in which events take place is crucial for many systems. For instance, in a SQL database, the write and commit times must be compared to the read time to ensure consistency and ACID properties. In a monolithic system, a single clock can be used to simplify timekeeping for all processes. Even if that clock diverges from real time, it still preserves the order of events.
However, using a single clock in a distributed system would limit scalability. No clock is perfect, and when there are several, each one drifts independently. The system can resynchronize the clocks via network messages, but this is neither instantaneous nor uniform, especially when deploying to multiple Availability Zones or regions.
Josh Levinson and Julien Ridoux presented Amazon Time Sync improvements during a Chalk Talk at re:Invent 2023. I use some screenshots of the whiteboard to illustrate this blog post.
When synchronizing your clock with a time server, the time you receive could correspond to any instant between the moment you sent the request and the moment you received the response.
This uncertainty includes the network latency as well as run queue time. Compared to unsynchronized clocks, the clock skew is at least bounded. Typically, with a public NTP (Network Time Protocol) server like time.aws.com, the clock skew can be in the hundreds of milliseconds.
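To make this uncertainty concrete, here is a minimal sketch of the standard NTP offset and delay arithmetic. The four timestamps and the `NtpSample` and `offset_and_bound` names are illustrative assumptions of mine, not something shown in the talk. The point is that the true offset can only be located within plus or minus half of the round-trip delay.

```python
from dataclasses import dataclass


@dataclass
class NtpSample:
    """One request/response exchange, all values in microseconds (hypothetical data)."""
    t0: int  # client clock when the request is sent
    t1: int  # server clock when the request is received
    t2: int  # server clock when the response is sent
    t3: int  # client clock when the response is received


def offset_and_bound(s: NtpSample) -> tuple[float, float]:
    """Return (estimated offset, error bound) for one exchange.

    The estimate assumes the outbound and return paths take the same time.
    Since that is unknown, the true offset lies within +/- bound of it.
    """
    # Round-trip time spent on the network and in queues, excluding server processing.
    delay = (s.t3 - s.t0) - (s.t2 - s.t1)
    # Classic NTP offset estimate (server clock minus client clock).
    offset = ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2
    return offset, delay / 2


# Illustrative exchange: 60 ms round trip seen by the client, 1 ms spent on the server.
sample = NtpSample(t0=0, t1=130_000, t2=131_000, t3=60_000)
offset, bound = offset_and_bound(sample)
print(f"offset = {offset} us, true value within +/- {bound} us")
```

With these made-up numbers, the client can only say that the server is between 71 ms and 130 ms ahead. A shorter and more predictable round trip is what shrinks that window.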
With PTP (Precision Time Protocol), used locally by Amazon Time Sync Service, the skew is reduced to under a millisecond. However, the synchronized time can still be far from real clock time. Amazon Time Sync Service can now provide accurate time, represented on the whiteboard as the target's bullseye.
This involves the integration of hardware components, including GPS and atomic clocks, with Nitro virtualization. The EC2 instance can access these resources transparently through NTP or through the PTP Hardware Clock (PHC).
The hardware is being deployed region by region and is currently available only in ap-northeast-1. It is also limited to recent instance types, currently only r7g. The best part is that there are no additional charges involved. The time is transparently synchronized through NTP when using chrony, and the instance can also be configured to use the PHC.
All distributed databases are expected to benefit from this new feature. Amazon Aurora Limitless, the Sharded Database managed service on RDS, will also be able to use it. I'm excited to test this feature with YugabyteDB, the Distributed SQL database, where I can reduce the max_clock_skew and eliminate the kReadRestart errors. With this feature, external consistency can also be turned on, similar to Spanner, without a visible effect on write latency (as the commit wait will only take 50 microseconds).
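To illustrate why a tighter skew bound matters for reads, here is a simplified sketch of the ambiguity window behind a read restart. This is not YugabyteDB's actual code: `classify_read`, its parameter names, and the timestamps are hypothetical, and the 500 ms value is only an illustrative skew bound to contrast with 50 microseconds.

```python
def classify_read(read_time_us: int, record_time_us: int, max_clock_skew_us: int) -> str:
    """Decide what a read at read_time_us can conclude about a record.

    record_time_us was assigned by another node's clock, which may run
    ahead of the reader's clock by up to max_clock_skew_us.
    """
    if record_time_us <= read_time_us:
        # Written before the read time: safe to return.
        return "visible"
    if record_time_us <= read_time_us + max_clock_skew_us:
        # Could have been written before the read in real time, by a node
        # whose clock is ahead: ambiguous, so the read must be restarted.
        return "restart"
    # Too far in the future to be explained by skew: written after the read.
    return "invisible"


read_time = 1_000_000
record_time = 1_000_200  # stamped 200 us "after" the read, by another node

# Illustrative 500 ms bound: the record falls inside the ambiguity window.
print(classify_read(read_time, record_time, 500_000))
# 50 us bound: the same record is unambiguously later than the read.
print(classify_read(read_time, record_time, 50))
```

The smaller the guaranteed skew, the narrower the window in which a record is ambiguous, and the fewer reads have to be restarted.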