DEV Community

Cover image for Iceberg Time Travel & Rollbacks in Trino
Tom Nats for Starburst

Posted on • Originally published at starburst.io

Iceberg Time Travel & Rollbacks in Trino

I originally posted this on Starburst's blog, as part of a series I've been publishing on Iceberg.

Time Travel

Time travel in Trino using Iceberg is a handy feature to “look back in time” at a table’s history. As we covered in this blog, each change to an Iceberg table creates a new “snapshot” which can be referred to by using standard SQL.

As you can see from the diagram below, a new snapshot is created for the table creation, insert and update.

Snapshot example

To see the snapshots on a table, you can use the handy metadata table that exists for each table:

SELECT * FROM "customer_iceberg$snapshots";

Snapshot files returned from customer table

The above snapshot table shows the create, insert and update operations on the customer_iceberg table. You can see what type of operation was performed and when it was executed.

To select a certain snapshot, you use the “for version as of” syntax. In the following two examples, we show the customer name before and after an update:

SELECT custkey,name
FROM customer_iceberg FOR VERSION AS OF 5043425904354141100 where custkey = 2732;

Snapshot before update to custkey 2732

SELECT custkey,name
FROM customer_iceberg FOR VERSION AS OF 3117754680069542695 where custkey = 2732;

Snapshot after update to custkey 2732

You can also specify a timeframe to retrieve an older snapshot of a table. For example, the following query brings back the data for the first snapshot on or before a given timestamp:

SELECT custkey,name
FROM s3lakehouse.demo_tpch.customer_iceberg FOR TIMESTAMP AS OF TIMESTAMP '2022-09-18 07:18:09.002 America/New_York' where custkey = 2732;

Retrieving older snapshot of custkey

Rolling back

Back to the Future meme captioned "Let's go back in time to a previous Iceberg snapshot"

Another great feature of Iceberg is the ability to roll back a table to a previous snapshot. Sometimes this is used when a row was accidentally deleted or updated. As long as the snapshot exists, (it hasn’t been cleaned up yet) then you can roll back to any existing snapshot.

For example, in the scenario above, if I wanted to roll back to the state of the table before the update on the customer, then I would issue the following command:

CALL iceberg.system.rollback_to_snapshot('demo_tpch', 'customer_iceberg', 5043425904354141100)

Then we can query the table again to see the customer’s name was “rolled back” to the previous version before the update:

SELECT custkey,name
FROM s3lakehouse.demo_tpch.customer_iceberg where custkey = 2732;

Rolled back version of custkey

Both time travel and rolling back are database functions that are now available in a modern data lake architecture. This is a game changer as it allows database type functionality to objects stores which were not available until now.

Top comments (0)