I originally posted this on Starburst's blog, as part of a series I've been publishing on Iceberg.
Time Travel
Time travel in Trino using Iceberg is a handy feature to “look back in time” at a table’s history. As we covered in this blog, each change to an Iceberg table creates a new “snapshot” which can be referred to by using standard SQL.
As you can see from the diagram below, a new snapshot is created for the table creation, insert and update.
To see the snapshots on a table, you can use the handy metadata table that exists for each table:
SELECT * FROM "customer_iceberg$snapshots";
The above snapshot table shows the create, insert and update operations on the customer_iceberg table. You can see what type of operation was performed and when it was executed.
To select a certain snapshot, you use the “for version as of” syntax. In the following two examples, we show the customer name before and after an update:
SELECT custkey,name
FROM customer_iceberg FOR VERSION AS OF 5043425904354141100 where custkey = 2732;
SELECT custkey,name
FROM customer_iceberg FOR VERSION AS OF 3117754680069542695 where custkey = 2732;
You can also specify a timeframe to retrieve an older snapshot of a table. For example, the following query brings back the data for the first snapshot on or before a given timestamp:
SELECT custkey,name
FROM s3lakehouse.demo_tpch.customer_iceberg FOR TIMESTAMP AS OF TIMESTAMP '2022-09-18 07:18:09.002 America/New_York' where custkey = 2732;
Rolling back
Another great feature of Iceberg is the ability to roll back a table to a previous snapshot. Sometimes this is used when a row was accidentally deleted or updated. As long as the snapshot exists, (it hasn’t been cleaned up yet) then you can roll back to any existing snapshot.
For example, in the scenario above, if I wanted to roll back to the state of the table before the update on the customer, then I would issue the following command:
CALL iceberg.system.rollback_to_snapshot('demo_tpch', 'customer_iceberg', 5043425904354141100)
Then we can query the table again to see the customer’s name was “rolled back” to the previous version before the update:
SELECT custkey,name
FROM s3lakehouse.demo_tpch.customer_iceberg where custkey = 2732;
Both time travel and rolling back are database functions that are now available in a modern data lake architecture. This is a game changer as it allows database type functionality to objects stores which were not available until now.
Top comments (0)