Challenge: Using Time Travel and Cloning in Snowflake
This week’s focus centers around leveraging Snowflake’s Time Travel and Cloning features for data management and recovery. Imagine you are a data engineer at a company where data integrity is crucial for day-to-day operations. A series of unexpected changes have been made to the database, and it’s your job to ensure the data is accurate and consistent.
Story Scenario:
Your company, DataCorp, recently encountered several incidents that impacted the integrity of the data in the sales
database. Over the past week, multiple updates and deletions were performed on the transactions
table. Your mission is to use Snowflake’s Time Travel and Cloning features to recover and verify the data at different points in time and ensure the smooth operation of your data processes.
As you engage with this challenge, consider the following tasks:
- Initial Data Incident:
- Task: Restore the
transactions
table to its state from three days ago, before the first series of unintended updates occurred. Verify the data integrity by comparing it to a backup file and correct any discrepancies.
- Task: Restore the
- Subsequent Data Incident:
- Task: Identify and restore the
transactions
table to its state from yesterday morning, just before a significant row deletion was performed. Verify the number of rows and specific key transactions to ensure accuracy.
- Task: Identify and restore the
- Recent Data Correction:
- Task: Roll back the
transactions
table to its state from six hours ago to undo a batch process that accidentally duplicated records. Ensure that no duplicate records exist after the rollback.
- Task: Roll back the
- Creating a Safe Testing Environment:
- Task: Create a clone of the entire
sales
database from its state a week ago. This clone will be used for testing new features without affecting the production database. Verify that the cloned database includes all tables and data as of the specific point in time.
- Task: Create a clone of the entire
Start-up Code
CREATE TABLE sales.transactions ( id INT, customer_name STRING, amount DECIMAL(10,2), transaction_date TIMESTAMP );
INSERT INTO sales.transactions (id, customer_name, amount, transaction_date) VALUES (1, 'Alice', 100.00, '2024-07-20 10:00:00'), (2, 'Bob', 200.00, '2024-07-20 11:00:00'), (3, 'Charlie', 300.00, '2024-07-20 12:00:00');
Below is the code to simulate our problems:
Simulate issues
#Issue 1 (three days ago):
UPDATE sales.transactions SET amount = amount * 1.1 WHERE id = 1;
DELETE FROM sales.transactions WHERE id = 2;
#Issue 2 (Yesterday) :
DELETE FROM sales.transactions WHERE id = 3;
#Issue 3 (6 hours ago):
INSERT INTO sales.transactions (id, customer_name, amount, transaction_date)
SELECT id, customer_name, amount, transaction_date FROM sales.transactions;
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘ (note joining our mailing list does not give you a Frosty Friday account)
- Post your code to GitHub and make it publicly available (Check out our guide if you don’t know how to here)
- Post the URL in the comments of the challenge.