This week we’ve got a bit of a deceptive problem that’s very easy to understand but tricky to execute in SQL.
You will find your setup code here
create or replace table testing_data(id int autoincrement start 1 increment 1, product string, stock_amount int,date_of_check date);
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero capes',1,'2022-01-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero capes',2,'2022-01-02');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero capes',NULL,'2022-02-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero capes',NULL,'2022-03-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero masks',5,'2022-01-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero masks',NULL,'2022-02-13');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',6,'2022-01-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',NULL,'2022-01-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',3,'2022-04-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',2,'2022-07-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',NULL,'2022-01-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',3,'2022-05-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero pants',NULL,'2022-10-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero masks',10,'2022-11-01');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero masks',NULL,'2022-02-14');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero masks',NULL,'2022-02-15');
insert into testing_data (product,stock_amount,date_of_check) values ('Superhero masks',NULL,'2022-02-13');
Everyone has received data in the past that’s been perfectly legible for normal humans but useless for machines, a prime example of this is the following dataset from Superhero Inc. :
The inventory management has been a bit spotty with irregular checks on different dates, and an inventory system that can really use some TLC.
We can easily extrapolate that the stock amount hasn’t been filled out because it hasn’t changed and that the previous value that HAS been filled out still applies.
Translating this into SQL however, is your challenge for today.
We would like to transform the above column, into the following :
We secretly know that Snowflake has a way to do this a bit more easily in comparison to other systems so if you REALLY want to challenge yourself, you can try to do it without any Snowflake-specific tools.
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘ (note joining our mailing list does not give you a Frosty Friday account)
- Post your code to GitHub and make it publicly available (Check out our guide if you don’t know how to here)
- Post the URL in the comments of the challenge.
You can also sign up to our mailing list below.
This reminds me of how much I appreciate Snowflake improvements 🙂
Solution URL – https://github.com/mateusz-kmon/frostyfridaychallenges/blob/main/w13.sql
I enjoyed that, but since I’ve spent most of my career in Snowflake so I’m not actually sure which of my two solutions leverage any special Snowflake things. Hopefully one of them counts!
I’m also not sure about my second answer. I’ve taken advantage of how the IDs don’t seem to have any impact on the table, since there’s never a case where two populated records for a product have the same date but different stock amounts.
I know how I’d tweak it if the IDs were different, but it’s not an issue with my first answer anyway ^^
This is my solution using the last_value() function, which I believe was invented for these types of problems. I see it frequently used to fill exchange rate tables: when an exchange rate is not available for a given day, then take the last known exchange rate.
I’m glad I didn’t give up after my horrendous brute-force-ish approach
Created the base table, then used lag for getting the previous value
https://github.com/dsmdavid/frostyfridays-sf/blob/main/models/challenge_13_02.sql
Learnt something new about the lag function being able to ignore null values so easily!
Tried the intermediate challenge as well and took a little while to get my head round it at first, but got there in the end.
Snowflake way was the easy way, but the first alternative approach, which I did in the same way as I would have done it in Oracle (using sub-queries), didn’t work in the same way, so I must admit that I checked other’s solutions for ideas and restructured my alternative solution so that it works…
Nice one!
My solution
Pretty straight forward with window function.