Snowflake guest challenge!
For this week’s challenge, we’re very happy to announce a guest challenge designed and written by Snowflake’s own Daniel Myers! We’ve given him carte blanche to pick up any feature or subject that he could think of and, to be honest, we’re very happy he picked this one!
Don’t be put off by thinking ‘but I’ve never done data science before’. Use this as an opportunity to get that experience!
Not sure where to begin? Check the resources in the Help / Hints / Tips section below.
Frosty Fridays have been incredibly awesome!
I’ve put together a proposed challenge that uses Snowpark and data from our Data Marketplace to predict inflation for 2024 in the US!
Inflation is rising! Today’s challenge is to predict future inflation using actual economic data from the Snowflake Data Marketplace. Specifically, you will train a linear regression model inside Snowflake that predicts the personal consumption expenditures (PCE) price index for any given future year. To answer this challenge, provide the predicted PCE for the year 2024.
The data source: https://www.snowflake.com/datasets/knoema-economy-data-atlas/
Help / Hints / Tips
To view historical personal consumption data:
SELECT "Date", "Value"
FROM "ECONOMY"."BEANIPA"
WHERE "Table Name" = 'Price Indexes For Personal Consumption Expenditures By Major Type Of Product'
  AND "Indicator Name" = 'Personal consumption expenditures (PCE)'
  AND "Frequency" = 'A'
  AND "Date" >= '1972-01-01'
ORDER BY "Date"
Helpful quickstart (answers): https://quickstarts.snowflake.com/guide/data_apps_summit_lab/
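If you have never fitted a linear regression before, the idea can be sketched in a few lines of plain Python. This is only an illustration of ordinary least squares on (year, value) pairs, not the quickstart's code and not the official solution. The function names `fit_linear_trend` and `predict` are made up for this sketch, and the rows are synthetic placeholders rather than real PCE values.

```python
def fit_linear_trend(years, values):
    """Ordinary least squares fit of value = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, year):
    """Evaluate the fitted line at the requested year."""
    return slope * year + intercept


# Synthetic placeholder rows (NOT real PCE data): an exactly linear series.
years = [2000, 2001, 2002, 2003]
values = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_linear_trend(years, values)
prediction_2005 = predict(slope, intercept, 2005)
print(prediction_2005)
```

In the real challenge, the years and values would come from the query above, and the same fit would then be evaluated at 2024.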
This time, we’re not showing you the entire answer, but a value from a previous year so that you can check whether you’re on the right track: the prediction for 2021 should be 116.23.
Shout out and thanks
We want to again thank Daniel for writing this challenge for us!
You can find and follow Daniel on LinkedIn and on Twitter
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER’ (note that joining our mailing list does not give you a Frosty Friday account).
- Post your code to GitHub and make it publicly available (check out our guide if you don’t know how).
- Post the URL in the comments of the challenge.
The challenge expects a 2021 value of 116.23 for this prediction, which I am not seeing. I have tried filtering to 1972 onwards, as in the suggested quickstart, but then I get a value of 116.18.
I have attempted several different filters for the input data and none reach the desired 116.23, so I think I must be missing a specific input.
After spending far too much time trying to figure out that value and failing, I thought I’d make up for it by converting the process into a UDF. So my solution includes a notebook to target a specific value (similar to the suggested quickstart) and a SQL script that generates a UDTF to output a set number of predictions based on the input table.
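For readers wondering what such a UDTF might look like, here is a rough sketch of a handler class in the shape Snowflake expects for Python UDTFs (a `process` method called once per input row and an `end_partition` method that yields output rows). This is not the commenter's actual code. The class name `PredictTrend`, its arguments, and the sample rows are all hypothetical, and registering the handler in Snowflake is not shown.

```python
class PredictTrend:
    """Hypothetical UDTF-style handler: collect rows, then emit predictions."""

    def __init__(self):
        self.years = []
        self.values = []
        self.n_predictions = 0

    def process(self, year, value, n_predictions):
        # Called once per input row; only accumulate, emit nothing yet.
        self.years.append(year)
        self.values.append(value)
        self.n_predictions = n_predictions

    def end_partition(self):
        # Ordinary least squares fit of value = slope * year + intercept.
        n = len(self.years)
        mean_x = sum(self.years) / n
        mean_y = sum(self.values) / n
        sxx = sum((x - mean_x) ** 2 for x in self.years)
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(self.years, self.values))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Emit one (year, prediction) row for each year after the last input.
        last_year = max(self.years)
        for offset in range(1, self.n_predictions + 1):
            year = last_year + offset
            yield (year, slope * year + intercept)


# Local dry run with synthetic placeholder rows (NOT real PCE data).
handler = PredictTrend()
for year, value in [(2000, 10.0), (2001, 12.0), (2002, 14.0), (2003, 16.0)]:
    handler.process(year, value, 2)
predictions = list(handler.end_partition())
print(predictions)
```

Collecting rows in `process` and fitting in `end_partition` is one way to structure it, because the regression needs the whole input table before it can output anything.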
If anybody can figure out where I’ve gone wrong to not get the 116.23 for 2021, please let me know!
I have slightly modified my answer after some investigation and believe the input data has changed. I have adjusted my input data to also remove 2021, so the years are 1972 <= year < 2021.
The challenge expects a 2021 value of 116.23 for this prediction, whilst I now see a value of 116.22.
Comparing the results and values with the original Snowflake Quickstarts code, it appears the original data itself has changed by very small amounts.
For example, the actual value for 2019 is now 109.933 when it used to be 109.922.
I believe this means my solution is correct and the input data itself has simply changed since the challenge was encoded.
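To illustrate why both the year window and a small revision to a single input can move the prediction, here is a minimal sketch. It is not the code from my repository. The helper `predict_from_window` is a hypothetical name, and the rows are synthetic placeholders, not the real PCE series.

```python
def predict_from_window(rows, start_year, end_year, target_year):
    """Fit a least-squares line on rows with start_year <= year < end_year,
    then evaluate that line at target_year."""
    window = [(y, v) for y, v in rows if start_year <= y < end_year]
    n = len(window)
    mean_x = sum(y for y, _ in window) / n
    mean_y = sum(v for _, v in window) / n
    sxx = sum((y - mean_x) ** 2 for y, _ in window)
    sxy = sum((y - mean_x) * (v - mean_y) for y, v in window)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope * target_year + intercept


# Synthetic placeholder rows (NOT real PCE data).
original_rows = [(2000, 10.0), (2001, 12.0), (2002, 14.0),
                 (2003, 16.0), (2004, 18.0)]
# Same series, with one historical value revised slightly upward.
revised_rows = [(y, 16.1 if y == 2003 else v) for y, v in original_rows]

# Train on 2000 <= year < 2004 and predict the held-out year 2004.
before = predict_from_window(original_rows, 2000, 2004, 2004)
after = predict_from_window(revised_rows, 2000, 2004, 2004)
print(before, after, after - before)
```

The two predictions differ even though only one training value changed, which is the same kind of drift as the 116.23 versus 116.22 mismatch described above.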
I’ve spent too much time trying to make it work all contained as part of the dbt project…without success. Training the model locally within dbt seems…unattainable at the moment.
I’ve created a Python notebook instead that creates the UDF.
Similarly to @ChrisHastie above, my numbers do not quite match. But I did run some exploration and decided to set an arbitrary cutoff to start in 1980, so there’s that…