Snowflake guest challenge!
For this week’s challenge, we’re very happy to announce a guest challenge designed and written by Snowflake’s own Daniel Myers! We gave him carte blanche to pick any feature or subject he could think of, and, to be honest, we’re very happy he picked this one!
Don’t be put off by thinking ‘but I’ve never done data science before’; use this as an opportunity to get some experience!
Not sure where to begin? Check the resources under the Help / Hints / Tips section.
Hey folks!
Frosty Fridays have been incredibly awesome!
I’ve put together a proposed challenge that uses Snowpark and data from our Data Marketplace to predict US inflation for 2024!
The Challenge
Inflation is rising! Today’s challenge is to predict future inflation using actual economic data from the Snowflake Data Marketplace. Specifically, you will train a linear regression model inside Snowflake that predicts the personal consumption expenditures (PCE) price index for any given future year. To answer this challenge, provide the predicted PCE for the year 2024.
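At its core this is ordinary least squares with a single feature: fit value = intercept + slope × year on the historical data, then evaluate the line at 2024. In practice you might use a library such as scikit-learn’s `LinearRegression`; the dependency-free Python sketch below only illustrates the underlying math. The function names are hypothetical, and the sample points are made up for illustration, not real PCE data.

```python
from typing import Sequence


def fit_linear_regression(years: Sequence[float], values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares fit of value = intercept + slope * year."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired (year, value) observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Slope = sum of cross-deviations / sum of squared year deviations.
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, year: float) -> float:
    """Evaluate the fitted line for a given (possibly future) year."""
    return intercept + slope * year


if __name__ == "__main__":
    # Made-up points lying exactly on value = 2 * year - 3900 (NOT real PCE data).
    sample_years = [2000, 2001, 2002, 2003]
    sample_values = [100.0, 102.0, 104.0, 106.0]
    a, b = fit_linear_regression(sample_years, sample_values)
    print(round(predict(a, b, 2024), 6))  # → 148.0
```

Changing the training window, or revisions to the underlying data, changes the fitted line and therefore the 2024 prediction.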
The Data
Data source (Knoema Economy Data Atlas): https://www.snowflake.com/datasets/knoema-economy-data-atlas/
Help / Hints / Tips
To view the historical annual PCE price index values from 1972 onward:
SELECT "Date", "Value" FROM "ECONOMY"."BEANIPA"
WHERE "Table Name" = 'Price Indexes For Personal Consumption Expenditures By Major Type Of Product'
AND "Indicator Name" = 'Personal consumption expenditures (PCE)'
AND "Frequency" = 'A'
AND "Date" >= '1972-01-01'
ORDER BY "Date"
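Once the query above returns rows of ("Date", "Value"), they need to become (year, value) training pairs. The sketch below assumes the rows arrive in Python as (datetime.date, float) pairs, for example after collecting the query result in a Snowpark session; the function name and the `start_year`/`end_year` parameters are hypothetical. The end year is exclusive, so `end_year=2021` gives the 1972 <= year < 2021 window discussed in the comments.

```python
import datetime
from typing import Iterable, Optional


def to_training_pairs(
    rows: Iterable[tuple[datetime.date, Optional[float]]],
    start_year: int = 1972,
    end_year: Optional[int] = None,
) -> tuple[list[int], list[float]]:
    """Convert ("Date", "Value") rows into parallel year/value lists.

    Keeps rows with start_year <= year < end_year (no upper bound if end_year is None).
    """
    years: list[int] = []
    values: list[float] = []
    for date, value in rows:
        year = date.year
        if year < start_year:
            continue
        if end_year is not None and year >= end_year:
            continue
        if value is None:
            # Skip rows with no reported value rather than failing the fit.
            continue
        years.append(year)
        values.append(float(value))
    return years, values
```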
Helpful quickstart (contains the answers): https://quickstarts.snowflake.com/guide/data_apps_summit_lab/
Expected Answer
This time, we’re not showing you the entire answer, only the value for a previous year, so that you have a hint as to whether you’re on the right track:
Shout out and thanks
We want to again thank Daniel for writing this challenge for us!
You can find and follow Daniel on LinkedIn and on Twitter.
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar and then going to ‘REGISTER’ (note: joining our mailing list does not give you a Frosty Friday account).
- Post your code to GitHub and make it publicly available (check out our guide here if you don’t know how).
- Post the URL in the comments of the challenge.
6 responses to “Week 18 – Difficult”
-
The challenge expects a 2021 value of 116.23 for this prediction, which I am not seeing. I have tried filtering to 1972 onwards, as in the suggested quickstart, but then I get a value of 116.18.
I have attempted several different filters for the input data and none reach the desired 116.23, so I think I must be missing a specific input.
After spending far too much time trying (and failing) to reproduce that value, I thought I’d make up for it by converting the process into a UDF. My solution includes a notebook that targets a specific value (similar to the suggested quickstart) and a SQL script that creates a UDTF to output a set number of predictions based on the input table.
If anybody can figure out where I’ve gone wrong to not get the 116.23 for 2021, please let me know!
- Solution URL – https://github.com/ChrisHastieIW/Frosty-Friday
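The comment above describes a UDTF that outputs a set number of predictions. A UDTF returns a set of rows rather than a single value; as a hypothetical sketch of the core logic (not the commenter’s actual code), given a fitted intercept and slope, a generator can emit one (year, prediction) row for each of the next `count` years. In a Snowflake Python UDTF, logic like this would typically live in the handler class’s `process` method, which yields output rows as tuples.

```python
from typing import Iterator


def predict_next_years(
    intercept: float, slope: float, last_year: int, count: int
) -> Iterator[tuple[int, float]]:
    """Yield (year, predicted_value) for the `count` years after `last_year`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    for year in range(last_year + 1, last_year + 1 + count):
        # Each yielded tuple corresponds to one output row of the table function.
        yield year, intercept + slope * year
```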
-
I have slightly modified my answer after some investigation, and I believe the input data has changed. I have adjusted my input data to also exclude 2021, so the years used are 1972 <= year < 2021.
The challenge expects a 2021 value of 116.23 for this prediction, whilst I now see a value of 116.22.
Comparing my results and values with the original Snowflake Quickstarts code
https://github.com/Snowflake-Labs/sfquickstarts/blob/master/site/sfguides/src/data_apps_summit_lab/assets/project_files/my_snowpark_pce.ipynb
it appears the source data itself has changed by very small amounts.
For example, the actual value for 2019 is now 109.933, whereas it used to be 109.922. I believe this means my solution is correct and the input data has simply changed since the challenge was written.
- Solution URL – https://github.com/ChrisHastieIW/Frosty-Friday
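To put the revision mentioned above in perspective, the arithmetic below compares the old and new 2019 values, and the expected and observed 2021 predictions. Both differences are on the order of a hundredth of a percent, which is consistent with (though it does not prove) the explanation that small data revisions account for the gap. The helper name is hypothetical.

```python
def relative_change_pct(old: float, new: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100


# 2019 value quoted in the comment: 109.922 originally, 109.933 now.
print(f"{relative_change_pct(109.922, 109.933):.4f}%")  # → 0.0100%
# 2021 prediction: 116.23 expected, 116.22 observed.
print(f"{relative_change_pct(116.23, 116.22):.4f}%")  # → -0.0086%
```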
-
I’ve spent too much time trying to make it work entirely within the dbt project… without success. Training the model locally within dbt seems… unattainable at the moment.
Instead, I’ve created a Python notebook that creates the UDF.
Similarly to @ChrisHastie above, my numbers do not quite match. But I did run some exploration and decided to set an arbitrary cutoff to start in 1980, so there’s that… https://github.com/dsmdavid/frostyfridays-sf/blob/main/extra/ch18_create_udf_prediction.ipynb
-
I did it with a SQL UDF.
- Solution URL – https://github.com/zlzlzl2-data/FrostyFriday/blob/main/FF18_LZ.sql
-
Unfortunately, the data source for this challenge is no longer available on the Snowflake Marketplace.
-
The Knoema dataset is no longer available, so I used CyberSyn’s “Financial & Economic Essentials” data instead, and I’ve included the SQL to get similar data in the notebook.
- Solution URL – https://github.com/THiyama/TH-Frosty-Friday/blob/main/Frosty-Friday-Week18/notebook_app.ipynb