Snowpark Pandas! Finally, our love-hate relationship with Pandas becomes one completely full of love as we use familiar, well-known syntax on the distributed Snowflake system.
This week’s challenge tasks you with using Snowpark Pandas to answer 4 rapid-fire questions!
Below is your start-up SQL code:
create or replace stage frosty_stage url = 's3://frostyfridaychallenges/challenge_102/';
Then open up a notebook and place the following in the first cell:
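import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # activates the Snowpark pandas backend for modin
from snowflake.snowpark.context import get_active_session
session = get_active_session()  # reuse the notebook's existing Snowpark session
clothes_shop_df = pd.read_csv('@frosty_stage/clothes_shop_purchases.csv')  # read the CSV from the stage created above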
After that, you’re ready to answer the following 4 questions!
- At what hour of the day do most of our sales happen?
- Which server sold the most?
- What would the total price be if we deducted 20% for tax from the first five items?
- What would the biggest till number be if we merged tills 4 and 5?
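As a rough illustration of how these four questions map onto ordinary pandas calls, here is a sketch in plain pandas on a tiny made-up DataFrame. The column names (`purchase_time`, `server`, `price`, `till`), the sample rows, and the helper functions are all hypothetical, since the challenge doesn't show the CSV's schema. Questions 3 and 4 are also open to interpretation, so each function encodes just one reading. With Snowpark pandas you would use `clothes_shop_df` from the cell above and substitute the real column names; the methods below are standard pandas, but I haven't checked each one against the Snowpark pandas supported-API list.

```python
import pandas as pd  # in a Snowflake notebook: `import modin.pandas as pd` plus the plugin import

# Hypothetical stand-in for clothes_shop_df; columns and rows are invented for illustration.
clothes_shop_df = pd.DataFrame({
    "purchase_time": [
        "2024-07-05 09:15", "2024-07-05 12:05", "2024-07-05 12:40",
        "2024-07-05 12:55", "2024-07-05 15:30", "2024-07-05 17:10",
    ],
    "server": ["Ann", "Bob", "Ann", "Cat", "Bob", "Bob"],
    "price": [20.0, 35.0, 15.0, 10.0, 40.0, 25.0],
    "till": [1, 4, 5, 2, 4, 5],
})


def busiest_hour(df):
    """Q1: hour of day with the most transactions (row count per hour)."""
    hours = pd.to_datetime(df["purchase_time"]).dt.hour
    return int(hours.value_counts().idxmax())


def top_server(df):
    """Q2: server with the highest total sales value."""
    return df.groupby("server")["price"].sum().idxmax()


def first_five_after_tax(df, tax_rate=0.20):
    """Q3 (one reading): total of the first five prices with 20% deducted."""
    return float(df["price"].head(5).sum() * (1 - tax_rate))


def biggest_till_after_merge(df):
    """Q4 (one reading): relabel till 5 as till 4, then return (till, total) for the largest till."""
    merged = df.assign(till=df["till"].replace({5: 4}))
    totals = merged.groupby("till")["price"].sum()
    return int(totals.idxmax()), float(totals.max())


print(busiest_hour(clothes_shop_df))              # 12
print(top_server(clothes_shop_df))                # Bob
print(first_five_after_tax(clothes_shop_df))      # 96.0
print(biggest_till_after_merge(clothes_shop_df))  # (4, 115.0)
```

Counting rows per hour for Q1 and summing `price` for Q2 are choices; if "most sales" should mean revenue rather than transaction count, swap `value_counts()` for a `groupby(...)["price"].sum()`.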
Answers
1:
2:
3:
4:
sakatoku says
I had the wrong answer to question 1, and couldn’t find the data for Till 5… but I took the challenge anyway!
Through this challenge I learned the limitations of the Snowpark Pandas API and saw its potential.
toruhiyama says
I solved it!
However, my answer to Problem 1 came out as 12 o’clock. Also, in Problem 4 there was no Till 5, so I treated Till 4 as Till 3.
I was able to understand how Snowpark Pandas can be used with the same API as regular Pandas.