To celebrate Women in Data and the upcoming Data+Women event, we’ve partnered with Gaia Pometto, a data consultant based in the Netherlands, to release this week’s challenge. She’ll walk through it live at the event, and you can find her on LinkedIn, Twitter and YouTube.
She also wanted to give a special mention to her colleague Atzmon Ben Binyamin, who collaborated with her on this challenge and was a great help throughout the process. You can find Atzmon on LinkedIn.
To prepare for a future data ingestion flow, your organization wants to load data from a weather API into Snowflake. The flow should end with a fact table that consumers can query.
Your colleague Atzmon has already pulled a sample of the weather data from the API and temporarily stored it in an S3 bucket. The bucket contains a JSON file with hourly data for 7 days.
Your flow will consist of three layers (schemas) of data:
a. Landing zone (external stage, raw table – weather_raw)
b. Curated zone (raw data parsed table – weather_parsed)
c. Consumption zone (aggregated data table – weather_agg)
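The three layers above map naturally onto one schema each. A minimal setup sketch (the database and schema names here are illustrative, not prescribed by the challenge):

```sql
-- One database with a schema per layer (names are assumptions)
CREATE DATABASE IF NOT EXISTS weather_db;
CREATE SCHEMA IF NOT EXISTS weather_db.landing;      -- external stage + weather_raw
CREATE SCHEMA IF NOT EXISTS weather_db.curated;      -- weather_parsed
CREATE SCHEMA IF NOT EXISTS weather_db.consumption;  -- weather_agg
```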
1. Create a stage for the landing layer and copy the JSON.
2. Parse the JSON and create a table – weather_parsed (in the curated zone).
3. Create a table for consumption weather_agg (in the consumption zone).
The table should have aggregates per day for:
b. Wind speed
c. Distinct ‘icon’ definitions for a day (e.g. [“cloudy”, “rain”])
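The three steps above could be sketched as follows. The stage URL comes from the hint below; the JSON paths (`days`, `hours`, `datetime`, `windspeed`, `icon`) and the `weather_db` schema names are assumptions about the file's shape and your setup, so adjust them once you've inspected the raw data:

```sql
-- 1. Landing zone: external stage over the public bucket, raw VARIANT table
CREATE OR REPLACE STAGE weather_db.landing.weather_stage
  URL = 's3://frostyfridaychallenges/challenge_25/';

CREATE OR REPLACE TABLE weather_db.landing.weather_raw (v VARIANT);

COPY INTO weather_db.landing.weather_raw
  FROM @weather_db.landing.weather_stage
  FILE_FORMAT = (TYPE = JSON);

-- 2. Curated zone: flatten the hourly records out of the nested JSON
--    (the paths days/hours/datetime/windspeed/icon are guesses at the API shape)
CREATE OR REPLACE TABLE weather_db.curated.weather_parsed AS
SELECT
  d.value:datetime::DATE   AS day,
  h.value:datetime::TIME   AS hour,
  h.value:windspeed::FLOAT AS wind_speed,
  h.value:icon::STRING     AS icon
FROM weather_db.landing.weather_raw r,
     LATERAL FLATTEN(input => r.v:days) d,
     LATERAL FLATTEN(input => d.value:hours) h;

-- 3. Consumption zone: one row per day with the requested aggregates
CREATE OR REPLACE TABLE weather_db.consumption.weather_agg AS
SELECT
  day,
  AVG(wind_speed)          AS avg_wind_speed,   -- illustrative choice of aggregate
  ARRAY_AGG(DISTINCT icon) AS icons             -- e.g. ["cloudy", "rain"]
FROM weather_db.curated.weather_parsed
GROUP BY day;
```

`LATERAL FLATTEN` is Snowflake's standard way to explode nested arrays in a `VARIANT` column, and `ARRAY_AGG(DISTINCT …)` collapses the per-hour icons back into one array per day.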
Here is the link to the s3 bucket (hint: bucket = frostyfridaychallenges, path = challenge_25).
The solution should look something like this:
Remember, if you want to participate: