Frosty Friday Consultants has been hired by the University of Frost’s history department; they want data on monarchs in their data warehouse for analysis. Your job is to take the JSON file located here, ingest it into the data warehouse, and parse it into a table that looks like this:
If you can’t read the above right-click and view image in another tab.
- Separate columns for nicknames and consorts 1 – 3, many will be null.
- An ID in chronological order (birth).
- An Inter-House ID in order as they appear in the file.
- There should be 26 rows at the end.
- Make sure you don’t lose any rows along the way.
- Be sure to investigate all the outputs and parameters available when transforming JSON.
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘ (note joining our mailing list does not give you a Frosty Friday account)
- Post your code to GitHub and make it publicly available (Check out our guide if you don’t know how to here)
- Post the URL in the comments of the challenge.
If you have any technical questions you’d like to pose to the community, you can ask here on our dedicated thread.
Not quite sure how the ordering is meant to take place for the ID since my result seems to swap the records for Felipe V’s two reigns (same date of birth), but I’m happy with this overall.
Solution URL: https://github.com/ChrisHastieIW/Frosty-Friday/blob/main/Week%204%20-%20Hard%20-%20JSON%20Parsing/Week%204%20-%20Hard%20-%20JSON%20Parsing.sql
Reorganised my submission repo, new master solution URL here.
Hi Chris, Hi Dan,
Could you please help me to understand how we are populating INTER_HOUSE_ID. You have used index to generate the value. I am not clear on using “index”. Please help me to understand this.
@sachin: When you do a lateral flatten on the JSON, this creates output columns that contain the Index column along, defining the position in the JSON, along with the value.
You can see it by doing the following:
LATERAL FLATTEN(SRC:”Houses”) as H,
LATERAL FLATTEN(H.value:”Monarchs”) as M
Alex beat me to it!
In short Sachin, the INDEX field is automatically generated when any FLATTEN is used. The values of INDEX are simply the row number of the flattened result from the perspective of the flatten, starting from zero. For a list, this will be returning the index of the record within the list, which equates to a row number for all records retrieved from that particular list.
In SQL, you could think of this similarly to a ROW_NUMBER() window function which is partitioned by H.value:”Monarchs” and ordered in the same order as the values within H.value:”Monarchs”. The flatten is a little bit more complicated and robust than this so that it can avoid duplicates etc, but this explains the basic functionality
Great challenge to keep the semi-structured skills sharp!
Solution Week 4: https://github.com/AlexClaydon42/frostyfriday/blob/main/chalange_4.sql
Funny that the Catholic Monarchs didn’t have a spouse*…
– Use the macro from week 1 to create a new stage.
– Retrieve the file as json, then flatten on house and flatten on monarch.
I tried to be creative to extract the distinct keys inside the ‘Monarchs’ json for the different rows.
It was a learning experience using object_keys, array_agg, and array_to_string but, eventually, a rabbit hole (I obtained the unique keys and eventually used it to generate the SQL I ended up typing, still…)
so I ended up with something similar as others have posted, although I did not honor the column order…
* When there’s only one spouse/nickname, the object is a string, not an array, and thus doing something like `src:key` will return null (https://github.com/dsmdavid/frostyfridays-sf/blob/a2425bb65d7e609e3951ebe9bef76afc6545c92e/models/challenge_04_03.sql#L20 , the `coalesce` can help).
Or the dbt-docs: http://dvd-il-bucket-public.s3-website-eu-west-1.amazonaws.com/#!/model/model.frosty.challenge_04_03#code
I broke my parsing into steps, easier to track of things.
Also broke into steps, and coalesced the nicknames and spouses that got kicked out with 
This was a tricky one to get right. The sample solution of the image is incorrect as the multiple nicknames and consorts are wrong and it only picks up values where the value in the JSON is an array. I got around this by using a case statement on the first value with the IS_ARRAY function.
Please check out this solution blog: https://theinformationlab.nl/en/2022/09/06/snowflake-skills-4-parsing-a-json/
Pekka Kanerva says
Nice practice for wrangling json data!
I think having value vs array of values were key to understand. But overall really exercise
Cesare Simonini says
Here’s my solution 🙂
Jamie Laird says
Here’s my solution
My solution ^. Flattening json takes a while to get the hang of.
Great task to improve json skills
Loving the challenges so far