In Week 1 we looked at ingesting S3 data, now it’s time to take that a step further. So this week we’ve got a short list of tasks for you all to do.
The basics aren’t earth-shattering but might cause you to scratch your head a bit once you start building the solution.
Frosty Friday Inc., your benevolent employer, has an S3 bucket that was filled with .csv data dumps. These dumps aren’t very complicated and all have the same style and contents. All of these files should be placed into a single table.
However, it might occur that some important data is uploaded as well, these files have a different naming scheme and need to be tracked. We need to have the metadata stored for reference in a separate table. You can recognize these files because of a file inside of the S3 bucket. This file, keywords.csv, contains all of the keywords that mark a file as important.
Create a table that lists all the files in our stage that contain any of the keywords in the keywords.csv file.
The S3 bucket’s URI is: s3://frostyfridaychallenges/challenge_3/
Your result should look like:
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘ (note joining our mailing list does not give you a Frosty Friday account)
- Post your code to GitHub and make it publicly available (Check out our guide if you don’t know how to here)
- Post the URL in the comments of the challenge.
If you have any technical questions you’d like to pose to the community, you can ask here on our dedicated thread.
I had time to fit in one more today. Fun little challenge
Solution URL: https://github.com/ChrisHastieIW/Frosty-Friday/blob/main/Week%203%20-%20Basic%20-%20Metadata%20Queries/Week%203%20-%20Basic%20-%20Metadata%20Queries.sql
Reorganised my submission repo, new master solution URL here.
Good Use case, Used Pattern while COPY and CONCAT while loading data to table
1. Use the macro from week 1 to create a new stage.
2. List the files in the stage.
3. Retrieve the last_query_id to scan the results.
4. Read the keywords’ file.
5. Keep those files in the stage that match any of the keywords (`ilike ‘%’ || kwd || ‘%’ )
Still sticking to dbt, https://github.com/dsmdavid/frostyfridays-sf/blob/main/models/challenge_03.sql
or the dbt-docs http://dvd-il-bucket-public.s3-website-eu-west-1.amazonaws.com/#!/model/model.frosty.challenge_03
Marian Eerens says
Notes & Documentation: https://www.craft.do/s/K03sY2JoJWQqAJ
DownToEarth DataTips says
Nice, gradual upgrade in difficulty from Week 1!
Here is my video solution: https://www.youtube.com/watch?v=vZGC1ZQzwoI
And here the solution on GitHub: [https://github.com/DownToEarthDataTips/FrostyFridays/blob/main/Week 3 – Load Metadata into a Table](https://github.com/DownToEarthDataTips/FrostyFridays/blob/main/Week%203%20%20-%20Load%20Metadata%20into%20a%20Table)
Continue to practicing with metadata, so it string match.
LIKE ANY was a new one for me..
Created the final table using 2 methods, where exists and like any
Check out this solution blog: https://theinformationlab.nl/2022/08/26/snowflake-skills-3-metadata/
Pekka Kanerva says
Catching up, 3/15…
A bit different. I didnt load in proper columns. And used load_history to get information on table loads.
Keep practicing with Snowflake even if my code seems very basic for this task
Really liked this one as an intro to metadata!
Solution URL updated
Jamie Laird says
Here’s my solution.
I found a neat workaround to COPY INTO not allowing complex queries by running my query first and then using `table(result_scan(last_query_id()))` within the copy statement. The only quirk was that in DataGrip you need to offset the query id by -5 for some reason.
LIKE ANY is a good syntax.
Not sure it was the most efficient way to do it but got to the answer…
Loving the challenges so far!