Frosty Consulting has a client who wants to be able to load data from a stage in a manual but dynamic fashion. To be more specific they want to be able to:
- execute a single command (stored procedure)
- do so manually, meaning it won’t be scheduled and there won’t be any Snowpipes
- dynamically determine the warehouse size, if a file is over 10KB they want to use a small warehouse, anything under that size should be handled by an xsmall warehouse.
Click to see the skeleton script
-- Create the warehouses create warehouse if not exists my_xsmall_wh with warehouse_size = XSMALL auto_suspend = 120; create warehouse if not exists my_small_wh with warehouse_size = SMALL auto_suspend = 120; -- Create the table create or replace table <table_name> ( date_time datetime, trans_amount double ); -- Create the stage create or replace stage week_10_frosty_stage url = 's3://frostyfridaychallenges/challenge_10/' file_format = <enter_file_format>; -- Create the stored procedure create or replace procedure dynamic_warehouse_data_load(stage_name string, table_name string) <code goes here> -- Call the stored procedure. call dynamic_warehouse_data_load('week_10_frosty_friday_stage', '<table_name>');
Use the above skeleton script, and add the stored procedure.
When you execute the last line of the above script “call dynamic_warehouse_data_load()” then you should get the following result.
Remember if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘ (note joining our mailing list does not give you a Frosty Friday account)
- Post your code to GitHub and make it publicly available (Check out our guide if you don’t know how to here)
- Post the URL in the comments of the challenge.
Solution URL – https://github.com/mateusz-kmon/frostyfridaychallenges/blob/main/w10.sql
This was a frustrating challenge as I discovered a limitation with Python Stored Procedures. The requirement was to change warehouses based on the file size, and Python Stored Procedures currently do not support changing warehouse or USE WAREHOUSE commands.
I only discovered this limitation when I had nearly finished the challenge. Instead of rewriting my solution in another language, I simply tweaked the challenge to resize the warehouse instead.
On the plus side, this challenge timed very well for me as I’d just published my blog on Python Stored Procedures in the Snowflake UI, so I was already familiar with the topic!
Link to blog for interested parties:
You may find this useful. It goes a bit beyond this requirement as it copies data from one db table (atm, oracle, other dbs can be included) to snowflake. steps are:- connections, create table in snowflake, export data to files of 250Mb zipped, load to internal stage, copy from stage to snowflake, all in a single command. The dynamic sizing of VWH is not present but would be simple to implement. https://snowflakecomputing.atlassian.net/wiki/spaces/~5da9bcbde954f00c2353dc5f/pages/2496922407/OracleToSnowflakeCopy
You solution seems quite interesting and I wanted to view the steps performed. When I tried to access the solution it is reporting me error.
Could you please help me to access the same.
Mail id : email@example.com
Please let me know your github login and i can grant you access.
I seem to have gone down quite a different route to ChrisHastie has gone, with mine being a lot more basic in terms of functionality. I didn’t come across the limitation of not being able to USE WAREHOUSE in a python stored procedure. I found I just needed to set the collect to a variable, even thought I wasn’t going to use it.
Good one to get started with, but mine can definitely be improved functionality wise.
Pekka Kanerva says
I did this with the long-waited sql-based stored procedures, nice challenge to learn how to interact with files in stage from the stored procedure.
Using a directory table and creating an array list of files to run Copy once for all files for each warehouse.
Also using LOAD_HISTORY to get number of records captured.
Using timestamp to get the row count.
well, it works, but kind of messy
Challenge 10 Blog: https://theinformationlab.nl/en/2022/11/15/snowflake-stored-procedures/
The hardest one I’ve ever done, copied structure form Atzmonky’s work. Definitely need more practice.