This week FrostyFarms are looking to create a set of chained tasks – two to be exact! The farms have plenty of cows who produce milk for us, and from there, some of that milk will be converted into skim/skimmed milk. We want our data to be edited to reflect the fact that the fat percentage of the milk will determine how the data should look.
Skim milk goes through the process of fat reduction in a centrifuge, therefore, whole milky rows won’t need columns relating to that process, but the skim milky rows will.
Create a parent and child task that will perform different actions on the different rows of data depending on the fat percentage of the milk.
Here you can find the skeleton script
-- Set the database and schema use database <insert_db_name_here> use schema <insert_schema_name_here> -- Create the stage that points at the data. create stage week_11_frosty_stage url = 's3://frostyfridaychallenges/challenge_11/' file_format = <insert_csv_file_format; -- Create the table as a CTAS statement. create or replace table frosty_friday.challenges.week11 as select m.$1 as milking_datetime, m.$2 as cow_number, m.$3 as fat_percentage, m.$4 as farm_code, m.$5 as centrifuge_start_time, m.$6 as centrifuge_end_time, m.$7 as centrifuge_kwph, m.$8 as centrifuge_electricity_used, m.$9 as centrifuge_processing_time, m.$10 as task_used from @week_11_frosty_stage (file_format => '<insert_csv_file_format>', pattern => '.*milk_data.*[.]csv') m; -- TASK 1: Remove all the centrifuge dates and centrifuge kwph and replace them with NULLs WHERE fat = 3. -- Add note to task_used. create or replace task whole_milk_updates schedule = '1400 minutes' as <insert_sql_here> -- TASK 2: Calculate centrifuge processing time (difference between start and end time) WHERE fat != 3. -- Add note to task_used. create or replace task skim_milk_updates after frosty_friday.challenges.whole_milk_updates as <insert_sql_here> -- Manually execute the task. execute task whole_milk_updates; -- Check that the data looks as it should. select * from week11; -- Check that the numbers are correct. select task_used, count(*) as row_count from week11 group by task_used;
When you run the first query, the 3% data should look like
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘ (note joining our mailing list does not give you a Frosty Friday account)
- Post your code to GitHub and make it publicly available (Check out our guide if you don’t know how to here)
- Post the URL in the comments of the challenge.
Psst… in the near-future we might be announcing something exciting for those at the top of the challenge leaderboard. 😉