This week we’re looking at something quirky but useful: date dimension together with a UDF that calculates the number of business days between 2 dates (and because it’s an easy challenge, we’re not excluding holidays).
We’ve talked to the Superhero Union and have received complaints that the schedules for any given hero, doesn’t take into account a ‘normal working routine’ and that ‘even superheroes need weekends off’.
To start this venture, we need you to construct a date dimension table that contains the following :
- the date, starting from 1-1-2000
- The year in 4 numbers (for 1-1-2000 this would be 2000)
- The first 3 letters of the month (for 1-1-2000 this would be Jan)
- The name of the month (for 1-1-2000 this would be January)
- The number of the day within that month (for 1-1-2000 this would be 1)
- The number of the weekday (Mondays are 1, Tuesdays are 2, etc., OR any other order that makes sense to you )
- The number of the week in that year (for 1-1-2000 this would be 1)
- The number of the day in that year (for 1-1-2000 this would be 1)
Besides the date dimension you should also create a function to calculate the number of business days between 2 dates that can easily switch between including or excluding the 2nd date in the calculation (so From and including Monday, 2 November 2020 to, but not including Friday, 6 November 2020 is 4 days
but From and including Monday, 2 November 2020 to and including Friday, 6 November 2020 is 5 days)
THEN use this function on the following testing_data :
testing_data
create table testing_data (
id INT,
start_date DATE,
end_date DATE
);
insert into testing_data (id, start_date, end_date) values (1, '11/11/2020', '9/3/2022');
insert into testing_data (id, start_date, end_date) values (2, '12/8/2020', '1/19/2022');
insert into testing_data (id, start_date, end_date) values (3, '12/24/2020', '1/15/2022');
insert into testing_data (id, start_date, end_date) values (4, '12/5/2020', '3/3/2022');
insert into testing_data (id, start_date, end_date) values (5, '12/24/2020', '6/20/2022');
insert into testing_data (id, start_date, end_date) values (6, '12/24/2020', '5/19/2022');
insert into testing_data (id, start_date, end_date) values (7, '12/31/2020', '5/6/2022');
insert into testing_data (id, start_date, end_date) values (8, '12/4/2020', '9/16/2022');
insert into testing_data (id, start_date, end_date) values (9, '11/27/2020', '4/14/2022');
insert into testing_data (id, start_date, end_date) values (10, '11/20/2020', '1/18/2022');
insert into testing_data (id, start_date, end_date) values (11, '12/1/2020', '3/31/2022');
insert into testing_data (id, start_date, end_date) values (12, '11/30/2020', '7/5/2022');
insert into testing_data (id, start_date, end_date) values (13, '11/28/2020', '6/19/2022');
insert into testing_data (id, start_date, end_date) values (14, '12/21/2020', '9/7/2022');
insert into testing_data (id, start_date, end_date) values (15, '12/13/2020', '8/15/2022');
insert into testing_data (id, start_date, end_date) values (16, '11/4/2020', '3/22/2022');
insert into testing_data (id, start_date, end_date) values (17, '12/24/2020', '8/29/2022');
insert into testing_data (id, start_date, end_date) values (18, '11/29/2020', '10/13/2022');
insert into testing_data (id, start_date, end_date) values (19, '12/10/2020', '7/31/2022');
insert into testing_data (id, start_date, end_date) values (20, '11/1/2020', '10/23/2021');
Remember, if you want to participate:
This one confused me as I couldn’t figure out what the date scaffold table was actually used for. I still created it though with the code in my solution, which I stole from my recent blog post:
https://interworks.com/blog/2022/08/02/using-snowflakes-generator-function-to-create-date-and-time-scaffold-tables/
The UDF itself was pretty easy since Python’s Numpy package already has the required functionality
Used the date_spine helper from dbt_utils to create the backbone and added the extra columns. I included an ‘is_holiday’, set to false but that could be overridden with other info.
Then I used the table to compute the days removing sat/sun, holidays,…
UDF – https://github.com/dsmdavid/frostyfridays-sf/blob/main/macros/ch_19_create_udf.sql
Base calendar table https://github.com/dsmdavid/frostyfridays-sf/blob/main/models/challenge_19.sql
I used an SQL function to calculate the difference of working days. I believe that in this case it is necessary to use the dim date table. If there is a holiday flag in the table, it is possible to include it in the calculation.
My contribution
A great refresher for dates and scaffolding
Here’s my solution
Went for the short route for UDF with datediff function.
Learned a lot again, from the challenge, AND from solutions by others…