This week we’re working with a small dataset of house sales that needs to be categorised into a set of sliding bins. The catch, however, is that both the sizes and the number of bins can change quickly.
The challenge is to create a function with a single name that does the following:
- can handle uneven bin sizes
- the first parameter must be the column that will inform your bins (in this example, we categorise according to [price])
- the second parameter should specify the ranges of your bins (remember, these are uneven bins: bin 1 could be 1 – 400, bin 2 could be 401 – 708, and bin 3 could be 709 – 3000); how you encode this is up to you: you can specify lower bounds, upper bounds, both, a count within each bin….
- if you are using SQL, it should handle at least 2–6 bins as a minimum; other languages are flexible enough to allow any number of bins
The query should look like the following:
SELECT sale_date,
price,
your_function(price,<bin_ranges>) AS BUCKET_SET1,
your_function(price,<bin_ranges>) AS BUCKET_SET2,
your_function(price,<bin_ranges>) AS BUCKET_SET3
FROM home_sales;
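One possible shape for such a function (a sketch only, not the official solution) is a Snowflake SQL UDF that takes the value to bucket plus an array of [lower, upper] pairs and returns the 1-based position of the first pair containing the value, or NULL if it falls outside every pair. The name price_bucket and the nested-array encoding of the ranges are assumptions, not part of the challenge:

create or replace function price_bucket(val number, bins array)
returns number
as
$$
    -- bins is an array of [lower, upper] pairs, both bounds inclusive;
    -- flatten it, keep the pairs that contain val, and return the smallest
    -- matching 1-based position (NULL when val matches no pair)
    select min(f.index + 1)
    from table(flatten(input => bins)) f
    where val between f.value[0]::number and f.value[1]::number
$$;

Because the ranges arrive as a single ARRAY argument, the same one-name function works for any number of buckets, which also covers the 2–6 bin requirement for SQL.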
You will then need to pass in the following bin/bucket ranges for testing purposes:
- Bucket_Set1:
- 1: 0 – 1
- 2: 2 – 310000
- 3: 310001 – 400000
- 4: 400001 – 500000
- Bucket_Set2:
- 1: 0 – 210000
- 2: 210001 – 350000
- Bucket_Set3:
- 1: 0 – 250000
- 2: 250001 – 290001
- 3: 290002 – 320000
- 4: 320001 – 360000
- 5: 360001 – 410000
- 6: 410001 – 470001
You will find your setup code here:
create table home_sales (
sale_date date,
price number(11, 2)
);
insert into home_sales (sale_date, price) values
('2013-08-01'::date, 290000.00),
('2014-02-01'::date, 320000.00),
('2015-04-01'::date, 399999.99),
('2016-04-01'::date, 400000.00),
('2017-04-01'::date, 470000.00),
('2018-04-01'::date, 510000.00);
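Assuming the hypothetical price_bucket UDF sketched above, with each bucket written as an inclusive [lower, upper] pair, the test query for the three bucket sets could then look like the following (again a sketch of one possible encoding, not the required answer):

SELECT sale_date,
       price,
       -- bucket set 1: four uneven ranges
       price_bucket(price, array_construct(
           array_construct(0, 1),
           array_construct(2, 310000),
           array_construct(310001, 400000),
           array_construct(400001, 500000))) AS BUCKET_SET1,
       -- bucket set 2: two ranges
       price_bucket(price, array_construct(
           array_construct(0, 210000),
           array_construct(210001, 350000))) AS BUCKET_SET2,
       -- bucket set 3: six ranges
       price_bucket(price, array_construct(
           array_construct(0, 250000),
           array_construct(250001, 290001),
           array_construct(290002, 320000),
           array_construct(320001, 360000),
           array_construct(360001, 410000),
           array_construct(410001, 470001))) AS BUCKET_SET3
FROM home_sales;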
Given the buckets above, your query should generate the expected bucket assignments for each sale.
Remember, if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar and then going to 'REGISTER' (note: joining our mailing list does not give you a Frosty Friday account).
- Post your code to GitHub and make it publicly available (check out our guide here if you don't know how).
- Post the URL in the comments of the challenge.
You can also sign up to our mailing list below.
7 responses to “Week 15 – Intermediate”
- Fun chance to demonstrate the advantages of Python-based UDFs
  - Solution URL – https://github.com/ChrisHastieIW/Frosty-Friday
- I used a SQL UDF
  - Solution URL – https://github.com/mateusz-kmon/frostyfridaychallenges/blob/main/w15.sql
- My approach:
  – Create a base table to insert the values.
  – Create a Python UDF (https://github.com/dsmdavid/frostyfridays-sf/blob/main/macros/ch_15_create_udf.sql) as a post_hook for the first table.
  – Create a second view with buckets (https://github.com/dsmdavid/frostyfridays-sf/blob/main/models/challenge_15_02.sql). I also added some additional prices below the minimal threshold for edge cases (and assigned them a bin value of 0).
  ![table summary](https://raw.githubusercontent.com/dsmdavid/frostyfridays-sf/main/assets/ch_15.png)
- Another Python solution!
  - Solution URL – https://github.com/scallybrian/bs-frosty-sf/blob/main/frostyfridays/models/challenge_015/challenge_015
- Here’s my SQL-based UDF.
  - Solution URL – https://github.com/pekka-kanerva/frosty-friday/blob/main/week15/ff-week15-solution.sql
- Also did mine in a Python UDF, although I set mine up to output NULL as the bucket if the price fell outside the range of all buckets.
  - Solution URL – https://github.com/ChrisBo94/FrostyFriday/blob/main/Week_15.sql
- 🙂
  - Solution URL – https://github.com/darylkit/Frosty_Friday/blob/main/Week%2015%20-%20UDFs/udf.sql