Week 15 – Intermediate

This week we’re working with a small dataset about house sales that needs to be categorized into certain sliding bins. The catch, however, is that the sizes and the number of bins can change quickly.

The challenge is to create a single function that does the following:

  • can handle uneven bin sizes
  • the first parameter must be the column that will inform your bins (in this example, we categorise according to [price])
  • the second parameter should specify the ranges of your bins (remember, these are uneven bins: bin 1 could be 1 – 400, bin 2 401 – 708, and bin 3 709 – 3000). How you do this is up to you: you can specify lower bounds, upper bounds, both, a count within each bin….
  • if using SQL, it should, as a minimum, be able to handle 2–6 bins; if using other languages, you will find them flexible enough to allow any number of bins
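One way to satisfy these requirements is a function that takes the value plus a list of inclusive upper bounds. This is only a sketch of the binning logic in plain Python (the name `bucketize` and the upper-bounds representation are my own choices, not part of the challenge); the same body could sit inside a Snowflake Python UDF.

```python
def bucketize(value, upper_bounds):
    """Return the 1-based bucket index for value, given inclusive upper bounds.

    Bins can be uneven: bounds [400, 708, 3000] mean bin 1 is 1-400,
    bin 2 is 401-708, and bin 3 is 709-3000. A value above the last
    bound falls outside every bin and returns None.
    """
    for i, bound in enumerate(sorted(upper_bounds), start=1):
        if value <= bound:
            return i
    return None

print(bucketize(500, [400, 708, 3000]))   # -> 2
print(bucketize(3500, [400, 708, 3000]))  # -> None
```

Because the bounds arrive as a list, the same function handles 2 bins or 20 with no change.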

The query should look like the following:

SELECT sale_date,
       price,
       your_function(price, <bin_ranges>) AS BUCKET_SET1,
       your_function(price, <bin_ranges>) AS BUCKET_SET2,
       your_function(price, <bin_ranges>) AS BUCKET_SET3
FROM home_sales;

You will then need to pass in the following bin/bucket ranges for testing purposes:

  • Bucket_Set1:
    • 1: 0 – 1
    • 2: 2 – 310000
    • 3: 310001 – 400000
    • 4: 400001 – 500000
  • Bucket_Set2:
    • 1: 0 – 210000
    • 2: 210001 – 350000
  • Bucket_Set3:
    • 1: 0 – 250000
    • 2: 250001 – 290001
    • 3: 290002 – 320000
    • 4: 320001 – 360000
    • 5: 360001 – 410000
    • 6: 410001 – 470001
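Since each range’s upper end is inclusive, the three test sets can be encoded as lists of inclusive upper bounds (one hypothetical representation; lower bounds or bound pairs would work equally well), reusing the sketch function from above:

```python
def bucketize(value, upper_bounds):
    # 1-based index of the first inclusive upper bound that value fits under;
    # None if value sits above every bound (out-of-range handling is a choice).
    for i, bound in enumerate(sorted(upper_bounds), start=1):
        if value <= bound:
            return i
    return None

# Upper bounds read straight off the ranges above.
BUCKET_SET1 = [1, 310000, 400000, 500000]
BUCKET_SET2 = [210000, 350000]
BUCKET_SET3 = [250000, 290001, 320000, 360000, 410000, 470001]

print(bucketize(290000.00, BUCKET_SET1))  # -> 2
print(bucketize(290000.00, BUCKET_SET2))  # -> 2
print(bucketize(290000.00, BUCKET_SET3))  # -> 2
```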

create table home_sales (
sale_date date,
price number(11, 2)
);

insert into home_sales (sale_date, price) values
('2013-08-01'::date, 290000.00),
('2014-02-01'::date, 320000.00),
('2015-04-01'::date, 399999.99),
('2016-04-01'::date, 400000.00),
('2017-04-01'::date, 470000.00),
('2018-04-01'::date, 510000.00);
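As a local sanity check, the sample rows can be run through the upper-bound sketch from earlier. This is plain Python, not the Snowflake solution, and returning `None` for a price above every bucket is just one choice (commenters below variously used NULL or 0), so treat the out-of-range rows accordingly.

```python
def bucketize(value, upper_bounds):
    # 1-based bucket index, or None when value is above every bound.
    for i, bound in enumerate(sorted(upper_bounds), start=1):
        if value <= bound:
            return i
    return None

bucket_sets = {
    "BUCKET_SET1": [1, 310000, 400000, 500000],
    "BUCKET_SET2": [210000, 350000],
    "BUCKET_SET3": [250000, 290001, 320000, 360000, 410000, 470001],
}

home_sales = [
    ("2013-08-01", 290000.00),
    ("2014-02-01", 320000.00),
    ("2015-04-01", 399999.99),
    ("2016-04-01", 400000.00),
    ("2017-04-01", 470000.00),
    ("2018-04-01", 510000.00),
]

# Mirrors the shape of the challenge query: one bucket column per set.
rows = [
    (sale_date, price) + tuple(bucketize(price, b) for b in bucket_sets.values())
    for sale_date, price in home_sales
]
for row in rows:
    print(row)
# e.g. ("2013-08-01", 290000.0, 2, 2, 2)
#      ("2018-04-01", 510000.0, None, None, None)
```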

Given the buckets above, the following results should be generated:


Remember if you want to participate:

  1. Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER’ (note: joining our mailing list does not give you a Frosty Friday account)
  2. Post your code to GitHub and make it publicly available (check out our guide here if you don’t know how)
  3. Post the URL in the comments of the challenge.

You can also sign up to our mailing list below.

6 responses to “Week 15 – Intermediate”

  1. ChrisHastie avatar
    ChrisHastie

    Fun chance to demonstrate the advantages of Python-based UDFs

    • Solution URL – https://github.com/ChrisHastieIW/Frosty-Friday
  2. mat avatar
    mat

    I used a SQL UDF

    • Solution URL – https://github.com/mateusz-kmon/frostyfridaychallenges/blob/main/w15.sql
  3. dsmdavid avatar
    dsmdavid

    My approach
    – Create a base table to insert the values.
    – Create a python udf (https://github.com/dsmdavid/frostyfridays-sf/blob/main/macros/ch_15_create_udf.sql) as a post_hook for the first table.
    – Create a second view with buckets (https://github.com/dsmdavid/frostyfridays-sf/blob/main/models/challenge_15_02.sql)

    I also added some additional prices below the minimal threshold as edge cases (and assigned them a bin value of 0)

    ![table summary](https://raw.githubusercontent.com/dsmdavid/frostyfridays-sf/main/assets/ch_15.png)

    • scallybrian avatar
      scallybrian

      Another python solution!

      • Solution URL – https://github.com/scallybrian/bs-frosty-sf/blob/main/frostyfridays/models/challenge_015/challenge_015
    • Pekka Kanerva avatar
      Pekka Kanerva

      Here’s my sql-based udf.

      • Solution URL – https://github.com/pekka-kanerva/frosty-friday/blob/main/week15/ff-week15-solution.sql
    • Chris B avatar
      Chris B

      Also did mine in a python UDF. Although I set mine up to output null as the bucket if the price fell outside the range of all buckets.

      • Solution URL – https://github.com/ChrisBo94/FrostyFriday/blob/main/Week_15.sql
