Window functions are a powerful feature of SQL, perfect for tasks ranging from calculating rolling averages to identifying duplicate records. They can rank, aggregate, and filter data at all levels of a table while preserving its base granularity. For aspiring data analytics professionals, mastering window functions is a surefire way to stand out. It is with the power of window functions that we’ve been able to tackle several reporting requests and data nuances for our clients; a few of which I’ll walk us through below in this clothing apparel distribution example.
Simple Deduping
A common scenario for a clothing apparel distribution company involves pulling descriptive fields from a table `product_inventory`, which contains multiple records for a single product. Each product can exist in multiple batches, representing different times of production or shipment. Descriptive fields like region, distributor, state, and city remain consistent across batches, which are the fields we want in our final table. However, a simple LEFT JOIN to the `product_inventory` table would create duplicate records.
Example Table:
REGION | DISTRIBUTOR | STATE | CITY | PRODUCT_NAME | BATCH_NBR | UPDATED_TS |
East | Ampersand | Florida | Miami | Polo_Shirts | 123456789 | 2024-02-01 05:01:02 |
East | Ampersand | Florida | Miami | Polo_Shirts | 123456790 | 2024-06-01 05:02:01 |
To solve this, we want a version of `product_inventory` that only contains the latest batch of each product, determined by the `updated_ts` timestamp.
SELECT * FROM product_inventory
QUALIFY ROW_NUMBER() OVER (PARTITION BY product_name ORDER BY updated_ts DESC) = 1
Result:
REGION | DISTRIBUTOR | STATE | CITY | PRODUCT_NAME | BATCH_NBR | UPDATED_TS |
East | Ampersand | Florida | Miami | Polo_Shirts | 123456790 | 024-06-01 05:02:01 |
Rolling 3-Week Average Sales
For the sales data at the distributor level, we need to calculate the average number of sales over the past three weeks, inclusive of the current week. This requires a rolling 3-week average that shifts forward by one week as time progresses. The challenge is that a distributor might not have sales every week, which must be factored into the average calculation.
First, we create two Common Table Expressions (CTEs): one for the list of Mondays (as our report weeks start on Monday) and one for the distinct list of distributors with sales.
WITH mondays AS (
SELECT report_date
FROM date_range
WHERE day_of_week = 1
),
distinct_distributors AS (
SELECT DISTINCT distributor_name
FROM sales
)
Next, we calculate the rolling average, accounting for weeks with zero sales.
, rolling_avg_sales AS (
SELECT
dr.report_date AS report_week,
dd.distributor_name,
AVG(COALESCE(st.gross_sales, 0)) OVER (
PARTITION BY dd.distributor_name
ORDER BY dr.report_date
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS avg_gross_sales
FROM distinct_distributors dd
CROSS JOIN mondays dr
LEFT JOIN sales_table st
ON dr.report_date = st.report_week
AND dd.distributor_name = st.distributor_name
)
Weeks Recommended
We want to know how many consecutive weeks a product has been recommended for promotion, which can help prioritize marketing efforts. A product might fall in and out of recommendation, so we need to track consecutive recommendation periods.
We start by determining if a product was recommended in the previous week by looking at our table exclusively consisting of product promotion recommendations:
, window_weeks AS (
SELECT
product_name,
report_week,
LAG(report_week) OVER (
PARTITION BY product_name ORDER BY report_week
) AS previous_report_week,
CASE WHEN report_week != DATEADD(‘day’, 7, previous_report_week) THEN 1 END AS gap_flag
FROM recommendation_list
)
product_name | report_week | previous_report_week | gap_flag |
polo_shirts | 2024-04-08 | 1 | |
polo_shirts | 2024-04-15 | 2024-04-08 | 0 |
polo_shirts | 2024-04-22 | 2024-04-15 | 0 |
polo_shirts | 2024-05-06 | 2024-04-22 | 1 |
Next, create a rolling count of gaps to bucket report weeks that occur consecutively:
, week_grouping AS (
SELECT
product_name,
report_week,
COUNT(gap_flag) OVER (PARTITION BY product_name ORDER BY report_week) AS grp
FROM window_weeks
)
product_name | report_week | grp |
polo_shirts | 2024-04-08 | 1 |
polo_shirts | 2024-04-15 | 1 |
polo_shirts | 2024-04-22 | 1 |
polo_shirts | 2024-05-06 | 2 |
Finally, we assign row numbers to determine how many weeks a product has been recommended consecutively:
, weeks_recommended AS (
SELECT
product_name,
report_week,
ROW_NUMBER() OVER (PARTITION BY product_name, grp ORDER BY report_week) AS weeks_recommended
FROM week_grouping
)
product_name | report_week | weeks_smart_sale |
polo_shirts | 2024-04-08 | 1 |
polo_shirts | 2024-04-15 | 2 |
polo_shirts | 2024-04-22 | 3 |
polo_shirts | 2024-05-06 | 1 |
Final Takeaways
At Ampersand Consulting, we specialize in providing advanced data analytics solutions tailored to the specific needs of our clients. This blog post explores the powerful features of SQL window functions, showcasing their versatility in solving complex data challenges within the clothing apparel distribution industry.
From eliminating duplicate records to calculating rolling averages and tracking product recommendations, our expertise in window functions enables us to deliver precise and actionable insights. By partnering with Ampersand Consulting, clients gain access to cutting-edge analytical techniques that drive efficiency, enhance decision-making, and foster growth.
Discover how our services can transform your data into a strategic asset, empowering your business to thrive in a competitive market.
With over five years of consulting experience in the analytics space, Brody has extensive expertise in data transformation, ETL, pipeline design, analysis, dashboard development, and project management. Familiar with a Microsoft stack and preferring Power BI and SSMS/Azure, he also has experience with Snowflake and Tableau/Qlik Sense, holding certifications in Alteryx, Tableau, and Power BI.