Elevate Your SQL Skills with Window Functions: A Guide for Data Analysts

June 27, 2024 by Brody Balinski

Window functions are a powerful feature of SQL, perfect for tasks ranging from calculating rolling averages to identifying duplicate records. They can rank, aggregate, and filter data at all levels of a table while preserving its base granularity. For aspiring data analytics professionals, mastering window functions is a surefire way to stand out. It is with the power of window functions that we’ve been able to tackle several reporting requests and data nuances for our clients; a few of which I’ll walk us through below in this clothing apparel distribution example.

Simple Deduping

A common scenario for a clothing apparel distribution company involves pulling descriptive fields from a table `product_inventory`, which contains multiple records for a single product. Each product can exist in multiple batches, representing different times of production or shipment. Descriptive fields like region, distributor, state, and city remain consistent across batches, which are the fields we want in our final table. However, a simple LEFT JOIN to the `product_inventory` table would create duplicate records.

Example Table:

REGION	DISTRIBUTOR	STATE	CITY	PRODUCT_NAME	BATCH_NBR	UPDATED_TS
East	Ampersand	Florida	Miami	Polo_Shirts	123456789	2024-02-01 05:01:02
East	Ampersand	Florida	Miami	Polo_Shirts	123456790	2024-06-01 05:02:01

To solve this, we want a version of `product_inventory` that only contains the latest batch of each product, determined by the `updated_ts` timestamp.

SELECT * FROM product_inventory
QUALIFY ROW_NUMBER() OVER (PARTITION BY product_name ORDER BY updated_ts DESC) = 1

Result:

REGION	DISTRIBUTOR	STATE	CITY	PRODUCT_NAME	BATCH_NBR	UPDATED_TS
East	Ampersand	Florida	Miami	Polo_Shirts	123456790	024-06-01 05:02:01

Rolling 3-Week Average Sales

For the sales data at the distributor level, we need to calculate the average number of sales over the past three weeks, inclusive of the current week. This requires a rolling 3-week average that shifts forward by one week as time progresses. The challenge is that a distributor might not have sales every week, which must be factored into the average calculation.

First, we create two Common Table Expressions (CTEs): one for the list of Mondays (as our report weeks start on Monday) and one for the distinct list of distributors with sales.

WITH mondays AS (
SELECT report_date
FROM date_range
WHERE day_of_week = 1
),
distinct_distributors AS (
SELECT DISTINCT distributor_name
FROM sales
)

Next, we calculate the rolling average, accounting for weeks with zero sales.

, rolling_avg_sales AS (
SELECT
dr.report_date AS report_week,
dd.distributor_name,
AVG(COALESCE(st.gross_sales, 0)) OVER (
PARTITION BY dd.distributor_name
ORDER BY dr.report_date
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS avg_gross_sales
FROM distinct_distributors dd
CROSS JOIN mondays dr
LEFT JOIN sales_table st
ON dr.report_date = st.report_week
AND dd.distributor_name = st.distributor_name
)

Weeks Recommended

We want to know how many consecutive weeks a product has been recommended for promotion, which can help prioritize marketing efforts. A product might fall in and out of recommendation, so we need to track consecutive recommendation periods.

We start by determining if a product was recommended in the previous week by looking at our table exclusively consisting of product promotion recommendations:

, window_weeks AS (
SELECT
product_name,
report_week,
LAG(report_week) OVER (
PARTITION BY product_name ORDER BY report_week
) AS previous_report_week,
CASE WHEN report_week != DATEADD(‘day’, 7, previous_report_week) THEN 1 END AS gap_flag
FROM recommendation_list
)

product_name	report_week	previous_report_week	gap_flag
polo_shirts	2024-04-08		1
polo_shirts	2024-04-15	2024-04-08	0
polo_shirts	2024-04-22	2024-04-15	0
polo_shirts	2024-05-06	2024-04-22	1

Next, create a rolling count of gaps to bucket report weeks that occur consecutively:

, week_grouping AS (
SELECT
product_name,
report_week,
COUNT(gap_flag) OVER (PARTITION BY product_name ORDER BY report_week) AS grp
FROM window_weeks
)

product_name	report_week	grp
polo_shirts	2024-04-08	1
polo_shirts	2024-04-15	1
polo_shirts	2024-04-22	1
polo_shirts	2024-05-06	2

Finally, we assign row numbers to determine how many weeks a product has been recommended consecutively:

, weeks_recommended AS (
SELECT
product_name,
report_week,
ROW_NUMBER() OVER (PARTITION BY product_name, grp ORDER BY report_week) AS weeks_recommended
FROM week_grouping
)

product_name	report_week	weeks_smart_sale
polo_shirts	2024-04-08	1
polo_shirts	2024-04-15	2
polo_shirts	2024-04-22	3
polo_shirts	2024-05-06	1

Final Takeaways

At Ampersand Consulting, we specialize in providing advanced data analytics solutions tailored to the specific needs of our clients. This blog post explores the powerful features of SQL window functions, showcasing their versatility in solving complex data challenges within the clothing apparel distribution industry.

From eliminating duplicate records to calculating rolling averages and tracking product recommendations, our expertise in window functions enables us to deliver precise and actionable insights. By partnering with Ampersand Consulting, clients gain access to cutting-edge analytical techniques that drive efficiency, enhance decision-making, and foster growth.

Discover how our services can transform your data into a strategic asset, empowering your business to thrive in a competitive market.

With over five years of consulting experience in the analytics space, Brody has extensive expertise in data transformation, ETL, pipeline design, analysis, dashboard development, and project management. Familiar with a Microsoft stack and preferring Power BI and SSMS/Azure, he also has experience with Snowflake and Tableau/Qlik Sense, holding certifications in Alteryx, Tableau, and Power BI.

"Window functions are my favorite feature of SQL due to their versatility and power. They allow us to perform complex calculations while maintaining the original table structure. By mastering window functions, analysts can significantly enhance their SQL skills and stand out in the field. We've harnessed this capability to tackle diverse reporting requests and address data intricacies for our clients, demonstrating their practical application in the examples shared here."

- Brody Balinski, Senior Consultant, Data & Analytics

Outdated Tech is Costing You More Than You Think: Why Agentic AI is the Smart Move for Real Estate

In an industry where timing, data, and precision are everything, outdated systems can quietly undermine efficiency and profitability. Many real estate companies rely on expensive third-party platforms not just for data visualization, but...

Expert Articles

Beyond Generic AI: The Strategic Case for Custom Conversational Solutions

In today’s AI-driven landscape, businesses face a critical decision: continue with off-the-shelf AI platforms or invest in custom solutions. Through our recent experience helping clients transition away from generic conversational AI platforms like...

Expert Articles, News

Unlocking Value Beyond Legacy Systems: Why AI-Driven Solutions Are the Future

In today’s fast-moving business landscape, holding on to legacy systems can be a costly burden. Many organizations unknowingly pay a premium for outdated third-party solutions that no longer serve their evolving needs. The...

Talk to Sandy