Home > Blog > Expert Articles > Elevate Your SQL Skills with Window Functions: A Guide for Data Analysts

Elevate Your SQL Skills with Window Functions: A Guide for Data Analysts

Window functions are a powerful feature of SQL, perfect for tasks ranging from calculating rolling averages to identifying duplicate records. They can rank, aggregate, and filter data at all levels of a table while preserving its base granularity. For aspiring data analytics professionals, mastering window functions is a surefire way to stand out. It is with the power of window functions that we’ve been able to tackle several reporting requests and data nuances for our clients; a few of which I’ll walk us through below in this clothing apparel distribution example.

Simple Deduping

A common scenario for a clothing apparel distribution company involves pulling descriptive fields from a table `product_inventory`, which contains multiple records for a single product. Each product can exist in multiple batches, representing different times of production or shipment. Descriptive fields like region, distributor, state, and city remain consistent across batches, which are the fields we want in our final table. However, a simple LEFT JOIN to the `product_inventory` table would create duplicate records.

Example Table:

REGIONDISTRIBUTORSTATECITYPRODUCT_NAMEBATCH_NBRUPDATED_TS
EastAmpersandFloridaMiamiPolo_Shirts1234567892024-02-01 05:01:02
EastAmpersandFloridaMiamiPolo_Shirts1234567902024-06-01 05:02:01

To solve this, we want a version of `product_inventory` that only contains the latest batch of each product, determined by the `updated_ts` timestamp.

SELECT * FROM product_inventory
QUALIFY ROW_NUMBER() OVER (PARTITION BY product_name ORDER BY updated_ts DESC) = 1

Result:

REGIONDISTRIBUTORSTATECITYPRODUCT_NAMEBATCH_NBRUPDATED_TS
EastAmpersandFloridaMiamiPolo_Shirts123456790024-06-01 05:02:01

Rolling 3-Week Average Sales

For the sales data at the distributor level, we need to calculate the average number of sales over the past three weeks, inclusive of the current week. This requires a rolling 3-week average that shifts forward by one week as time progresses. The challenge is that a distributor might not have sales every week, which must be factored into the average calculation.

First, we create two Common Table Expressions (CTEs): one for the list of Mondays (as our report weeks start on Monday) and one for the distinct list of distributors with sales.

WITH mondays AS (
SELECT report_date
FROM date_range
WHERE day_of_week = 1
),
distinct_distributors AS (
SELECT DISTINCT distributor_name
FROM sales
)

Next, we calculate the rolling average, accounting for weeks with zero sales.

, rolling_avg_sales AS (
SELECT
    dr.report_date AS report_week,
    dd.distributor_name,
    AVG(COALESCE(st.gross_sales, 0)) OVER (
        PARTITION BY dd.distributor_name
        ORDER BY dr.report_date
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
    ) AS avg_gross_sales
FROM distinct_distributors dd
CROSS JOIN mondays dr
LEFT JOIN sales_table st
    ON dr.report_date = st.report_week
    AND dd.distributor_name = st.distributor_name
)

Weeks Recommended

We want to know how many consecutive weeks a product has been recommended for promotion, which can help prioritize marketing efforts. A product might fall in and out of recommendation, so we need to track consecutive recommendation periods.

We start by determining if a product was recommended in the previous week by looking at our table exclusively consisting of product promotion recommendations:

, window_weeks AS (
SELECT
    product_name,
    report_week,
    LAG(report_week) OVER (
        PARTITION BY product_name ORDER BY report_week
    ) AS previous_report_week,
    CASE WHEN report_week != DATEADD(‘day’, 7, previous_report_week) THEN 1 END AS gap_flag
FROM recommendation_list
)

product_namereport_weekprevious_report_weekgap_flag
polo_shirts2024-04-081
polo_shirts2024-04-152024-04-080
polo_shirts2024-04-222024-04-150
polo_shirts2024-05-062024-04-221

Next, create a rolling count of gaps to bucket report weeks that occur consecutively:

, week_grouping AS (
SELECT
    product_name,
    report_week,
    COUNT(gap_flag) OVER (PARTITION BY product_name ORDER BY report_week) AS grp
FROM window_weeks
)

product_namereport_weekgrp
polo_shirts2024-04-081
polo_shirts2024-04-151
polo_shirts2024-04-221
polo_shirts2024-05-062

Finally, we assign row numbers to determine how many weeks a product has been recommended consecutively:

, weeks_recommended AS (
SELECT
    product_name,
    report_week,
    ROW_NUMBER() OVER (PARTITION BY product_name, grp ORDER BY report_week) AS weeks_recommended
FROM week_grouping
)

product_namereport_weekweeks_smart_sale
polo_shirts2024-04-081
polo_shirts2024-04-152
polo_shirts2024-04-223
polo_shirts2024-05-061

Final Takeaways

At Ampersand Consulting, we specialize in providing advanced data analytics solutions tailored to the specific needs of our clients. This blog post explores the powerful features of SQL window functions, showcasing their versatility in solving complex data challenges within the clothing apparel distribution industry.

From eliminating duplicate records to calculating rolling averages and tracking product recommendations, our expertise in window functions enables us to deliver precise and actionable insights. By partnering with Ampersand Consulting, clients gain access to cutting-edge analytical techniques that drive efficiency, enhance decision-making, and foster growth.

Discover how our services can transform your data into a strategic asset, empowering your business to thrive in a competitive market.


Featured Articles

Overlay AI vs. Native Enterprise AI: Which Approach Will Transform Your Business?

Artificial Intelligence (AI) is no longer a futuristic concept; it’s a present reality reshaping how businesses operate. From automating mundane tasks to providing deep insights through data analytics, AI is a game-changer. But...

How AI Adoption is Evolving: The Consumers, Integrators, and Creators Framework

Artificial Intelligence (AI) is transforming the way organizations operate. In our recent episode of the Looking Forward podcast, we sat down with Doug Smith, Principal AI and Data Analytics Consultant at Ampersand Consulting,...

Knock-Knock: It’s Woods Theorem to Help With Generative AI Deployments

The world of artificial intelligence is both mesmerizing and full of hurdles. As we delve into its intricacies, understanding applicable theories is paramount. That said, do you want to hear one in joke...
Apply now