Leveraging FIRST_VALUE for time-based window aggregations and sliding windows in SQL

In SQL, window functions are a powerful tool for performing calculations over a specific subset of rows, known as a window. These functions allow us to compute aggregate values, rank rows, and more, all within the context of a defined window.

One common use case for window functions is performing time-based window aggregations and working with sliding windows. A sliding window allows us to compute aggregate values over a moving subset of rows, based on a time interval or a fixed number of preceding or following rows.

To achieve this, we can leverage the FIRST_VALUE window function, which returns the value of an expression from the first row of the window within each partition.

Let’s take a look at an example to understand how to use FIRST_VALUE for time-based window aggregations and sliding windows.

Example Scenario

Assume we have a table sales with the following structure:

sales
+------------+-------+
|  timestamp | amount|
+------------+-------+
|2022-01-01  | 100   |
|2022-01-02  | 150   |
|2022-01-03  | 200   |
|2022-01-04  | 120   |
|2022-01-05  | 180   |
+------------+-------+

Our goal is to calculate a running total of sales for a sliding window of the past three days. We want to include the current day in our calculation and sum the amount for the current day and the two preceding days.

Using FIRST_VALUE for Sliding Windows

To achieve the desired sliding window aggregation, we can use the FIRST_VALUE function combined with the ROWS BETWEEN ... PRECEDING clause.

Here’s the SQL query that calculates the running total for the sliding window:

SELECT
  timestamp,
  amount,
  SUM(amount) OVER (
    ORDER BY timestamp
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) as running_total
FROM sales;

The ROWS BETWEEN 2 PRECEDING AND CURRENT ROW clause specifies the range for our sliding window. It includes the current row and the two preceding rows based on the order of the timestamp column.

Results

Running the above query will give us the following result:

+------------+-------+--------------+
| timestamp  | amount| running_total|
+------------+-------+--------------+
|2022-01-01  | 100   | 100          |
|2022-01-02  | 150   | 250          |
|2022-01-03  | 200   | 450          |
|2022-01-04  | 120   | 470          |
|2022-01-05  | 180   | 500          |
+------------+-------+--------------+

The running_total column shows the sum of amount for the current day and the two preceding days, creating a sliding window of three days.

Conclusion

By leveraging the FIRST_VALUE function with the ROWS BETWEEN ... PRECEDING clause, we can easily perform time-based window aggregations and work with sliding windows in SQL. This enables us to calculate running totals, moving averages, and other calculations that require aggregating data over a specified time frame.

Using window functions like FIRST_VALUE can significantly simplify complex queries and provide efficient and concise solutions to time-based data analysis tasks.

Give it a try and start leveraging the power of SQL window functions in your next data analysis project!

References

  1. SQL Window Functions - FIRST_VALUE
  2. SQL Window Functions - ROWS Clause