Using FIRST_VALUE to find the first occurrence of a ZIP code in a dataset

20 Oct 2023

When working with datasets that contain multiple records with the same ZIP code, you may need to find the first occurrence of each unique ZIP code. In SQL, you can achieve this using the FIRST_VALUE function.

What is FIRST_VALUE?

The FIRST_VALUE function is a window function in SQL that allows you to retrieve the first value from an ordered set of rows within a defined window frame. It is commonly used to find the first occurrence of a specific value within a dataset.

Syntax of FIRST_VALUE

The syntax for FIRST_VALUE is as follows:

FIRST_VALUE (expression) OVER (
    [PARTITION BY partition_expression] 
    ORDER BY sort_expression [ASC | DESC] 
    ROWS {UNBOUNDED PRECEDING | value PRECEDING})

expression: The value you want to retrieve the first occurrence of.
PARTITION BY partition_expression: Optional. Used to divide the dataset into partitions before applying the FIRST_VALUE function.
ORDER BY sort_expression: Specifies the order in which the dataset should be sorted.
ROWS {UNBOUNDED PRECEDING | value PRECEDING}: Specifies the window frame within which the FIRST_VALUE function operates.

Example Usage

Let’s consider a dataset with two columns - zipcode and population. We want to find the first occurrence of each unique ZIP code along with its corresponding population.

Here’s an example SQL query that uses FIRST_VALUE to achieve this:

SELECT DISTINCT 
    FIRST_VALUE(zipcode) OVER (PARTITION BY zipcode ORDER BY population ASC) AS first_zipcode,
    FIRST_VALUE(population) OVER (PARTITION BY zipcode ORDER BY population ASC) AS first_population
FROM 
    dataset_table

In this query, we use FIRST_VALUE to retrieve the first occurrence of zipcode and population within each partition defined by zipcode. We order the dataset by population in ascending order to ensure we get the earliest occurrence.

The DISTINCT keyword ensures that only the unique combinations of ZIP code and population are returned.

Conclusion

Using the FIRST_VALUE function in SQL allows you to easily find the first occurrence of a ZIP code within a dataset. By partitioning and ordering the dataset appropriately, you can retrieve the earliest occurrence of each unique ZIP code along with its corresponding values.