You may think SQL window functions are similar to aggregating but it actually defines a window of rows with a given length around the current row.

You can do,

- aggregate operations: SUM, COUNT, MAX, MIN
- ranking: RANK, DENSE_RANK, ROW_NUMBER, NTILE
- value: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Let's give a concrete example to understand what exactly a window function is doing.

Consider you are a data scientist working on a classification task and you want to predict if an event will occur within the next three hours. In your data, the label is 1 if the event occurred at that exact time and 0 otherwise. Therefore you need a backfill operation to build the target. You can do it by using SQL window functions. Let's create the data frame first,

*We don't need to create a database, we will use the pandasql library for executing SQL queries over pandas data frames (download via pip install pandas and pip install pandasql).*

```
# import libraries
import pandas as pd
data = [
("2020-01-01 06:00:00", 0),
("2020-01-01 07:00:00", 0),
("2020-01-01 08:00:00", 0),
("2020-01-01 09:00:00", 0),
("2020-01-01 10:00:00", 0),
("2020-01-01 11:00:00", 1),
("2020-01-01 12:00:00", 0),
("2020-01-01 13:00:00", 0),
("2020-01-01 14:00:00", 0),
("2020-01-01 15:00:00", 0),
]
# create data frame
df = pd.DataFrame(data, columns=["datetime", "event"])
df.head(10)
```

Here is the dataframe:

Let's solve the problem:

```
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
solution = pysqldf(
"""
SELECT datetime, event, MAX(event) OVER
(
ORDER BY datetime ASC
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
) as target
FROM df
"""
)
solution.head(10)
```

We ordered the rows using `datetime`

column in ascending order and filled the window rows with the MAX value. This is the data frame we get with the target column:

We labeled the previous 3 rows before 11am in the `target`

column.

## Discussion (0)