DEV Community

Tim
Tim

Posted on • Edited on

T-SQL: How To Use ROW NUMBER() and Why

No matter what data we receive, we will probably be required to order it a certain way at some point. In some cases, we may simply order data by dates and receive it in that format, meaning we have little to do. But there are other times when we have to order those data sets for other reasons - possibly to identify duplicates, to order the dated data with other patterns ordered, etc. In the video, SQL Basics: How To Use ROW NUMBER() and Why, we look at a built in function in most SQL languages that allows us to quickly order data by a column or more columns.

Some questions that are answered in the video:

  • In the examples, we're using new tables that have data in them to demonstrate the ROW_NUMBER() functionality along with using some tables that we've used in other examples.
  • When we review the examples, what happens if we have multiple columns that we order by? How would we specify that if we wanted Column1 to come before Column2?
  • What can we pass into the ROW_NUMBER() functionality that allows us to invert the order of a data set? How could this be useful?
  • Think of a math question that ROW_NUMBER() would help us solve. Any question that involves first ordering data (ie: median, mode as simple examples) will highlight where this functionality will be useful.

As mentioned in the video, we can sometimes use ordering data to calculate values (ie: median was mentioned as one outcome that we can get with using this functionality). In the case of median, as an example, if we order data ascending and descending by row number with a data set that has an odd number of values, when the values equal each other, we've found the median (note that this is not the case with an even number set of data points). This highlights an example of how we can find the median with an odd set of values using the ROW_NUMBER() functionality. We can also use this functionality (along with other functionality) to identify duplicates because a duplicate is ultimately an "identical" value to something else - thus two rows that are identical are really the same order, even if the ROW_NUMBER() functionality will list them as two (this is why other functionality such as PARTITION BY are key, as we want to then divide by identical values to find them).

ROW_NUMBER() starts at 1, if you're familiar with arrays and the starting with 0 functionality. Like arrays, we can get the Nth element of data from rows with this (one of many ways to get this when needed). Because we can use the functionality over many columns together or independently over many columns, this allows us to also get the Nth element of data by column combinations.

This functionality is available in many other SQL languages, as well as Spark functionality (Scala, Python, C#). For more SQL lessons, you can continue to watch the SQL Basics series.

Top comments (0)