Preceding and following, can you use them in one statement?

I was teaching a class today on more advanced SQL queries and we were discussing if you could use preceding and following in a single windowing function.

Windowing Functions

If you’re not familiar with windowing functions, they’re really cool. You can perform all kinds of trickery and magic with them. Let’s start with a simple example.
The dataset I’m using has flight data from the early 2000’s.

My first query is this:

SELECT DESTINATION_ID
	,sum(ARRIVAL_DELAY) AS [Total Delay on arrival]
FROM FACT.FLIGHTDATA FFD
INNER JOIN DIM.DATE D ON FFD.DATEID = D.DIMDATEID
WHERE D.Month = 1
	AND D.Year = 2003
	AND d.DayOfMonth = 1
GROUP BY DESTINATION_ID
ORDER BY DESTINATION_ID

This query returns the destination ID’s and the sum of the delays on january first 2003. Nothing special here.

Delayed or not

Running Sum

But what if we want the running sum of delays. Not just the total sum but the delays need to add up. For this we can use a windowing function.

SELECT DESTINATION_ID
	,ARRIVAL_DELAY
	,SUM(ARRIVAL_DELAY) 
        OVER (
		     PARTITION BY DESTINATION_ID 
             ORDER BY DESTINATION_ID 
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
		) AS RUNNING_DELAY
FROM FACT.FLIGHTDATA FFD
INNER JOIN DIM.DATE D ON FFD.DATEID = D.DIMDATEID
WHERE D.Month = 1
	AND D.Year = 2003
	AND d.DayOfMonth = 1
ORDER BY DESTINATION_ID

This query will return the running delay on every flight of the day. The way it works is that SQL picks up the data for the first destination id, orders it by destination id (a useless order but it’s just a demo, leave me alone) and then starts adding up the delay as each row gets processed. When the first destination is done, it resets the sum and starts all over again.

Adding up each row
Count reset with new ID

Running Sum on previous and current row

So far so good. Another technique is to look back in the rows. You could show a values from records ‘behind’ of the current record, you can also do some calculations with them. In this first case, I’m going to calculate my running sum on the current and previous row.

SELECT DESTINATION_ID
	,ACTUAL_DEPARTURE_TIME
	,ARRIVAL_DELAY
	,SUM(ARRIVAL_DELAY) 
            OVER (
		            PARTITION BY ACTUAL_DEPARTURE_TIME 
                    ORDER BY DESTINATION_ID 
                    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
		) AS RUNNING_DELAY
FROM FACT.FLIGHTDATA FFD
INNER JOIN DIM.DATE D ON FFD.DATEID = D.DIMDATEID
WHERE D.Month = 1
	AND D.Year = 2003
	AND d.DayOfMonth = 1
ORDER BY DESTINATION_ID

The query above is very much like the previous query, the difference is that there’s no unbounded preceding (meaning read everything from the current data window), but look one row back.

Running sum over 2 rows

In this result set, you’ll notice the delays are quite steady. The delays continue to be calculated over the current row and the previous row. The keen eyed people will notice that the sorting being weird, for ID 6 and departure time 1318 the first row is at the bottom of the result set. My only explanation is that the rows are processed by the windowing function in a different order than my order by is. Something for another blog to dive deeper into.

Running sum with preceding and following

As I wrote earlier, you can go both ways, looking ahead and behind you. This query will look something like this

SELECT DESTINATION_ID
	,ACTUAL_DEPARTURE_TIME
	,ARRIVAL_DELAY
	,SUM(ARRIVAL_DELAY) 
           OVER (
		          PARTITION BY DESTINATION_ID, ACTUAL_DEPARTURE_TIME 
                  ORDER BY DESTINATION_ID 
                  ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING
		) AS RUNNING_DELAY
FROM FACT.FLIGHTDATA FFD
INNER JOIN DIM.DATE D ON FFD.DATEID = D.DIMDATEID
WHERE D.Month = 1
	AND D.Year = 2003
	AND d.DayOfMonth = 1
ORDER BY DESTINATION_ID, ACTUAL_DEPARTURE_TIME

This was the question from class, will it work? My answer, inspired by other great teachers, is to try it out for yourself. Learn by trying it out and explaining the results. I don’t know if they did it, but I found it fun enough to give it a shot as I wasn’t sure about the answer.

It works!

Now, I had to think a bit about the result set, because it looks quite weird.
On row one, it can’t look back, but it can look forward two rows with the current row included. -19 times 3 = -57.
The second row can look one row back, and the result is -19 times 4 = -76.
The third row is equally calculated.
Row 4 can’t look enough any more because the actual departure time rows are running out for 950. So the sum drops back to -57.
Row 5 drops even back further because there are no following rows any more, just the current row and the preceding row.

Concluding

This means you can look both ways but remember that the current row always comes along. So when you’re looking back one row, it’s the previous row and the current row. If you need to calculate a running sum over 6 rows, make sure your preceding or following statement is 5 to take the current row into account.

This was an excellent question from my class and one of the reasons I just love teaching. Classes come up with the best questions that will trigger you to dig deeper into something, enabling you to learn more.

If you think my explanations are not clear (enough) or I’ve missed a point, please let me know!

Thanks for reading!

Een gedachte over “Preceding and following, can you use them in one statement?

Geef een reactie

Vul je gegevens in of klik op een icoon om in te loggen.

WordPress.com logo

Je reageert onder je WordPress.com account. Log uit /  Bijwerken )

Twitter-afbeelding

Je reageert onder je Twitter account. Log uit /  Bijwerken )

Facebook foto

Je reageert onder je Facebook account. Log uit /  Bijwerken )

Verbinden met %s