I had an interesting conversation at work where someone questioned my use of DENSE_RANK and suggested I use LAST_VALUE instead. We had a good conversation about it, but ultimately agreed to disagree about the specific scenario. My code was accurate, with no performance issues warranting a rewrite. My feeling was that the person I was talking to was less familiar with DENSE_RANK and so favored LAST_VALUE. I get preference, I have my own, but I try to be aware of what’s preference vs. what’s technically superior.
I’m curious about people’s thoughts on when to use LAST_VALUE() vs MAX() KEEP(DENSE_RANK LAST ORDER BY ). To my mind, both solve a similar problem. I lean towards DENSE_RANK largely out of preference because the syntax is shorter (literally just fewer characters to type), and I also like that, since the KEEP (DENSE_RANK ...) form works as an aggregate function, it returns one row per group without needing the DISTINCT keyword. I haven’t intentionally run tests comparing the two, but anecdotally I’ve never noticed a performance difference when writing comparable queries.
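For anyone who hasn't seen the two side by side, here's a rough, untested sketch of what I mean. The table emp_salary_history and its columns (emp_id, effective_date, salary) are made up purely for illustration:

```sql
-- Hypothetical table: emp_salary_history(emp_id, effective_date, salary)

-- Aggregate form: one row per emp_id, no DISTINCT needed.
SELECT emp_id,
       MAX(salary) KEEP (DENSE_RANK LAST ORDER BY effective_date) AS latest_salary
FROM   emp_salary_history
GROUP  BY emp_id;

-- Analytic form: LAST_VALUE is computed on every row, so DISTINCT
-- is needed to collapse the output to one row per emp_id.
SELECT DISTINCT
       emp_id,
       LAST_VALUE(salary) OVER (
           PARTITION BY emp_id
           ORDER BY effective_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS latest_salary
FROM   emp_salary_history;
```

One difference worth knowing: if two rows for the same emp_id share the latest effective_date, the KEEP version returns the MAX of the tied salaries, while LAST_VALUE returns whichever tied row happens to sort last.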
Surprisingly, I couldn’t really find any detailed articles or discussions online comparing these two functions, and I’m curious what thoughts are out there in the wild.
First of all, thanks for this list. I didn’t know about the KEEP functionality, so I read up on it a bit and found the following interesting read: https://rwijk.blogspot.com/2012/09/keep-clause.html?m=1
I can’t really play around with it at the moment, as it seems to be Oracle only (I think) and I’m not using Oracle right now. That might also explain your colleague’s preference for LAST_VALUE/FIRST_VALUE, because those seem more generally available.
Question from my side: why is DENSE_RANK used in this case? It seems to me like RANK or ROW_NUMBER could perform better, since there is less to keep track of. Of course ROW_NUMBER could give different results if there are duplicates in the ORDER BY column, but in that case the MAX is also sort of an arbitrary choice.
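To make the ROW_NUMBER idea concrete, this is roughly the pattern I have in mind. It's an untested sketch, and the table emp_salary_history with columns emp_id, effective_date and salary is hypothetical:

```sql
-- Hypothetical table: emp_salary_history(emp_id, effective_date, salary)
-- Number the rows per emp_id from newest to oldest, then keep row 1.
SELECT emp_id,
       salary AS latest_salary
FROM  (
        SELECT emp_id,
               salary,
               ROW_NUMBER() OVER (
                   PARTITION BY emp_id
                   ORDER BY effective_date DESC
               ) AS rn
        FROM   emp_salary_history
      )
WHERE  rn = 1;
-- If two rows tie on effective_date, which one gets rn = 1 is not
-- deterministic unless a tie-breaker column is added to the ORDER BY.
```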
Yeah, I guess it’s a less-used analytic function (function? I think that’s the right term here).
I believe the KEEP clause only accepts DENSE_RANK as the ranking part, which makes sense because you want a distinct list of ranks to identify the last one. You are right, though, that the value returned by the MAX() can be somewhat arbitrary if there are duplicate values in the ORDER BY field. Luckily you can usually handle that by using multiple fields in the ORDER BY. The conditions in the query that the function operates on top of also play a role. You can also include a PARTITION BY, but I’ve only needed to use that a time or two.
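A rough, untested sketch of both points, using a made-up table emp_salary_history(emp_id, change_id, effective_date, salary), where change_id is a hypothetical unique column used only as a tie-breaker:

```sql
-- Tie-breaker: add a second ORDER BY column so that exactly one row
-- ranks last, which makes the MAX() irrelevant to the result.
SELECT emp_id,
       MAX(salary) KEEP (DENSE_RANK LAST
                         ORDER BY effective_date, change_id) AS latest_salary
FROM   emp_salary_history
GROUP  BY emp_id;

-- Analytic variant: add OVER (PARTITION BY ...) to keep every detail
-- row and repeat the latest salary on each of them.
SELECT h.*,
       MAX(salary) KEEP (DENSE_RANK LAST
                         ORDER BY effective_date, change_id)
           OVER (PARTITION BY emp_id) AS latest_salary
FROM   emp_salary_history h;
```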
I think one reason I like this more is that it’s more readable to me. For a lot of the other analytic functions you end up using partitions and additional keywords like “ROWS BETWEEN UNBOUNDED PRECEDING AND …etc.” in order to break out results the way you need. With this dense_rank method I typically accomplish that using regular ol’ WHERE conditions. Of course that’s gonna be pretty subjective.
Replying on my phone now, so I’m limited in how detailed I can be. I can give better examples later if that doesn’t make sense.
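Here's a quick, untested sketch of what I mean, again with a made-up table emp_salary_history(emp_id, effective_date, salary) and an arbitrary cutoff date. With an ORDER BY and no windowing clause, the default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is why LAST_VALUE needs the extra keywords:

```sql
-- Hypothetical table: emp_salary_history(emp_id, effective_date, salary)
SELECT emp_id,
       effective_date,
       salary,
       -- Default window stops at the current row (and its ties), so this
       -- usually just echoes the current row's salary.
       LAST_VALUE(salary) OVER (
           PARTITION BY emp_id
           ORDER BY effective_date
       ) AS default_frame,
       -- An explicit window over the whole partition gives the real last value.
       LAST_VALUE(salary) OVER (
           PARTITION BY emp_id
           ORDER BY effective_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS full_frame
FROM   emp_salary_history;

-- With KEEP, an ordinary WHERE clause decides which rows are considered.
SELECT emp_id,
       MAX(salary) KEEP (DENSE_RANK LAST ORDER BY effective_date) AS salary_at_cutoff
FROM   emp_salary_history
WHERE  effective_date <= DATE '2020-12-31'  -- arbitrary example cutoff
GROUP  BY emp_id;
```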
Edit: I think it is an Oracle thing. Analytics are one of the areas I will unapologetically use DB-specific functions. Portability be damned! Generally they’ve been tuned very specifically to solve a problem with the specific engine in mind. Of course that won’t always be true, but that’s my general thought process.