Some historical documents relating to the study of law

The other evening, my son Tony and I watched The Paper Chase, a melodramatic movie plotted over the course of a first year of law school at Harvard in the 1970s. Setting aside the childish outbursts in class by the protagonist and the pretentious enunciation of the contracts professor, Kingsfield, there were many elements familiar from my own first year of law school at a somewhat less prestigious university. Among them were the tension caused by the fear of being called on randomly and without warning to ‘present’ a case, the study group and its petty squabbling, the division of students into those who choose to participate in classroom discussion and those who don’t, contract law, and not least, the four-hour essay exam written in Blue Books by which one’s grade for the entire year is determined.

I must confess that having watched the movie prior to attending law school, I copied the ritual of ditching the dorms and spending a weekend of intensive study in an off-campus motel. The movie also motivated me to fully study any readings assigned prior to the first day of class. In contrast to the movie, my own beloved Contracts professor, Bill Martin, was a gentle soul, not given to theatrics. In an official nod to the movie, during an orientation lecture one of the professors archly recited the ‘Skull full of mush’ lecture, much to our amusement.

So, I dug through my old files and scanned a few interesting tidbits. First is my LSAT answer sheet and score. Regrettably, I don’t have the questions.

LSAT Answer Sheet and Score

I thought it would be amusing to take a look at textbook price inflation since the fall of 1982. Since all first year students in a section (our class was divided into three sections each of which took the same classes the entire first year) had the same textbooks, the school was kind enough to present us with our foot and a half tall pile of books, pre-selected and stacked, and a bill for the princely sum of over $400, if I recall correctly. As you can see, I was in section 1. All of the classes were full year except for Criminal Law which was a semester long and was replaced by a semester of Property Law in the spring. One thing new to me in law school was the practice professors had of referring to the textbooks by the last name of the author. I think there were a couple of reasons for this. First, we might have multiple textbooks, and second, the professor might actually know or have met the authors of the book. So for instance in Torts class, the professor might say “Please turn your attention to the footnotes on page 127 of Prosser.” It also struck me as pretentious, so I promptly adopted this practice in my own conversation.

First Year Law Books

And since Tony is taking exams around this time of year, I included my exam schedule. While the only final grade at mid-term was Criminal Law, we worked pretty hard to master the other subjects as well. For the year long classes, the mid term weighed 1/3 of the grade or less, with there being some suspicion that some professors ignored mid-term results completely as one’s knowledge at end of course was the only thing of interest. Looking at this at the beginning of school, I thought December was going to be a breeze, with an apparently leisurely exam schedule. In fact, I prepared myself intensely and was able to place myself in an extremely focussed state for each exam. During the exam weeks, I thought longingly of how relieved I would feel after exams were over. But after the last exam, I was so fried, that I was incapable of maintaining any sort of feeling whatsoever. In retrospect a sub-optimal practice, but I must confess to loosening the bonds of mental discipline by imbibing alcoholic beverages for a short period of days, and in adopting this practice I was by no means unique among my class.

Exam Schedule, December 1982

Respectfully submitted, Mark C. Knutson

SQL Server Analytical Ranking Functions

There are times when the expressive power of the SQL Server Windowing or Analytical Ranking functions is almost breathtaking. Itzik Ben-Gan observed: “The concept the Over clause represents is profound, and in my eyes this clause is the single most powerful feature in the standard SQL language.”

The other day, I solved a potentially complex query problem in an elegant manner using some of the newer SQL Server functions, including row_number, dense_rank, sum, and lag. Before basking in the glow of this remarkable query, let’s take a brief tour of some of these functions. What I hope to instill here is some familiarity with these functions and some potentially unexpected uses for them, so that when you are developing a query, you may find situations where they provide a simpler and more performant result.

Perhaps the simplest to understand, and certainly one of the most frequently used windowing functions, is row_number. It first appeared in SQL Server 2005, and I end up finding a use for it in almost all my non-trivial queries. Conceptually it does two things. First, it partitions the result set based on the values in zero to many columns defined in the query. Second, it returns a numeric sequence number for each row within each subset from the partition, based on ordering criteria defined in the query.

If no partitioning clause is present, the entire result set is treated as a single partition. But the power of the function really shows when multiple partitions are needed. My most frequent use of sequence numbers in multiple partitions is to find the last item on a list. Frequently the ordering criterion is time, and the query finds the most recent item within a partition. Examples of this are getting an employee’s current position and getting the most recent shipment for an order.

The partition definition is given with a list of columns, and semantically the values in the columns are ‘and-ed’ together or combined using a set intersection operation—that is, a partition consists of the subset of query rows where all partition columns contain the same value.

The ordering criteria consists of columns that are not part of the partition criteria. The ordering for each column can be defined as ascending or descending. If the values in the columns defined for the ordering criteria, when ‘and-ed’ together do not yield unique values for all rows within a partition, row numbers are assigned arbitrarily to rows with duplicate values. To make the results deterministic, that is, yield the same result for each query execution, it is necessary to include additional columns in the ordering clause to ensure uniqueness. Such extra columns are referred to as ‘tie-breakers’. One reliable ‘uniqueifier’ is an identity column, if the table has one. In the example below, I show an imaginary employee database and create row numbers that show both the first and last position per employee.

As in the example, I often generate the row number within a Common Table Expression (CTE), and refer to it in subsequent queries.
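The pattern described above can be sketched as follows. This is only an illustration: it runs the query through Python's sqlite3 module as a stand-in for SQL Server (it assumes a SQLite build of 3.25 or later, which supports window functions), and the employee_position table, its columns, and its rows are all hypothetical. The row_number expressions themselves use the same syntax in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_position (       -- hypothetical table for illustration
    position_id INTEGER PRIMARY KEY,   -- stands in for an identity column
    employee_id INTEGER NOT NULL,
    title       TEXT    NOT NULL,
    start_date  TEXT    NOT NULL
);
INSERT INTO employee_position (employee_id, title, start_date) VALUES
    (1, 'Clerk',      '2010-01-04'),
    (1, 'Analyst',    '2012-06-01'),
    (1, 'Manager',    '2015-03-16'),
    (2, 'Technician', '2011-09-12'),
    (2, 'Engineer',   '2011-09-12');   -- duplicate start date: needs a tie-breaker
""")

first_and_last = conn.execute("""
WITH numbered AS (
    SELECT employee_id, title,
           -- rn_first = 1 marks the earliest position per employee
           ROW_NUMBER() OVER (PARTITION BY employee_id
                              ORDER BY start_date, position_id) AS rn_first,
           -- rn_last = 1 marks the most recent position per employee
           ROW_NUMBER() OVER (PARTITION BY employee_id
                              ORDER BY start_date DESC, position_id DESC) AS rn_last
    FROM employee_position
)
SELECT employee_id,
       MAX(CASE WHEN rn_first = 1 THEN title END) AS first_title,
       MAX(CASE WHEN rn_last  = 1 THEN title END) AS last_title
FROM numbered
GROUP BY employee_id
ORDER BY employee_id
""").fetchall()

print(first_and_last)
```

Because position_id is included as a tie-breaker, employee 2's two positions with the same start date are numbered the same way on every execution.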

Among the ranking functions, second in frequency of use when I am query-writing is the dense_rank function (although rank could be used as well). I used to think that if I wasn’t writing queries for a school calculating class rank, I had no use for the ranking functions. The general power of this function became apparent to me when I began to see other query problems in terms of ranking: for instance, as a means of assigning numbers to partitions of a set, and then using those numbers as unique identifiers for each partition.

I will note that using the result of an arithmetic function as an identifier is not an immediately intuitive concept, but it can really generalize the power of the windowing functions.

Rank is defined as the number of rows with lesser values, plus one. Dense rank is the number of distinct lesser values, plus one. When using these values as identifiers, either function will work; I prefer dense rank for perhaps no reason other than the aesthetic value of seeing the values increase sequentially. While these definitions are mathematically precise, I believe looking at an example query result will make the difference between the functions intuitively clear.
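To make the difference concrete, here is a small sketch over a made-up score table (the table name, columns, and data are hypothetical). It is run through Python's sqlite3 module as a stand-in for SQL Server, assuming SQLite 3.25 or later; the rank and dense_rank expressions are written the same way in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (student TEXT, points INTEGER)")  # hypothetical data
conn.executemany(
    "INSERT INTO score VALUES (?, ?)",
    [("Ann", 70), ("Bob", 80), ("Cy", 80), ("Dee", 95)],
)

ranked = conn.execute("""
SELECT student, points,
       RANK()       OVER (ORDER BY points) AS rnk,        -- rows with lesser values + 1
       DENSE_RANK() OVER (ORDER BY points) AS dense_rnk   -- distinct lesser values + 1
FROM score
ORDER BY points, student
""").fetchall()

for row in ranked:
    print(row)
# ('Ann', 70, 1, 1)
# ('Bob', 80, 2, 2)
# ('Cy', 80, 2, 2)
# ('Dee', 95, 4, 3)
```

The tie at 80 points is where the two functions part ways: rank skips to 4 for the next value, while dense rank continues with 3.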

I found the syntax of the ranking functions confusing initially because I was using the rank to logically partition query results, but the partitioning criteria for this go in the order by clause rather than a partition clause. The ranking functions do provide a partition by clause, as with row_number, whereby the ranking is computed within each defined partition.
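A short sketch may help separate the two roles. The step table and its data are hypothetical, and the query is run through Python's sqlite3 module (SQLite 3.25 or later assumed) rather than SQL Server. The first dense_rank puts the grouping columns in the order by clause to produce one identifier per group; the second adds a partition by clause so the numbering restarts within each order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE step (order_no INTEGER, division TEXT, operation INTEGER)")
conn.executemany(
    "INSERT INTO step VALUES (?, ?, ?)",
    [(100, "A", 10), (100, "A", 20), (100, "B", 10), (200, "A", 10), (200, "A", 20)],
)

rows = conn.execute("""
SELECT order_no, division, operation,
       -- grouping columns go in ORDER BY: one identifier per (order_no, division)
       DENSE_RANK() OVER (ORDER BY order_no, division) AS group_id,
       -- with PARTITION BY, the ranking restarts for each order_no
       DENSE_RANK() OVER (PARTITION BY order_no ORDER BY division) AS division_in_order
FROM step
ORDER BY order_no, division, operation
""").fetchall()

for row in rows:
    print(row)
# (100, 'A', 10, 1, 1)
# (100, 'A', 20, 1, 1)
# (100, 'B', 10, 2, 2)
# (200, 'A', 10, 3, 1)
# (200, 'A', 20, 3, 1)
```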

Analogous to creating sequential row numbers within a partition is the ability to add an Over clause with Partition By and Order By to the Sum aggregate, creating a running total. In fact, summing the constant value 1 in this way yields a result identical to row_number. This capability is essential to the query problem solved in the second example. Though not a part of that query, when a partition clause is used for Sum without an ordering clause, each row of the result set contains the total for its partition, which is useful for calculating each row’s percent of total.
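The three uses of Sum mentioned here can be sketched in one query. The shipment table and its data are hypothetical, and Python's sqlite3 module (SQLite 3.25 or later assumed) stands in for SQL Server. The frame is spelled out as ROWS UNBOUNDED PRECEDING, a form both engines accept, so that the running total advances row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipment (order_no INTEGER, ship_seq INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 30), (1, 3, 60), (2, 1, 5), (2, 2, 15)],
)

totals = conn.execute("""
SELECT order_no, ship_seq, qty,
       -- running total within each order
       SUM(qty) OVER (PARTITION BY order_no ORDER BY ship_seq
                      ROWS UNBOUNDED PRECEDING) AS running_qty,
       -- a running total of the constant 1 matches ROW_NUMBER()
       SUM(1)   OVER (PARTITION BY order_no ORDER BY ship_seq
                      ROWS UNBOUNDED PRECEDING) AS sum_of_ones,
       ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY ship_seq) AS rn,
       -- no ORDER BY: every row sees the whole partition's total
       100.0 * qty / SUM(qty) OVER (PARTITION BY order_no) AS pct_of_order
FROM shipment
ORDER BY order_no, ship_seq
""").fetchall()

for row in totals:
    print(row)
```

For order 1, the running quantities come out as 10, 40, and 100, and the percentages as 10, 30, and 60.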

Without getting into details, the SQL Server Development Team implemented these functions such that they are generally far more performant than alternate ways of getting the same result, which often involve correlated sub-queries. I view them, in some respects, as ‘in line subqueries’.

A short example demonstrating these functions is shown below. Let’s talk about the data for the example. We have a table containing manufacturing steps for product orders. A given order is specified uniquely by the 3-tuple of order number, sequence number, and division.

Each order in this table lists the manufacturing steps involved in preparing the order for sale. Each step is uniquely specified within the order by an operation number, an arbitrary number whose sequence matches the order in which the manufacturing operations are to be performed. I have included an operation description for each operation simply to give an idea of what said operations would be like in this fictitious company. In the example, I used some coloring to visually indicate how the sample data is partitioned based on a combination of column values.

RankingFunctionsExample1

Given data organized as above, there is a request to partition the processing steps for an order such that all operations sequentially performed at a work center are grouped together. Said groupings will be referred to as Operation Sequences. To better demonstrate boundary conditions, I have added a bit more data to the table for the second example.

One potential use for such Operation Sequences would be to sum up the time an order spends at each workstation.

The first step in this approach is to identify which Operations involve the work-in-progress arriving at a new workstation. In the unlikely event that one order ends at a given workstation and the next order starts at that same one, we need to identify changes in Order Id as well. To do this, the Lag function, introduced in SQL Server 2012, provides a compact approach.

By emitting a one for each changed row, a running total, using the Sum function with the over clause, yields a unique identifier for each Operation Sequence.

RankingFunctionsExample2
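The approach just described, flagging each change with Lag and then taking a running total of the flags, can be sketched as follows. This is not the query from the figure: the order_operation table, its columns, and its rows are hypothetical, the order key is simplified to a single order_id rather than the three-column key described earlier, and the query is run through Python's sqlite3 module (SQLite 3.25 or later assumed) as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_operation (order_id INTEGER, operation_no INTEGER, work_center TEXT)"
)
conn.executemany(
    "INSERT INTO order_operation VALUES (?, ?, ?)",
    [
        (1, 10, "CUT"), (1, 20, "CUT"), (1, 30, "WELD"), (1, 40, "CUT"),
        (2, 10, "CUT"),  # new order starting at the work center where order 1 ended
        (2, 20, "PAINT"), (2, 30, "PAINT"),
    ],
)

sequences = conn.execute("""
WITH flagged AS (
    SELECT order_id, operation_no, work_center,
           -- emit 0 only when both the work center and the order match the prior row;
           -- the first row compares against NULL and so falls through to 1
           CASE WHEN LAG(work_center) OVER (ORDER BY order_id, operation_no) = work_center
                 AND LAG(order_id)    OVER (ORDER BY order_id, operation_no) = order_id
                THEN 0 ELSE 1 END AS is_new_sequence
    FROM order_operation
)
SELECT order_id, operation_no, work_center,
       -- the running total of the flags numbers each Operation Sequence
       SUM(is_new_sequence) OVER (ORDER BY order_id, operation_no
                                  ROWS UNBOUNDED PRECEDING) AS operation_sequence
FROM flagged
ORDER BY order_id, operation_no
""").fetchall()

for row in sequences:
    print(row)
```

Order 1's return to CUT at operation 40 gets its own sequence number, and so does order 2's first operation, even though the work center did not change between the two orders.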

For a fuller treatment of the Ranking/Windowing functions, I recommend Itzik Ben-Gan’s book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions. If you want to shorten your queries and speed them up, I recommend you get comfortable with the Ranking/Windowing functions, and begin to tap their enormous potential.