Of limited-functionality legacy code and virtualization

One thing I am noticing with virtualization is that, starting from full virtualization of the physical environment, the boundary between the hardware and the virtualized object (the VM/host boundary) is moving to higher levels of software abstraction as the technology advances. For instance, a preliminary step in this direction is an announced version of Windows Server 2016 that is skinnied way down: it comes with only the drivers for virtual hosted devices and no GUI, greatly reducing the disk footprint and, to some degree, the memory footprint. Microsoft claims that this sort of server would require perhaps 1/10 of the patching and rebooting of the full server version.

This sort of thing brings to mind an evolution whereby all Windows Server instances share this common set of drivers on disk and in memory, and in doing so move the VM/host boundary a bit further from the hardware. Doubtless many similar opportunities exist for other optimizations, further increasing the abstraction of the host machine until it's something more like a container than a physical machine.

(I noted that VMware, even today, recognizes which parts of Windows Server memory are common among instances and keeps only one copy of that memory to share among virtual machines, a feature it calls transparent page sharing. This sort of optimization clearly points the way toward a more formal optimization in the Windows Server architecture.)

I thought about this when reading the article below, which discusses Microsoft responding to the Docker concept, a higher level of virtualization abstraction. To my thinking, the whole server-multiplication process, whereby each application or function required a new Windows server, was due to limitations of the Windows implementation of many versions ago, such as a single registry per computer and inadequate isolation between processes. Absent these limitations, a single Windows instance could have run multiple applications without the need for the boundaries of separate computers, performing the resource sharing that virtualization now provides.

Given the universality of the installed base of Windows applications, it's mostly too late to re-architect this installed code base with improved versions of Windows Server, hence the need for a virtual machine to support the installed base and do the optimizations under the covers. But based on this article, and others, it looks to me like the number and success of virtualized servers is leading Microsoft to evolve the application API a bit, leading to more efficient virtualization.

In any case I commend you to this article as you ponder the future of Windows virtualization. 



What is a Server?

I was surprised to find myself struggling with this concept a bit the other day, since I have spent many years around servers, and have some running at home as well. There’s kind of a feelings answer and a precise answer. Let’s start with the feelings one. A server:

  • Is rack mountable and shaped like a pizza box.
  • Is the kind of computer they use at work in the server room.
  • Has redundant things like dual power supplies and NICs.
  • Has a raid card with battery powered cache.
  • Has a Xeon CPU.
  • And so on…

While these sorts of definitions describe attributes of many, perhaps most, servers, I realized they were inadequate, as they really do not get at the functional aspect of servers–the reason they get purchased.

What got me really thinking about a specific definition of a server was the availability of versions of Windows Server (and Linux, from day 1) that do not support a graphical user interface. So, I got to thinking about the notion of a desktop computer being primarily a visual server for the user–either GUI or character based, as in the heyday of the DOS-based personal computer.

And by sort of considering the inverse of this, I conceived a more precise definition of a server as being a computer that listens on a given IP address and port and responds to queries or service requests sent in from the network. This helped me understand why the Apple Server software is simply an add-on that provides various TCP accessible services without the need for a distinct operating system. Needless to say, by this definition a desktop computer can act as a server as well.

Naturally, all desktop computers act as servers in at least a minor respect, such as responding to pings as an ICMP server. But obviously it's the more significant protocols that we traditionally think about when setting up a server: DHCP, FTP, database, SMB, NFS, AFP, web, DNS, and the 600-pound gorilla of large-organization services, utilizing a variety of ports and protocols: Active Directory. Any useful server probably ought to offer remote administration by providing a Telnet or SSH service, though such a service is rarely useful for anything beyond administering the server. To the extent that a server provides a visual server (a GUI interface), it is extremely helpful if it can operate well in “headless” mode, not requiring a physically attached keyboard, mouse, and monitor.
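Under this definition, a few lines of Python are enough to make any computer a (trivial) server: something that listens on an address and port and answers service requests. This is only an illustrative sketch with a made-up one-request "protocol", not any particular real service:

```python
import socket
import threading

def start_service(host="127.0.0.1", port=0):
    """Begin listening on an address/port and answer one request in the
    background -- the essence of 'being a server' under the definition
    above. port=0 lets the OS pick a free port; the port used is returned."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)

    def serve_one():
        conn, _ = srv.accept()             # wait for a service request
        request = conn.recv(1024)          # read the query
        conn.sendall(b"HELLO " + request)  # respond to it
        conn.close()
        srv.close()

    threading.Thread(target=serve_one, daemon=True).start()
    return srv.getsockname()[1]

def ask(port, host="127.0.0.1"):
    """Any computer -- desktop or otherwise -- acting as the client."""
    with socket.create_connection((host, port)) as c:
        c.sendall(b"world")
        return c.recv(1024)
```

Run `ask(start_service())` on a desktop machine and, by the definition above, that desktop is briefly a server.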

Beyond servers simply responding to network traffic, I will add another class of server to my definition, kind of a server server: the Virtual Machine Host. It is revolutionary in its ability to blur the lines between hardware and software, and it brings economies of administration and hardware to organizations.

Just some thoughts I found interesting. I was surprised to find my thinking as fuzzy as it was on these concepts.

Adventures in NAS-Land: Freenas

A trite narrative device that seems obligatory for every police procedural TV show I watch is the opening scene where the protagonist is in great peril, seemingly with no chance of escape, and then the scene cuts to a scene labelled “36 hours earlier”. Cut to the opening music and credits, and in the fullness of time we see the protagonist easily escape from said peril and turn the tables on the bad guys. Alfred Hitchcock pioneered this narrative device to create suspense in the 1930s, and presumably its novelty and suspense-generating powers didn’t become tiresome until the 1940s. Having said all that, I thought perhaps it wouldn’t be so trite if I applied it to network attached storage. 😉

Cue suspense music: “I powered down the NAS using its web-based administrative interface and installed the new disk drives. This involved plugging all of the drives into different SATA slots. When the time came to power it back on, I didn’t know whether Freenas would be able to find the mirrored pair on its new SATA ports, whether it would be able to bring the pair back online, whether it would lose my data…”

36 hours earlier:

I had been running my Freenas-based NAS for a week or so, and decided I needed to upgrade the disk drives. It was running a mirrored pair of 1.5 TB drives that were manufactured in 2009, and aside from the potential for running out of space, I had been wringing my hands quite a bit about the age of the drives. I picked up a pair of 4 TB drives and commenced the upgrade. When I first set up the NAS, I had plugged the two 1.5 TB drives into the two 6 Gb/s SATA ports on the motherboard. After doing some reading, I learned that no spinning-platter drive is going to find 3 Gb/s SATA a bottleneck; only solid state storage can take advantage of the 6 Gb/s ports. So, I decided that when I upgraded the disks, I would use the four 3 Gb/s SATA ports for the spinning-platter drives.

I will note that while I believed the case and motherboard to be excellent choices when I purchased them, once put together, the multiple power supply cables, the SATA sockets on the motherboard, and the placement of the 3.5 inch drive bays led to an extreme and undesirable degree of crowding in the SATA socket area.

After booting the NAS back up, I was not able to contact it through the web-based administrative port. As it was a headless server, I had to bring it upstairs and plug a keyboard and monitor in to see what was going on. I was specifically wondering if moving the SATA cables had confused Freenas as to the location of the solid state SATA boot drive. It turned out the NAS had booted properly and I had simply plugged the ethernet cable into the wrong RJ45 port (the motherboard has 3 RJ45 ethernet sockets). Once I tried a different port, the administrative web page appeared in all its glory.

At this time, I noted that Freenas had sent me an email indicating the 1.5 TB mirrored volume was missing one drive and running in a ‘degraded’ state. The ‘alert’ button in the upper right hand side of the web page was showing red rather than its usual green, and from it I was able to confirm that the mirrored pair was missing one of its drives. I checked and was still able to access the CIFS share from this volume, so I was impressed.

As is my normal practice in these matters, I had the case open, and upon examination I saw that during the plugging in of the new drives and the wrestling with the power supply cables hanging over the SATA sockets, the power plug on one of the disk drives had become unplugged. Without turning off the PC, I plugged it back in. Freenas reported that the second disk was present and the volume was no longer running in a degraded state. Since there had been no updates to the volume while it was running in single-disk mode, there was no syncing-up process to be performed. I was really impressed.

Freenas can be frustrating to work with, mostly when one is deep in the weeds trying to connect to the shares from other computers and wrestling with permissions issues. I had been toying with the idea of converting the computer over to a virtual host and running some sort of Linux variant as the NAS host. After seeing Freenas gracefully handle a missing disk drive, I decided that it would be better to stick with it. Admittedly, I didn’t test its ability to take a fresh drive and copy the data over to make it part of the mirrored pair, but I will take that on faith.

Whence Freenas?

My objectives when setting up a computer as a NAS for my home network included RAID-like protection from a single disk failure, and the ability to expose CIFS (a fancy re-branding of the good old Microsoft SMB), NFS (the Unix file sharing protocol), and iSCSI. While not on my wish list, Freenas also provides AFP (the Apple Filing Protocol), which enables the device to be used as a Time Machine repository–just remember to set a quota lest Time Machine consume your entire volume.

I did a fair amount of reading and tried a few different open source (free) NAS programs. I would install them, and once I encountered either too many hassles setting them up or they didn’t work properly, I discarded them. Sorry, but I don’t recall any details except that Nas4Free simply would not boot on my PC.

Another factor in favor of Freenas is that it supports ZFS, a copy-on-write file system developed by Sun. ZFS provides software “RAID”-type functionality and maintains checksums of the data it stores, whereby a worker process can “scrub” the drives; upon identifying corruption, in the form of data no longer matching its checksum, the worker process replaces the data with the copy from another drive, provided that copy’s checksum indicates it is not corrupted. Enthusiasts claim this is better than RAID due to its ability to detect and cure silent corruption from disk errors. All this is not free, however, as the magic slows down the performance of the NAS.
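The scrub-and-repair idea can be sketched in a few lines of Python. This is a loose illustration of the concept only, with two toy "mirrors" of in-memory blocks; it is not how ZFS is actually implemented (ZFS checksums blocks in a tree structure, among many other differences):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Stand-in for ZFS's per-block checksums."""
    return hashlib.sha256(data).hexdigest()

def make_mirror(blocks):
    """A toy 'drive': each block stored alongside its checksum."""
    return [{"data": b, "sum": checksum(b)} for b in blocks]

def scrub(mirror_a, mirror_b):
    """Walk both mirrors; when a block no longer matches its checksum,
    repair it from the other mirror, provided that copy still verifies.
    Returns the number of blocks repaired."""
    repaired = 0
    for blk_a, blk_b in zip(mirror_a, mirror_b):
        for bad, good in ((blk_a, blk_b), (blk_b, blk_a)):
            if checksum(bad["data"]) != bad["sum"]:
                if checksum(good["data"]) == good["sum"]:
                    bad["data"] = good["data"]   # cure the corruption
                    bad["sum"] = good["sum"]
                    repaired += 1
    return repaired
```

Corrupt a block on one mirror, run `scrub`, and the damaged copy is restored from the healthy one; that is the essence of the "detect and cure" claim.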

Another nice thing about Freenas is that the administrative web page allows one to check for update packages and install them. The updates I applied did not incur any downtime; I don’t know if some updates require a reboot.

The Freenas administrative interface is nice in that it is entirely web based and works on Safari and Internet Explorer. In addition to managing volumes, shares, and the operational aspects of a NAS, it provides a view into the busyness/use of system components like CPU, RAM, disk, and network interface. While running a 600 GB data transfer from the old volume to the new, a task which took over 8 hours, I noted the CPU usage rarely exceeded 20%, so I believe I bought too much CPU for the task; I will look into BIOS settings to see if the CPU can be slowed down a bit as a power saver during times of no NAS usage.

The drawback, in my mind, to Freenas comes when one is having difficulty getting a client to connect to one of the shares. When attempting to connect to an NFS share, the free VMware ESXi virtual host would return a cryptic “cannot connect” message, and lacking the knowledge to drill into operating system logs and the like, I either failed to get a connection working, or needed much internet surfing and trial and error to get things going. Folks on the web anecdotally note that permissions are often the cause of connection issues, and I had more success once I learned how to enable guest access and the like. For my home network/lab this is not a concern. To be fair, a lot of this can be laid at the feet of the folks writing the client software for not providing better diagnostics for connection problems.

So, in conclusion, I found Freenas to be free software that provides enterprise-quality NAS features, multiple sharing protocols, and a convenient web-based interface that enables headless server operation. There may be better packages out there, but this one worked for me and I am going to stick with it for the time being. My impression, from web surfing, is that Freenas isn’t too picky about which hardware devices it supports (unlike VMware ESXi), but I will list my hardware for those interested. In an effort to maintain data integrity, I chose hardware that supports ECC (error-correcting) memory, which is normally used only on servers. It turns out that Intel 4th generation Core i3 processors support ECC, so I used one of those. As I said, one could probably find one with half the horsepower and do just as well with their NAS.

2X: HGST DeskStar 4TB 7,200 RPM SATA III 6.0Gb/s 3.5″ NAS Internal Hard Drive. I spent much time looking at reviews on websites like Amazon and Newegg to get a better idea of the reliability of different drive models. This one seemed to have fewer ‘disk failed’ reviews and was in stock at Microcenter, so that is what I went with. Running under Freenas as a mirrored pair, I have a slightly-less-than-4 TB volume. This should cover my needs for quite some time.

Intel Core i3 3.7 GHz LGA 1150. Nothing scientific in my methods here; I just looked at benchmarks and tried to find a good price/performance compromise. I chose this rather than a Xeon, which also supports ECC memory but is more expensive per unit of performance (to the best of my knowledge).

Supermicro X10SLL-F LGA 1150 Intel mATX motherboard. Nothing magic here; it’s just what Microcenter had in stock for LGA 1150 and ECC. One nice surprise is that it has a little CPU beside the main one (a baseboard management controller) that exposes an administrative web page for the motherboard–one can look at the console, install BIOS upgrades, and monitor device temperatures if one wants. A very useful feature for one running a headless server configuration. As I said, the placement of the SATA sockets was inconvenient, but I chalk that up to me buying a case with too short a distance between the hard drive bay and the motherboard.

EVGA 500B Bronze 500 watt power supply. While my setup does not need 500 watts, the reading I have done indicates that these sorts of switching power supplies are most efficient when running at about 1/2 rated power. The 80 Plus “Bronze” designation indicates roughly 82–85% efficiency at typical loads–so easier on the electric bill. From what I have read and seen in my career, not all power supplies are created equal, and it is prudent to spend a bit more than the cheapest one to avoid problems in the future.

NZXT Classic Series Source 210 Mid Tower ATX computer case: As I said, this one got a bit crowded if one wants to use the higher-up 3.5 inch slots, which can abut the SATA socket block. To address this, since I am using only two hard drives, I will place them lower in the case to keep them out of this crowded area. Also, the case has its power supply at the bottom rather than the top. That leads to really great cable management and air flow for the motherboard, but as I said, the cables tend to pass over the SATA socket area. I wish it had a cross bar I could zip tie the cables to, keeping them away from the hard drives. The Intel CPU fan doesn’t have much of a cage on it, and once a stray cable touches a fan blade and stops it, the CPU will overheat and one is out of business at that point. The case has one built-in fan at the top of the back, and spaces for many more fans to be installed. I would like a fan in the front of the case to vent the disk drives better, but I don’t think they will have any trouble without one, as the case is spacious and the drives are mounted about an inch and a half apart.


SQL Server Analytical Ranking Functions

There are times when the expressive power of the SQL Server Windowing or Analytical Ranking functions is almost breathtaking. Itzik Ben-Gan observed: “The concept the Over clause represents is profound, and in my eyes this clause is the single most powerful feature in the standard SQL language.”

The other day, I solved a potentially complex query problem in an elegant manner using some of the newer SQL Server functions, including row_number, dense_rank, sum, and lag. Before basking in the glow of this remarkable query, let’s take a brief tour of some of these functions. What I hope to instill here is some familiarity with these functions and some potentially unexpected uses for them, so that when you are developing a query, you may find situations where they provide a simpler and more performant result.

Perhaps the simplest to understand, and certainly one of the most frequently used windowing functions, is row_number. It first appeared in SQL Server 2005, and I end up finding a use for it in almost all my non-trivial queries. Conceptually it does two things. First, it partitions the result set based on the values in zero to many columns defined in the query. Second, it returns a numeric sequence number for each row within each subset from the partition, based on ordering criteria defined in the query.

If no partitioning clause is present, the entire result set is treated as a single partition. But the power of the function really shows when multiple partitions are needed. My most frequent use of sequence numbers in multiple partitions is to find the last item on a list; frequently the ordering criterion is time, and the query finds the most recent item within a partition. Examples of this are: getting an employee’s current position, and getting the most recent shipment for an order.

The partition definition is given with a list of columns, and semantically the values in the columns are ‘and-ed’ together or combined using a set intersection operation—that is, a partition consists of the subset of query rows where all partition columns contain the same value.

The ordering criteria consist of columns that are not part of the partition criteria. The ordering for each column can be defined as ascending or descending. If the values in the columns defined for the ordering criteria do not, when combined, yield unique values for all rows within a partition, row numbers are assigned arbitrarily to rows with duplicate values. To make the results deterministic–that is, to yield the same result for each query execution–it is necessary to include additional columns in the ordering clause to ensure uniqueness. Such extra columns are referred to as ‘tie-breakers’. One reliable ‘uniqueifier’ is an identity column, if the table has one. In the example below, I show an imaginary employee database and create row numbers that show both the first and last position per employee.
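To make the semantics concrete, here is a small Python sketch of what ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) computes, run over a made-up employee-position table (all names and columns here are hypothetical, chosen only for illustration):

```python
from itertools import groupby

def row_number(rows, partition_key, order_key):
    """Emulate ROW_NUMBER(): number rows 1..n within each partition,
    in the order given by order_key. The order key should end with a
    unique 'tie-breaker' so the numbering is deterministic."""
    rows = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out = []
    for _, part in groupby(rows, key=partition_key):
        for n, row in enumerate(part, start=1):
            out.append({**row, "rn": n})
    return out

# Hypothetical position history; 'id' is a unique identity column.
positions = [
    {"emp": "Ann", "start": 2010, "id": 1, "title": "Analyst"},
    {"emp": "Ann", "start": 2014, "id": 2, "title": "Manager"},
    {"emp": "Bob", "start": 2012, "id": 3, "title": "Clerk"},
]

# Most recent position per employee: partition by employee, order by
# start year descending, with id as the tie-breaker.
numbered = row_number(
    positions,
    partition_key=lambda r: r["emp"],
    order_key=lambda r: (-r["start"], r["id"]),
)
current = [r for r in numbered if r["rn"] == 1]
```

Filtering on `rn == 1` is exactly the "most recent item within a partition" pattern described above, typically done in SQL by computing the row number in a CTE and filtering in the outer query.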

As in the example, I often generate the row number within a Common Table Expression (CTE), and refer to it in subsequent queries.

Among the ranking functions, second in frequency of use when I am query-writing is the dense_rank function (although rank could be used as well). I used to think that unless I was writing queries for a school calculating class rank, I had no use for the ranking functions. The general power of these functions became apparent to me when I began to see other query problems in terms of ranking–for instance, as a means of assigning numbers to partitions of a set, and then using those numbers as unique identifiers for each partition.

I will note that using the result of an arithmetic function as an identifier is a not immediately intuitive concept that can really generalize the power of the windowing functions.

Rank is defined as the cardinality of the set of lesser values plus one. Dense rank is the cardinality of the set of distinct lesser values plus one. When using these values as identifiers, either function will work—I prefer dense rank for perhaps no reason other than the aesthetic value of seeing the values increase sequentially. While these definitions are mathematically precise, I believe looking at an example query result will make the difference between the functions intuitively clear.
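A tiny Python sketch of those two definitions, over made-up scores, makes the difference concrete: rank leaves a gap after a tie, dense rank does not.

```python
def rank(values, v):
    """RANK: count of lesser values (counting duplicates) plus one."""
    return sum(1 for x in values if x < v) + 1

def dense_rank(values, v):
    """DENSE_RANK: count of *distinct* lesser values plus one."""
    return len({x for x in values if x < v}) + 1

scores = [80, 90, 90, 100]
ranks = [rank(scores, v) for v in scores]              # [1, 2, 2, 4]
dense_ranks = [dense_rank(scores, v) for v in scores]  # [1, 2, 2, 3]
```

Note the 100: three rows have lesser values, so its rank is 4, while only two distinct lesser values exist, so its dense rank is 3. When the numbers are used purely as partition identifiers, either works.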

I found the syntax of the ranking functions confusing initially because I was using the rank to logically partition query results, but the partitioning criteria for this go in the order by clause rather than a partition clause. The ranking functions do provide a partition by clause, as with row_number, whereby the ranking would occur within each defined partition.

Analogous to creating sequential row numbers within a partition is the ability to add an Over clause, with Partition By and Order By, to the Sum aggregate, creating a running total. In fact, summing the constant value 1 will yield a result identical to row_number. This capability is essential to the query problem solved in the second example. Though not a part of this query, when a partition clause is used for Sum without an ordering clause, each row of the result set contains a total for the partition, which is useful for calculating percent of total for each row.
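A Python sketch of the running-total semantics of SUM(...) OVER (PARTITION BY ... ORDER BY ...), using hypothetical sales data; note how summing the constant 1 reproduces row_number:

```python
from itertools import groupby

def running_total(rows, partition_key, order_key, value_key):
    """Emulate SUM(value) OVER (PARTITION BY ... ORDER BY ...):
    a cumulative sum within each partition, in the given order."""
    rows = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    out = []
    for _, part in groupby(rows, key=partition_key):
        total = 0
        for row in part:
            total += value_key(row)
            out.append({**row, "running": total})
    return out

# Hypothetical sales rows.
sales = [
    {"dept": "A", "day": 1, "amt": 5},
    {"dept": "A", "day": 2, "amt": 3},
    {"dept": "B", "day": 1, "amt": 7},
]

by_dept = lambda r: r["dept"]
by_day = lambda r: r["day"]
totals = running_total(sales, by_dept, by_day, lambda r: r["amt"])
# Summing the constant 1 yields row_number within each partition:
numbers = running_total(sales, by_dept, by_day, lambda r: 1)
```

`totals` carries running sums of 5 then 8 for department A and 7 for B, while `numbers` carries 1, 2, 1: exactly row_number.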

Without getting into details, the SQL Server development team implemented these functions such that they are generally far more performant than alternate ways of getting the same result, which often involve correlated sub-queries. I view them, in some respects, as ‘in-line subqueries’.

A short example demonstrating these functions is shown below. Let’s talk about the data for the example. We have a table containing manufacturing steps for product orders. A given order is specified uniquely by the 3-tuple of order number, sequence number, and division.

Each order in this table lists the manufacturing steps involved in preparing the order for sale. Each step is uniquely specified within the order by an operation number, an arbitrary number whose sequence matches the order in which the manufacturing operations are to be performed. I have included an operation description for each operation simply to give an idea of what said operations would be like in this fictitious company. In the example, I used some coloring to visually indicate how the sample data is partitioned based on a combination of column values.


Given data organized as above, there is a request to partition the processing steps for an order such that all operations sequentially performed at a work center are grouped together. Said groupings will be referred to as Operation Sequences. To better demonstrate boundary conditions, I have added a bit more data to the table for the second example.

One potential use for such Operation Sequences would be to sum up the time an order spends at each workstation.

The first step in this approach is to identify which Operations involve the work-in-progress arriving at a new workstation. In the unlikely event that one order ends at a given workstation and the next order starts at that same one, we need to identify changes in Order Id as well. To do this, the Lag function, introduced in SQL Server 2012, provides a compact approach.

By emitting a one for each changed row, a running total, using the Sum function with the over clause, yields a unique identifier for each Operation Sequence.
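The lag-then-running-total trick can be sketched in Python over a hypothetical routing table (column names invented for illustration): compare each row with the previous one to emit a 1 on every work center or order change, then take a running total of those 1s to get the Operation Sequence identifier.

```python
def operation_sequences(steps):
    """Group consecutive operations at the same work center into
    Operation Sequences, as described above: LAG-style comparison with
    the previous row, then a running total of the change flags."""
    steps = sorted(steps, key=lambda s: (s["order"], s["op"]))
    seq = 0
    prev = None
    out = []
    for s in steps:
        # LAG: look at the previous row in operation order; a change in
        # either the work center or the order id starts a new sequence.
        changed = (prev is None
                   or s["order"] != prev["order"]
                   or s["work_center"] != prev["work_center"])
        if changed:
            seq += 1  # running total of the emitted 1s = sequence id
        out.append({**s, "op_seq": seq})
        prev = s
    return out

# Hypothetical routing steps; note order 2 starts at the same work
# center where order 1 ends -- the boundary case called out above.
steps = [
    {"order": 1, "op": 10, "work_center": "SAW"},
    {"order": 1, "op": 20, "work_center": "SAW"},
    {"order": 1, "op": 30, "work_center": "MILL"},
    {"order": 2, "op": 10, "work_center": "MILL"},
]
```

Here the two SAW steps share sequence 1, the MILL step of order 1 gets sequence 2, and order 2's MILL step correctly starts sequence 3 rather than merging with the previous order.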


For a fuller treatment of the ranking/windowing functions, I recommend Itzik Ben-Gan’s book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions. If you want to shorten your queries and speed them up, I recommend you get comfortable with the ranking/windowing functions and begin to tap their enormous potential.

Of Trojan Horses, chatty devices, and Wireshark, part 1

One of the things that bugs me is wondering if, at night, some trojan horse on my Mac is sending all my data to some Eastern Bloc hacker while I am sleeping. It strikes me that the best tool to look into this is the free network sniffer program, Wireshark. I also got a managed gigabit switch with port mirroring ($130) so I can see what the other computers on the home network are sending out into the world when we aren’t looking.

My router/firewall is configured for stealth mode, which means it will not respond to pings from outside, and it has stateful packet inspection. I am pretty confident that nobody can get in from the internet, except maybe for some back-door that the federal government secretly forced vendors to install on all routers… The only internet traffic allowed in the firewall is that which is part of an exchange initiated from the local network.

So I bring up Wireshark. The first thing I notice is that the router is constantly broadcasting ARPs for non-existent IP addresses on the local network. I am wondering if this is the result of some internet traffic getting in, or maybe a bug in the router firmware. Also, I wonder what sort of ARP cache it has, as it ARPs existing addresses every second or so as well. There is no router setting for how long ARP cache entries are kept, so nothing to be done.

I get a new router, and this one seems a bit less chatty, at least in the ARP department. Then I notice the router is emitting STP broadcasts. How cute, the 4-port switch on the back of the router wants to have a Spanning Tree Protocol root bridge election. Just for the fun of it, I log into the managed switch and enable STP. By the time I get back to the Wireshark window, the STP broadcasts have stopped, so the switches must have already had their election.

Now the Mac (Mavericks) is constantly emitting broadcasts for things like network printers and other discovery protocols. As is too often the case with OS X, there is no user interface to shut these things off, and one has to fire up vi and turn off some daemons. Not a priority right now, and I am not eager to delve into BSD network configuration.

At this point, I am looking at a well-behaved Mac, chatty with its broadcasts, but nothing leaving the LAN. Then I bring up Safari… Now I am seeing all sorts of TCP misbehavior with the outside world: TCP conversations initiated on the Mac, going to IP addresses located in Ireland and various former Eastern Bloc countries. Packets are highlighted in red, indicating protocol misbehavior such as out-of-sequence ACKs and uncompleted TCP handshakes. Wireshark has no way of knowing which processes running on the Mac are initiating network traffic, but I notice the traffic goes away when Safari closes. I look at Safari a bit and see that some sort of uninvited Safari extension has been installed.

I had purchased some backup software online, and apparently the vendor was in the hacker’s paradise of Russia and the Eastern Bloc countries–something I learned only after I made payment. I guess installing the program, which of course required administrator privileges, put its own little trojan horse in my browsers. I made a note to look for other inappropriate things this install may have slipped under the covers.

Once I deleted the Safari extension, the Mac appeared to transmit nothing but the usual chatty local discovery broadcasts I mentioned before. Not a bad catch for the first couple of times I cast the net for Trojan Horses…