
March 2007

March 26, 2007

Are You Virtually Compliant?

In order to help prevent technical anarchy on a grand scale, I frequently find myself pointing out to those undertaking virtualization initiatives that virtualization is not merely a sizing exercise.  First and foremost, it is an exercise in constructing a coherent infrastructure that puts the right things together and keeps the right things apart, especially when dealing with production systems.  You don't spend years building a sophisticated IT infrastructure only to have it "randomized" into VMs based on utilization.

Some counter by asking "Why not?  If the systems are in their own VMs then what difference does it make?"  Rather than go into the details of how problematic it can be to combine systems with different availability levels, different DR strategies, etc., I find it easiest to think of things from a security perspective.

System-level security has traditionally been based on the premise that you cannot see the local storage without first authenticating with the system at the OS level (i.e. logging on).  Network-based storage authenticates at the system level and thus also prevents one from accessing it without being "on the system".  This access control is like a castle wall that protects everything inside it, and is the first line of defense in any IT security strategy.

Virtualization is a different paradigm, and as these things go, new paradigms often upset existing assumptions.  Most virtualization technologies store their VM images in files, and this encapsulation enables many of the benefits of virtualization by providing portability, state management, interoperability and a whole list of other advantages.  But it also means that those with access to these files can bypass all authentication mechanisms on that system; by having a complete file (and memory) image of a virtual machine, you can peer inside it with little effort.  The castle wall is suddenly made of very thin paper.

In fact, most VM vendors provide utilities to "mount" a VM image as a disk, providing convenient access to its contents.  This means that the old fear of someone "walking off with a disk" has now become a fear of someone "copying a file", which is considerably easier to do.  More to the point, it means that any administrator or user with access to these images becomes a "super super" user, with access to the contents of systems that they may not otherwise be privy to.  The fact that a single user can access (and modify) the data of any app on a virtual cluster is enough to kill pretty much any compliance audit.

There are, of course, ways to prevent this.  Some vendors support encryption of the VM images, although this will undoubtedly cause performance to tank.  Some provide affinity and anti-affinity rules to influence the movement of images within a pool of physical systems, but offer no practical means to establish and maintain these rules.  (Not to mention that once the systems are already in the same pool, it may be too late to achieve true isolation.)

The best solution, in my opinion, is simply to avoid combining systems that should never go together.  It seems simple enough: this is what you have spent the last 10 years doing, so why stop now?  The best way to accomplish this is to establish, in advance, the business constraints that govern a virtualization project and to ensure that these constraints are honoured throughout the process.  Only then should you look at sizing...
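The post doesn't prescribe a mechanism, but the idea of settling the constraints first and sizing second can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than any vendor's placement feature: the `env` and `compliance_zone` attributes, the rule that two workloads may share a host only when both attributes match, and the simple first-fit-decreasing packing by CPU demand.

```python
from dataclasses import dataclass, field

@dataclass
class Workload:
    name: str
    cpu: float            # normalized CPU demand (hypothetical units)
    env: str              # hypothetical attribute, e.g. "prod" or "test"
    compliance_zone: str  # hypothetical attribute, e.g. "pci" or "general"

@dataclass
class Host:
    capacity: float
    members: list = field(default_factory=list)

    def used(self) -> float:
        return sum(w.cpu for w in self.members)

def compatible(a: Workload, b: Workload) -> bool:
    # Assumed business constraint: two workloads may share a host only if
    # they belong to the same environment and the same compliance zone.
    return a.env == b.env and a.compliance_zone == b.compliance_zone

def place(workloads, host_capacity: float) -> list:
    """First-fit-decreasing placement that checks constraints before capacity."""
    hosts = []
    for w in sorted(workloads, key=lambda x: x.cpu, reverse=True):
        if w.cpu > host_capacity:
            raise ValueError(f"{w.name} does not fit on any single host")
        for h in hosts:
            # Constraints first: skip any host holding an incompatible workload.
            if not all(compatible(w, m) for m in h.members):
                continue
            # Then sizing: only use the host if there is room left.
            if h.used() + w.cpu <= host_capacity:
                h.members.append(w)
                break
        else:
            # No compatible host with room: open a new one.
            hosts.append(Host(host_capacity, [w]))
    return hosts

vms = [
    Workload("billing-db", 0.50, "prod", "pci"),
    Workload("web-test", 0.25, "test", "general"),
    Workload("hr-app", 0.25, "prod", "general"),
    Workload("card-api", 0.25, "prod", "pci"),
]
for i, h in enumerate(place(vms, host_capacity=1.0)):
    print(i, [w.name for w in h.members])
```

By size alone these four VMs (1.25 units in total) would fit on two hosts; with the constraints applied they need three, with the two PCI workloads sharing one host and the test and non-PCI production workloads kept apart. That is the trade-off the post argues for: isolation is decided up front, and sizing only packs within those boundaries.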

March 06, 2007

Utilization v Moore's Law

A lot of organizations look to consolidation and virtualization as a way to increase utilization.  While they certainly can do that, there is another very interesting way to look at it: consolidation and virtualization are necessary tools in the battle against falling utilization.

The reason?  When it comes time to refresh servers, you simply cannot buy gear that is nearly as slow as the stuff you are replacing.  Moore's Law dictates that the number of transistors on a chip will double every 18 months, which roughly translates into the doubling of compute speed.  This has certain limits, and lately we have seen a shift away from increasing speeds and toward "multi-core" strategies, but the net effect is roughly the same.

This means that on a 3-year lease cycle you will be replacing systems with ones that are roughly 4 times faster (two 18-month doublings).  The faster application response is no doubt appreciated by end users, but from a utilization perspective it means that the utilization of these systems drops significantly with every hardware refresh.
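A quick way to sanity-check these figures: if speed doubles every 18 months, the relative speed after t months is 2^(t/18). The small Python sketch below takes the 18-month doubling period from the text and, like the post, treats transistor count as a rough proxy for compute speed; the function name is just illustrative.

```python
DOUBLING_MONTHS = 18  # doubling period used in the text

def speedup(months: float) -> float:
    """Relative compute speed after `months`, assuming a doubling every 18 months."""
    return 2 ** (months / DOUBLING_MONTHS)

annual_growth = speedup(12) - 1  # about 0.587, i.e. roughly 58% per year
lease_cycle = speedup(36)        # 3-year lease: two doublings, so 4x

print(f"per year: +{annual_growth:.1%}, per 3-year lease: {lease_cycle:.0f}x")
```

The same function gives the 58% annual figure used in the next calculation, since 2^(12/18) is about 1.587.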

When viewed in aggregate, this has interesting implications.  If we approximate Moore's Law to say that speeds increase by about 58% per year (a doubling every 18 months), and we consider a pool of 300 servers purchased in batches of 100 per year over 3 years, then the aggregate compute capacity would be:

100 + 158 + 252 = 510 units of power (where one unit is the power of a single year-1 server)

If in the 4th year you refresh the first 100 with servers that are 4 times as powerful as the originals, then the pool becomes:

158 + 252 + 400 = 810 units of power

This is, interestingly, a 58% increase in compute capacity over the previous year (810 / 510 ≈ 1.59), meaning that refreshing only a portion of a server pool each year still lets the pool as a whole track Moore's Law.
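The same arithmetic can be run as a small Python simulation. It assumes, as the post does, cohorts of 100 servers, a 3-year service life, and a yearly growth factor of 2^(12/18) (about 1.587); the helper names are illustrative. Because the post rounds each cohort before adding, its 510 and 810 come out here as roughly 511 and 811.

```python
GROWTH = 2 ** (12 / 18)  # ~1.587x per year, from an 18-month doubling
COHORT = 100             # servers bought (and later refreshed) each year

def cohort_power(year_bought: int) -> float:
    """Total power of one cohort, in units of a single year-1 server."""
    return COHORT * GROWTH ** (year_bought - 1)

def pool_power(year: int, lifetime: int = 3) -> float:
    """Aggregate power in `year`, keeping only the newest `lifetime` cohorts."""
    return sum(cohort_power(y) for y in range(year - lifetime + 1, year + 1))

for year in (3, 4, 5):
    print(year, round(pool_power(year)), f"x{pool_power(year) / pool_power(year - 1):.3f}"
          if year > 3 else "")
```

Once the pool reaches steady state, every cohort in it is GROWTH times more powerful than the one it displaced a year earlier, so the aggregate grows by exactly the same factor each year even though only a third of the servers change.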

More importantly, however, it means that you need to achieve at least a 4:1 consolidation ratio on any new servers just to keep your utilization levels stable (ignoring application growth).  If you are replacing 6-year-old servers then you need a 16:1 ratio, and for 9-year-old gear you would have to achieve a 64:1 ratio just to keep things the same.  Needless to say, this can become quite challenging.
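The break-even ratio is simply the speed of the new hardware relative to the gear it replaces. A minimal Python sketch, again assuming an 18-month doubling and, as in the text, ignoring application growth (the function name is hypothetical):

```python
def required_consolidation_ratio(age_years: float, doubling_months: float = 18) -> float:
    """Old servers one new server must absorb to hold utilization flat."""
    return 2 ** (age_years * 12 / doubling_months)

for age in (3, 6, 9):
    print(f"{age}-year-old servers: {required_consolidation_ratio(age):.0f}:1")
# prints 4:1, 16:1 and 64:1 for 3, 6 and 9 years respectively
```

Every additional 3 years of age multiplies the required ratio by 4, which is why the targets escalate so quickly for older estates.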

The upshot of all this is that doing nothing will typically cause utilization levels to drop over time, and the single-digit utilization levels that many organizations experience (particularly on Wintel servers) are undoubtedly due in part to this effect.  It also means that anyone looking to use consolidation or virtualization to increase utilization levels had better be very careful about the expectations that they set.  The incessant shrinkage of transistors is working against you...
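To make the "doing nothing" effect concrete, hold the workload fixed and let the refresh cycle run. This Python sketch assumes a steady-state pool refreshed in equal yearly cohorts, so aggregate capacity grows by the same 2^(12/18) factor each year; the 20% starting utilization is a hypothetical figure chosen only for illustration.

```python
GROWTH = 2 ** (12 / 18)  # ~1.587x capacity growth per year (18-month doubling)

def utilization_after(years: int, start: float) -> float:
    """Utilization of a constant workload after `years` of rolling refresh.

    Assumes capacity grows by GROWTH each year while demand stays flat.
    """
    return start / GROWTH ** years

for y in range(4):
    print(f"year {y}: {utilization_after(y, 0.20):.1%}")
```

Under these assumptions a pool starting at 20% average utilization falls into single digits within two refresh years and reaches about 5% after three, without anyone changing a single application.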