Monday, February 6, 2012

How to control and measure resource consumption on a modern server farm?

At the big company where I work, there is an ongoing tussle for resources, some of it controlled by quotas directed from above, and some of it not. The quotas that get handed down are in terms of numbers of hosts or VMs, but this doesn't work well in a truly fluid server farm environment like that provided by VMware's LabManager product. There you can consume VMs or disk or RAM or CPUs temporarily, and the value of these resources changes over time as new capacity is delivered to the data center in various configurations. It is plain to see that a bare machine count is not adequate to measure or control resource consumption in this type of environment.

So how should one measure consumption in a world where different VMs can have very different footprints depending on their type and how they are configured? One answer is to establish one more layer of indirection by using "points" to measure consumption of various things, according to prices which are dynamically updated to reflect the current value of these things to the organization.

At the implementation level, our lives can be simplified if we centralize record-keeping, but decouple this record-keeping from the various resource providers.

Using "points," i.e., an arbitrary unit of value, to measure consumption has several advantages.

- simple to record, share and combine

"Points" make it very simple to distribute resources to people in a transparent way, since point holdings can be represented by integers. Dividing resources between members of a group is as simple as dividing a number. Likewise, aggregating the buying power of a set of people is as easy as summing the points held by the individuals. This makes it easy to award resources at a high level based on business priorities without getting caught up in the low-level details of which particular compute resources will really be used.

- flexibility of what is being measured

"Points" facilitate arbitrary complexity of resource pricing. The alternative is to establish quotas in terms of units of hardware or hardware use, and this can lead to awkward situations. For example, limiting users to fixed numbers of VM's doesn't distinguish between categories of VMs which may have very different cost profiles. And even if we are talking about the same type of VM, there may be other properties which make one more valuable than another, such as which data center it lives in or what virtual hardware it is provided with. If we use points, we can establish pricing on any aspect of VMs that is proving to be a precious resource.

- facilitates comparisons and prioritizing between different resources

"Points" provide a common denominator between different kinds of resources. If it proves to be the case that there is a shortage of OVM instances in ADC, but lots of VirtualBox instances in UCF, their relative prices can reflect that. As the situation changes, these prices could also be adjusted to express the relative values of these commodities within the organization.

- decouple central records from resource providers

Record-keeping must be centralized, but this needn't imply a monolithic system. Since priorities are set at the organizational level, there should only be one reckoning of the points held by each individual employee. But this doesn't mean that the individual resource providers must be combined; there just has to be an interface to the central record-keeper which allows these providers to report consumption (and download quotas).

This allows different providers to report and query on unrelated schedules. The central record-keeper will always be aggregating reports in terms of points, but must remember the source of each piece of information so that a later report from the same provider can replace the earlier one. This decoupled scheme allows the central record-keeper to provide reports even if individual resource managers are down or cannot be contacted.
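
The record-keeping side can be sketched in a few lines. The class below is a hypothetical in-memory ledger, not a description of any real system: the names `Ledger`, `grant`, `report` and `remaining`, and the provider labels, are all assumptions. The key point is that each report is stored under its (provider, user) source, so an update overwrites only that provider's previous figure.

```python
class Ledger:
    """Hypothetical central record-keeper, holding everything in memory."""

    def __init__(self):
        self.allotments = {}  # user -> points granted
        self.reports = {}     # (provider, user) -> points currently in use

    def grant(self, user, points):
        """Award points to a user, e.g. from a high-level budget decision."""
        self.allotments[user] = self.allotments.get(user, 0) + points

    def report(self, provider, user, points_in_use):
        """Called by a provider whenever it likes; replaces its last report."""
        self.reports[(provider, user)] = points_in_use

    def consumed(self, user):
        """Total consumption across all providers, from their latest reports."""
        return sum(
            points for (_, who), points in self.reports.items() if who == user
        )

    def remaining(self, user):
        return self.allotments.get(user, 0) - self.consumed(user)


ledger = Ledger()
ledger.grant("alice", 100)
ledger.report("labmanager", "alice", 30)
ledger.report("virtualbox-pool", "alice", 20)
ledger.report("labmanager", "alice", 45)  # update: overwrites the earlier 30
print(ledger.consumed("alice"), ledger.remaining("alice"))
```

Since the ledger only reads its own stored reports, it can still answer queries while a provider is unreachable; the figures for that provider are simply as old as its last report.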
