by Dick Wallace
You have multiple workloads running on a substantial number of AWS instances. You've done a good job of using Reserved Instances to optimize your costs and are now wondering, "How well am I actually utilizing all those instances I have running?"
The simple way to answer this question is to rank your instances using some set of utilization metrics. Then, look more closely at those most underutilized. Sounds easy, right? Yes and no.
A typical cloud-based application likely runs a number of workloads on a variety of instances with differing capacities. If you use the same metrics to rank them all, a list sorted from lowest to highest will not necessarily be useful, because what it means to be 'underutilized' differs from one workload to the next.
For example, you might consider an instance running a compute-intensive workload to be underutilized if its average CPU utilization is below 50%. On the other hand, another instance at 20% CPU utilization might be well utilized because its workload does not require much CPU. Furthermore, for some workloads CPU matters most. For others, perhaps it is memory or disk that is the more important resource.
Therefore, the simple approach produces too many false positives that need to be constantly ignored.
Solving the Problem
I would submit that there are six parts to a sufficient solution to this problem: metrics, normalized ranking, configuration, grouping, computation, and data display. Put them all together and you're on your way.
It starts by collecting performance data, such as CPU utilization from CloudWatch. Then add memory and disk utilization, which can be obtained with an agent or product like Chef. Store some amount of history including associated instance IDs and you have the basis for computing rankings.
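As a rough illustration of the stored history described above, here is a minimal Python sketch. It assumes the samples have already been collected (from CloudWatch or an agent) and are held in memory. The MetricSample record, the average_utilization helper, the metric names and the instance IDs are all hypothetical, chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MetricSample:
    """One stored data point (hypothetical record shape)."""
    instance_id: str  # made-up IDs such as "i-0abc" are used below
    metric: str       # "cpu", "memory" or "disk"
    percent: float    # utilization, 0-100


def average_utilization(samples):
    """Return {instance_id: {metric: mean percent}} over the stored history."""
    buckets = defaultdict(lambda: defaultdict(list))
    for sample in samples:
        buckets[sample.instance_id][sample.metric].append(sample.percent)
    return {
        instance_id: {metric: mean(values) for metric, values in metrics.items()}
        for instance_id, metrics in buckets.items()
    }


# Illustrative history; a real store would hold many samples per instance.
history = [
    MetricSample("i-0abc", "cpu", 10.0),
    MetricSample("i-0abc", "cpu", 30.0),
    MetricSample("i-0abc", "memory", 60.0),
    MetricSample("i-0def", "cpu", 80.0),
]
averages = average_utilization(history)
```

Keeping the instance ID on every sample is what lets the later steps join utilization data to per-workload configuration.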
Ultimately, there needs to be a way to compare instances with each other, such as with a rank or score. Because workloads use resources differently, you can't just use metric utilization values directly. So, each metric will need a function that maps actual metric values to a corresponding 'normalized' score using configuration (e.g. range thresholds) specific to the instance's workload.
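One possible shape for such a mapping function is sketched below in Python. It assumes a simple linear interpolation between a low and a high threshold, which is only one of many reasonable choices. The function name normalized_score and the threshold values are hypothetical.

```python
def normalized_score(value, low, high):
    """Map a utilization percentage onto a 0-100 score.

    Values at or below `low` score 0, values at or above `high` score 100,
    and values in between are interpolated linearly.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if value <= low:
        return 0.0
    if value >= high:
        return 100.0
    return 100.0 * (value - low) / (high - low)


# Hypothetical CPU thresholds for two different workloads.
compute_heavy = {"low": 10.0, "high": 50.0}
light_web = {"low": 2.0, "high": 20.0}

# The same 20% CPU reading scores very differently per workload.
score_compute = normalized_score(20.0, **compute_heavy)  # 25.0
score_web = normalized_score(20.0, **light_web)          # 100.0
```

With these made-up thresholds, 20% CPU counts as poorly utilized for the compute-heavy workload but fully utilized for the light one, which is the distinction raw percentages cannot express.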
Once you have individual scores for each metric, you can combine them to create a total score. Again, since workloads differ, using a configuration-based weighted mean would allow CPU to be the dominant score for one instance and memory for another. When instances are ranked by total score, their overall position would reflect those differences much better than a simple average of all individual scores.
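To make the weighted mean concrete, here is a small Python sketch. The total_score function, the metric names and the weights are assumptions made for illustration. The per-metric scores are taken as already normalized to a 0-100 scale.

```python
def total_score(scores, weights):
    """Weighted mean of per-metric scores; weights come from configuration."""
    total_weight = sum(weights[metric] for metric in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[metric] * weights[metric] for metric in scores) / total_weight


# Per-metric scores for one instance (illustrative values).
scores = {"cpu": 80.0, "memory": 20.0, "disk": 50.0}

# Hypothetical weight sets for two workload types.
cpu_bound = {"cpu": 3, "memory": 1, "disk": 1}
memory_bound = {"cpu": 1, "memory": 3, "disk": 1}

as_cpu_bound = total_score(scores, cpu_bound)        # 62.0
as_memory_bound = total_score(scores, memory_bound)  # 38.0
simple_mean = sum(scores.values()) / len(scores)     # 50.0
```

The same three scores produce 62 under CPU-heavy weights and 38 under memory-heavy weights, while the unweighted mean of 50 hides the difference.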
Configuration and Grouping
Having the ability to configure the scoring parameters (thresholds and weights) based on instance workloads provides the basis for actually computing scores. This configuration includes which workloads and instance types are associated with each instance ID and the scoring parameters for those workloads. You could handle large numbers of instances by grouping instances by workload type (perhaps using tags) and then associating scoring parameters with those groups.
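A configuration of this kind might look something like the Python sketch below. Everything in it is an assumption for illustration: the workload names, the "workload" tag key, the threshold and weight values, and the plain-dictionary shape of the instance inventory. Instances with a missing or unrecognized tag fall back to a default group.

```python
from collections import defaultdict

# Hypothetical scoring parameters keyed by workload type.
# Thresholds are (low, high) utilization percentages per metric.
SCORING_PARAMS = {
    "batch-compute": {
        "thresholds": {"cpu": (10, 50), "memory": (10, 60), "disk": (5, 40)},
        "weights": {"cpu": 3, "memory": 1, "disk": 1},
    },
    "cache": {
        "thresholds": {"cpu": (2, 20), "memory": (20, 80), "disk": (5, 40)},
        "weights": {"cpu": 1, "memory": 3, "disk": 1},
    },
    "general": {
        "thresholds": {"cpu": (5, 40), "memory": (10, 60), "disk": (5, 40)},
        "weights": {"cpu": 1, "memory": 1, "disk": 1},
    },
}
DEFAULT_WORKLOAD = "general"


def group_by_workload(instances, tag_key="workload"):
    """Group instance IDs by workload tag, using the default when unknown."""
    groups = defaultdict(list)
    for instance in instances:
        workload = instance.get("tags", {}).get(tag_key, DEFAULT_WORKLOAD)
        if workload not in SCORING_PARAMS:
            workload = DEFAULT_WORKLOAD
        groups[workload].append(instance["instance_id"])
    return dict(groups)


# Illustrative inventory with made-up instance IDs.
inventory = [
    {"instance_id": "i-aaa", "instance_type": "c5.xlarge", "tags": {"workload": "batch-compute"}},
    {"instance_id": "i-bbb", "instance_type": "r5.large", "tags": {"workload": "cache"}},
    {"instance_id": "i-ccc", "instance_type": "t3.medium", "tags": {}},
    {"instance_id": "i-ddd", "instance_type": "t3.medium", "tags": {"workload": "unknown-thing"}},
]
groups = group_by_workload(inventory)
```

Attaching parameters to groups rather than to individual instances keeps the configuration small even when the instance count is large.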
Now that you have configuration and a set of scoring algorithms, simply use the groups to iterate over instances, look up the parameters for each and then calculate (and potentially store) the scores. Voilà! You have what you need to rank instances, either by total score or by any of the individual metric scores.
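The computation step can be sketched end to end in a few lines of Python. This is a self-contained illustration under stated assumptions, not a description of how any particular product does it: scoring uses a clamped linear mapping and a weighted mean, and the group names, parameters, instance IDs and utilization figures are all made up.

```python
def score_metric(value, low, high):
    """Clamp-and-interpolate a utilization percentage onto 0-100."""
    return max(0.0, min(100.0, 100.0 * (value - low) / (high - low)))


def score_instance(utilization, params):
    """Return (per-metric scores, weighted total) for one instance."""
    weights = params["weights"]
    per_metric = {
        metric: score_metric(utilization[metric], *params["thresholds"][metric])
        for metric in weights
    }
    total = sum(per_metric[m] * w for m, w in weights.items()) / sum(weights.values())
    return per_metric, total


def rank_instances(groups, params_by_group, utilization):
    """Score every instance in every group; lowest total (most idle) first."""
    rows = []
    for group, instance_ids in groups.items():
        params = params_by_group[group]  # look up parameters once per group
        for instance_id in instance_ids:
            per_metric, total = score_instance(utilization[instance_id], params)
            rows.append({"instance_id": instance_id, "group": group,
                         "scores": per_metric, "total": total})
    return sorted(rows, key=lambda row: row["total"])


# Hypothetical configuration and averaged utilization data.
params_by_group = {
    "batch-compute": {"thresholds": {"cpu": (10, 50), "memory": (10, 60)},
                      "weights": {"cpu": 3, "memory": 1}},
    "cache": {"thresholds": {"cpu": (2, 20), "memory": (20, 80)},
              "weights": {"cpu": 1, "memory": 3}},
}
groups = {"batch-compute": ["i-aaa", "i-bbb"], "cache": ["i-ccc"]}
utilization = {
    "i-aaa": {"cpu": 20.0, "memory": 35.0},
    "i-bbb": {"cpu": 45.0, "memory": 60.0},
    "i-ccc": {"cpu": 20.0, "memory": 65.0},
}

ranking = rank_instances(groups, params_by_group, utilization)
for row in ranking:
    print(row["instance_id"], row["group"], round(row["total"], 2))
```

In this example i-aaa and i-ccc both average 20% CPU, yet i-aaa lands at the top of the underutilized list while i-ccc scores well, because each is judged against its own group's thresholds and weights. Sorting on row["scores"]["cpu"] instead of row["total"] would rank by a single metric.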
Finally, add a nice web interface to view, sort and filter your data. Link it back to your instances in the AWS Console and you now have a means of finding underutilized instances, investigating the ones in most need of attention and then making decisions about whether to change instance types, combine or simply terminate them. Wouldn't it be nice if you also had an automated workflow that would execute those actions for you? Good thing CloudHealth can do that!
In the end, it's not rocket science. But it's not a weekend project either. If you would like to try a product that brings everything together, does all of these things and simplifies the process, learn more about CloudHealth or sign up for a free trial.