Have you discovered assets that were either over- or under-utilized? Did you then take action to remediate the issue? If your answer is yes, then congrats, you rightsized your infrastructure! Rightsizing is a simple three-step process:
1. Analyzing the utilization and performance metrics of your infrastructure, such as instances, volumes, and virtual machines (VMs).
2. Determining whether or not they are running efficiently, and what actions you should take to improve efficiency.
3. Modifying the infrastructure as needed (upgrading, downgrading, terminating).
So what are the metrics you need to consider when rightsizing? On the compute side, the core utilization metrics to take into account are CPU, network, disk, and memory. It is a best practice to have pre-defined thresholds for what constitutes normal behavior for your infrastructure. For example, you might deem an asset underutilized if its CPU utilization stays below 20%. As you define the most effective utilization for your infrastructure, take into account the maximum, average, and minimum values of these metrics. By reviewing these values over a specific time period, you can decide on an appropriate action to take.
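The threshold idea above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 20% low threshold mirrors the example in the text, and the 80% high threshold is an assumed value you would tune for your own environment.

```python
from statistics import mean


def summarize(samples):
    """Return the minimum, average, and maximum of utilization percentages."""
    return {"min": min(samples), "avg": mean(samples), "max": max(samples)}


def classify(samples, low=20.0, high=80.0):
    """Label an asset from its CPU samples over a review period.

    low=20.0 follows the 20% example in the text; high=80.0 is an assumption.
    """
    stats = summarize(samples)
    if stats["max"] < low:
        # Even the peak never reached the low threshold.
        return "underutilized"
    if stats["avg"] > high:
        # The asset runs hot on average, not just during a brief spike.
        return "overutilized"
    return "normal"


cpu_last_week = [4.0, 7.5, 12.0, 9.0, 15.5]  # made-up sample data
print(summarize(cpu_last_week))
print(classify(cpu_last_week))
```

Comparing the maximum, rather than the average, against the low threshold is a deliberately conservative choice here: an asset is only flagged as underutilized if it never came close to needing its current size.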
As you start this process, you may begin to wonder, “How did my infrastructure get out of control?” It’s actually very common to have over- or under-utilized assets, whether it was unintentional, or the asset was intentionally spun up in a larger size for extra headroom. Naturally, I suspect you don’t want to pay for infrastructure that you are not using effectively, or at all, which is where rightsizing comes into play.
Benefits of Rightsizing
So why should I rightsize? The two key benefits of rightsizing are infrastructure optimization and reduced costs. Throughout your analysis you will come across assets that can be downsized or terminated to save money, or upgraded to improve performance.
When an asset shows low utilization across its core performance metrics, such as 20% or less, it is likely larger than the workload needs. In this case, the best practice is to downgrade the asset to a smaller footprint. For example, if you are running a workload in AWS on an r3.2xlarge and determine via rightsizing that you could downgrade to an r3.xlarge without negatively impacting the workload, you can cut the operating cost of that instance by roughly 50%.
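To put a number on a downgrade like this, a quick calculation helps. The sketch below is hypothetical: the hourly prices are placeholders rather than real AWS rates, and the 730 hours per month is an assumed average.

```python
HOURS_PER_MONTH = 730  # assumed average number of hours in a month


def downsize_savings(current_hourly, target_hourly, hours=HOURS_PER_MONTH):
    """Return (monthly savings, percent saved) for moving to a cheaper size."""
    hourly_difference = current_hourly - target_hourly
    monthly_saved = hourly_difference * hours
    percent_saved = hourly_difference / current_hourly * 100
    return monthly_saved, percent_saved


# Placeholder prices for illustration only, not real AWS rates.
saved, pct = downsize_savings(current_hourly=0.60, target_hourly=0.30)
print(f"Saves ${saved:.2f} per month ({pct:.0f}%)")
```

Plug in the actual rates from your provider’s price list for your region and purchase option before drawing conclusions.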
It is a best practice to terminate assets that are considered ‘Zombies.’ These are assets that are running in your cloud environment but not being used. Zombies typically appear when someone forgets to turn an asset off, or when an asset fails because of script errors. Regardless of the cause, your cloud provider will continue charging for these assets because they are in a running state. By finding these assets and terminating them, you can reduce costs.
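One rough way to shortlist zombie candidates is to look for running assets with almost no activity over the review window. The sketch below assumes you have already exported an inventory with peak CPU and total network traffic per asset. The `Asset` fields, the `find_zombies` helper, and both ceilings are hypothetical choices, not values from any provider.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    state: str               # e.g. "running" or "stopped"
    max_cpu_percent: float   # peak CPU over the review window
    network_bytes: int       # total network traffic over the review window


def find_zombies(assets, cpu_ceiling=1.0, network_ceiling=1_000_000):
    """Return names of running assets whose activity stayed below both ceilings.

    Both ceilings are assumed values; tune them for your environment.
    """
    return [
        asset.name
        for asset in assets
        if asset.state == "running"
        and asset.max_cpu_percent < cpu_ceiling
        and asset.network_bytes < network_ceiling
    ]


inventory = [  # made-up sample data
    Asset("web-1", "running", 64.0, 9_000_000_000),
    Asset("batch-old", "running", 0.4, 12_000),
    Asset("test-box", "stopped", 0.0, 0),
]
print(find_zombies(inventory))
```

Treat the result as a list of candidates to confirm with the asset owner, not as an automatic termination list.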
While downgrading and terminating assets deliver both benefits, optimization and reduced costs, upgrading will increase your spend. However, by upgrading you ensure that your assets are able to meet surges in demand. For example, suppose you have a Standard_A2 VM in Azure whose usage spikes consistently hit 100% utilization of CPU or memory during certain times of the day. You would want to analyze the hourly max utilization throughout the day and see if the VM requires a larger size, such as the Standard_A3, or maybe the new burstable B-Series, in order to optimize performance.
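The hourly max analysis described above can be sketched as follows, assuming you have timestamped utilization samples for the VM. The function names, the sample data, and the rule that an hour counts as consistently saturated once it hits the ceiling on at least two distinct days are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_max(samples):
    """Map each hour of the day (0-23) to the highest utilization seen in it."""
    peaks = {}
    for timestamp, percent in samples:
        hour = timestamp.hour
        peaks[hour] = max(peaks.get(hour, 0.0), percent)
    return peaks


def saturated_hours(samples, ceiling=100.0, min_days=2):
    """Return sorted hours that reached the ceiling on at least min_days days."""
    days_hit = defaultdict(set)
    for timestamp, percent in samples:
        if percent >= ceiling:
            days_hit[timestamp.hour].add(timestamp.date())
    return sorted(h for h, days in days_hit.items() if len(days) >= min_days)


samples = [  # made-up (timestamp, CPU percent) pairs
    (datetime(2024, 1, 8, 9, 0), 100.0),
    (datetime(2024, 1, 8, 9, 30), 72.0),
    (datetime(2024, 1, 8, 14, 0), 35.0),
    (datetime(2024, 1, 9, 9, 15), 100.0),
    (datetime(2024, 1, 9, 20, 0), 100.0),
]
print(hourly_max(samples))
print(saturated_hours(samples))
```

Hours that show up in `saturated_hours` point to a recurring pattern rather than a one-off spike, which is the stronger signal that a larger or burstable size is worth evaluating.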