I still remember the feeling of responsibility that came with making my first $200K+ reserved instance (RI) purchase with Amazon Web Services. Lots of questions and concerns ran through my mind:
- What if there is a more effective instance type for some of the workloads?
- What if we decide to change our strategy around allocating instances to availability zones?
- What if we find software optimizations that would allow us to run substantially less infrastructure than projected?
- What if our analysis methodology for the purchase was flawed?
- How will I be able to effectively manage usage of the reservations after purchase?
When I made that first big purchase, there was little to no information on best practices and little support available. Those of us who were early adopters had to learn the hard way. Over the next few years, I developed an evaluation methodology and a set of tools to help solve the RI management issues that so many of us had struggled with early on.
Here are 5 tips that helped me better manage reservation purchases over the years.
#1 - Monitor for Underutilization
Last week I was talking via Google Hangout with the CTO and founder of a fast-growing Silicon Valley startup who was struggling to find time to police his growing inventory of RIs. I told him that just as you would not expect to log into all your instances manually to check their health, you should not expect to check your reservations for underutilization manually either. So the first step to improving RI management is to have an automated way to proactively identify underutilization.
You have two choices: invest in a commercial product, or invest in internal development to write custom scripts (unless this is your core business, probably not the best use of your resources).
One example of a proactive monitor is illustrated in the CloudHealth example below. This alert notifies me when we have underutilized reserved instances. The monitor can be configured to alert on either total unused hours or lost cost savings, and can be customized for specific business groups (e.g. only apply to Marketing) or specific reservation types (e.g. only monitor heavy c3.xlarge instances).
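If you go the custom-script route instead, the core check is small. The following is a minimal sketch, not CloudHealth's implementation: the `Reservation` record, its field names, and the threshold parameters are all hypothetical, and it assumes you have already worked out how many reserved hours were matched to running instances.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    # All fields are hypothetical names chosen for this illustration.
    reservation_id: str
    instance_type: str
    hours_purchased: float   # reserved hours available in the period
    hours_used: float        # reserved hours matched to running instances
    hourly_savings: float    # on-demand rate minus effective RI rate, in USD

    @property
    def unused_hours(self) -> float:
        return max(self.hours_purchased - self.hours_used, 0.0)

    @property
    def lost_savings(self) -> float:
        return self.unused_hours * self.hourly_savings


def find_underutilized(reservations, max_unused_hours=None, max_lost_savings=None):
    """Return reservations that breach either threshold.

    A threshold left as None is not checked, so the monitor can alert on
    unused hours, on lost savings, or on both.
    """
    flagged = []
    for r in reservations:
        over_hours = max_unused_hours is not None and r.unused_hours > max_unused_hours
        over_cost = max_lost_savings is not None and r.lost_savings > max_lost_savings
        if over_hours or over_cost:
            flagged.append(r)
    return flagged
```

Restricting the monitor to one business group or one reservation type is then just a matter of filtering the list before calling `find_underutilized`.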
#2 - Schedule Modifications
Amazon introduced substantial new flexibility in reservations last fall when they released support for modifying reserved instances. Prior to this, the only remedies for underutilization were to reallocate existing instances to different zones or to sell the reservations on the marketplace at a loss. With the modification features, you can now change critical attributes of your reservation at no cost, including instance type within a family, availability zone and VPC.
To perform an analysis of usage, it is essential to work with hourly data, such as that provided by the AWS Detailed Billing Record (DBR). To identify the right moves to make, correlate instance usage with reservation utilization to produce a list of underutilized reservations. You then need to evaluate each underutilized reservation to determine whether there is a move that would optimize the cost and/or guaranteed capacity of your reservations. The frequency of this analysis will vary by organization, but I typically recommend doing it on a daily or weekly basis.
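The correlation step can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `(hour, instance_type, zone, running_count)` tuple layout is hypothetical and is not the DBR schema, and matching is done on instance type and zone only, ignoring attributes such as platform and tenancy.

```python
from collections import defaultdict


def reservation_utilization(hourly_usage, reserved_counts, hours_in_period):
    """Correlate hourly instance usage with reservations.

    hourly_usage:    iterable of (hour, instance_type, zone, running_count)
    reserved_counts: {(instance_type, zone): number of reservations held}
    Returns {(instance_type, zone): utilization as a fraction from 0 to 1}.
    """
    # Total the running instances per hour, in case usage arrives in pieces.
    running = defaultdict(int)
    for hour, instance_type, zone, count in hourly_usage:
        running[(hour, instance_type, zone)] += count

    # In each hour, a reservation is used only if an instance is there to use it.
    used = defaultdict(int)
    for (hour, instance_type, zone), count in running.items():
        key = (instance_type, zone)
        used[key] += min(count, reserved_counts.get(key, 0))

    return {
        key: used[key] / (held * hours_in_period)
        for key, held in reserved_counts.items()
        if held > 0
    }


def underutilized(utilization, threshold=1.0):
    """List reservation groups whose utilization falls below the threshold."""
    return sorted(key for key, value in utilization.items() if value < threshold)
```

A group that shows low utilization in one zone while on-demand instances of the same family run in another is the typical candidate for a modification.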
Below is a sample report I run when I receive an alert. This CloudHealth report performs the above analysis for me and recommends the specific moves that optimize our usage of reservations.
#3 - Push Critical Information to Key Stakeholders
Cloud infrastructure can change rapidly, and providing regular information to stakeholders is a good way to increase transparency, raise your organization’s cost awareness (a.k.a. cost IQ), and enlist the broader team in the mission to use cloud infrastructure effectively.
If you don’t consistently email or message your stakeholders on the health of your RI usage, you should prioritize this. You can either invest in a product that includes this functionality, or write custom scripts to do this yourself.
At a minimum, I suggest including the following key information:
- How much have I saved so far this month from my RI purchases?
- How much have I saved so far by business group (e.g. production, development) from my RI purchases?
- Do I have any underutilized reservations (and if so, how much are they underutilized)?
- Are there specific instances whose cost I should consider optimizing with an RI purchase?
- Are there reservation modifications I can make to better utilize my existing RIs?
- Are there historical instance usage trends I need to be aware of?
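For the custom-script route, the first two questions in the list above reduce to a small aggregation. The sketch below is only an illustration: the input tuple layout and function names are hypothetical, and it assumes the RI rate you pass in is an effective hourly rate that already includes the amortized upfront fee.

```python
from collections import defaultdict


def savings_by_group(covered_usage):
    """Total the RI savings per business group.

    covered_usage: iterable of (group, reserved_hours, on_demand_rate, ri_rate),
    with both rates in USD per hour.
    """
    totals = defaultdict(float)
    for group, hours, on_demand_rate, ri_rate in covered_usage:
        totals[group] += hours * (on_demand_rate - ri_rate)
    return dict(totals)


def format_report(totals):
    """Render the totals as a plain-text body suitable for an email."""
    lines = ["RI savings so far this month"]
    for group in sorted(totals):
        lines.append(f"  {group}: ${totals[group]:,.2f}")
    lines.append(f"  total: ${sum(totals.values()):,.2f}")
    return "\n".join(lines)
```

Sending the resulting text on a schedule is left to whatever mail or messaging tooling you already use.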
Below is a sample report I receive that gives me the information I need related to the health of my RI utilization. This is scheduled for delivery via email weekly.
#4 - Make Smart RI Purchases
Making a well-informed reserved instance purchase requires taking into account many variables, including current and future instance usage, existing reservations, upcoming expirations, changes to Amazon instance type families, modifications to your workloads, and much more. A well-optimized RI purchase maximizes the cost savings and/or guaranteed capacity across your usage, with the right mix of heavy, medium and light reservations for your available budget. Amazon makes a rich set of data available to perform this analysis, and there are a few open source tools that can provide some help here. But whatever you use, the quality of the data is critical.
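To make the heavy, medium and light trade-off concrete, here is a minimal sketch of the break-even arithmetic for a single instance over one year. Every price in it is made up for illustration and is not an AWS rate. It also assumes the utilization-based billing model in which a heavy reservation is billed its hourly fee for every hour of the term, while light and medium reservations are billed only for the hours the instance actually runs.

```python
HOURS_PER_YEAR = 8760

# Hypothetical prices, for illustration only (not real AWS rates).
OPTIONS = {
    "on_demand": {"upfront": 0.0,   "hourly": 0.20,  "bills_all_hours": False},
    "light":     {"upfront": 200.0, "hourly": 0.12,  "bills_all_hours": False},
    "medium":    {"upfront": 400.0, "hourly": 0.08,  "bills_all_hours": False},
    "heavy":     {"upfront": 550.0, "hourly": 0.045, "bills_all_hours": True},
}


def annual_cost(option, utilization):
    """Cost of one instance for a year at the given utilization (0 to 1)."""
    hours_running = utilization * HOURS_PER_YEAR
    # Heavy reservations pay the hourly fee whether or not the instance runs.
    billed_hours = HOURS_PER_YEAR if option["bills_all_hours"] else hours_running
    return option["upfront"] + option["hourly"] * billed_hours


def cheapest_option(options, utilization):
    """Name of the option with the lowest annual cost at this utilization."""
    return min(options, key=lambda name: annual_cost(options[name], utilization))
```

With these made-up numbers, calling `cheapest_option(OPTIONS, u)` across a range of utilizations shows the expected pattern: on-demand wins for rarely used instances, and the heavier reservation types win as utilization climbs. A real analysis would repeat this per instance type and zone and then fit the result to the budget.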
Below is one sample report for reserved instance optimization. This is a critical feature that lets me specify an estimated purchase date, a period of time to analyze, and a budget, and then gives me precise recommendations on how to optimize my spend. It also includes enterprise features, such as support for multiple consolidated billing accounts, the ability to scope the analysis to specific business groups (e.g. optimize compute usage for RIs for product line A in Europe), and support for enterprise pricing.
#5 - Leverage Analytics
No matter how well you follow the previous steps, there will be times when you need to take a deep dive into really detailed data to understand usage. This is where I learned that the internal solution we had cobbled together at my previous company didn’t make the cut. It was time-intensive and almost always out of date because of the speed of innovation from AWS. My advice is to purchase a commercial solution.
Some of the typical questions that need an answer include:
- How many hours did a specific instance run during a billing period?
- How many of these hours were run under a reservation?
- What is a breakout of reservation usage by groups (e.g. autoscaling cluster) on an hourly basis?
- What instance usage could best utilize the burstable T2 instance types?
- ...and the list goes on
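To give a sense of what the first two questions in this list involve, here is a minimal sketch over hourly line items. The `(instance_id, hour, covered_by_reservation)` layout is a hypothetical, already-cleaned form of the billing data; getting real data into that shape, and keeping it there as AWS changes, is where most of the effort goes.

```python
from collections import Counter


def instance_hours(line_items, instance_id):
    """Count the hours one instance ran, split by how each hour was billed.

    line_items: iterable of (instance_id, hour, covered_by_reservation),
    one item per instance per hour.
    """
    counts = Counter()
    for item_id, _hour, reserved in line_items:
        if item_id != instance_id:
            continue
        counts["total"] += 1
        counts["reserved" if reserved else "on_demand"] += 1
    return {
        "total": counts["total"],
        "reserved": counts["reserved"],
        "on_demand": counts["on_demand"],
    }
```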
Amazon has developed an extremely powerful feature with reserved instances, providing both a cost benefit and guaranteed capacity. But maximizing the benefits requires careful stewardship of your cloud infrastructure and reservation inventory.
Hopefully, these 5 tips give you a good start toward optimizing your RIs and the overall efficiency of your cloud infrastructure.