Amazon Elastic Block Store (EBS) is often the second- or third-highest cost when using AWS at scale. Left unattended, EBS costs can grow very large over time. A few simple rules will help you manage your EBS costs without sacrificing availability or response times.
A volume service such as EBS provides attached storage, and four metrics come into play. The first, obviously, is the amount of available storage. The second is the speed, or latency, of the disk, sometimes called 'seek time': how long it takes the volume to respond to a request. The third is I/O operations per second (IOPS), which measures how many I/O requests per second are being made to the volume. The fourth is throughput: how quickly large files can be read from or written to the volume. With EBS optimization, we focus on matching each volume to the latency, IOPS, and throughput its workload actually requires.
EBS is composed of volumes of various sizes and types, which provide mounted file storage for AWS instances.
You can save a lot of money by matching the volume type to the load it will receive and the delivery speed it needs. By default, AWS provisions a relatively high-cost but very fast volume. Many I/O needs can be met with lower-cost magnetic drives, and there are large savings to be had moving less critical volumes to magnetic storage. Likewise, expensive io1 storage and Provisioned IOPS (PIOPS) can often be replaced by high-performing gp2 volumes, sized to meet the actual I/O the application requires.
EBS volumes are charged per GB per month. It's worth noting that io1 is more expensive both in cost per GB and in the cost of additional I/O, in exchange for a slightly higher SLA for sustained I/O. gp2 has the same seek time as io1, and gp2 is in turn four times the cost of sc1 per GB, so choosing wisely is key!
When optimizing EBS there are a few key questions to consider:
Over-provisioned PIOPS is a common occurrence, even among our most mature customers. Sometimes it is obvious, when maximum read or write operations never approach the provisioned IOPS; other times it is less clear. To get the best measure of the I/O on an io1 volume, use the Consumed Read/Write Ops metric. This is the most conservative metric AWS offers for io1 performance: it combines reads and writes, and compensates for larger reads and writes. I/O operations larger than 256K are counted in 256K capacity units, so, for example, a 1024K I/O counts as 4 consumed IOPS.
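The 256K capacity-unit rule can be sketched in a few lines of Python. This is a simplified model of how consumed IOPS are counted, based only on the example above:

```python
import math

CAPACITY_UNIT_KB = 256  # io1 counts I/O in 256K capacity units


def consumed_iops(io_size_kb: float) -> int:
    """How many consumed IOPS a single I/O operation counts as."""
    return max(1, math.ceil(io_size_kb / CAPACITY_UNIT_KB))


print(consumed_iops(1024))  # a 1024K I/O counts as 4 consumed IOPS
print(consumed_iops(100))   # anything up to 256K counts as 1
```

This is why Consumed Read/Write Ops is the conservative choice: large sequential operations are weighted by size rather than counted once.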
As you can see below, Consumed Read/Write Ops trends higher than plain read/write ops, and gives you a better idea of the volume's true I/O requirements. (A request for AWS: please make this metric available for gp2 as well.)
Using Consumed Read/Write Ops you can get a good idea of the actual need for I/O, and also how many sustained periods of high I/O a volume encounters. This leads to the second question of what size gp2 volume you should use to deliver the IOPS you need.
In most cases it's not economical to use io1 unless a volume has absolutely critical I/O needs or sustains more than 10K IOPS. gp2 can deliver the same IOPS for a lot less.
gp2 provisions baseline IOPS at a rate of 3 per GB, up to a limit of 10K. To figure out the baseline IOPS for a gp2 volume, simply multiply the GB by 3. For example, a 400 GB gp2 volume delivers a baseline of 1,200 IOPS at much lower cost than a 400 GB io1 drive with 1,000 PIOPS. The gp2 volume can burst to higher levels until its Burst Balance is exhausted and performance reverts to baseline, but it will provide that baseline performance for as long as needed.
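The baseline calculation is simple enough to encode directly. A minimal sketch, with one addition not stated above: AWS's gp2 documentation also specifies a 100 IOPS floor for small volumes, which we include here:

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS per GB, capped at 10,000.

    The 100 IOPS floor for small volumes comes from AWS's gp2
    documentation, not from the rule of thumb in the text.
    """
    return min(max(3 * size_gb, 100), 10_000)


print(gp2_baseline_iops(400))    # 1200 -- cheaper than 400 GB io1 @ 1000 PIOPS
print(gp2_baseline_iops(5000))   # 10000 -- the gp2 cap
```

So to replace an io1 volume with N provisioned IOPS, size the gp2 volume to at least N / 3 GB.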
You can think of sc1 and st1 as workhorses, which are much less expensive than gp2. They perform well in many applications that are not I/O-intensive or do not require single-millisecond latency. Some rules of thumb that we use: volumes are good candidates for sc1 when 1) the average of the larger of read or write I/O is below 85, and 2) the maximum of the larger of read or write I/O is below 200. Performance characteristics of the volume types are described here.
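The two rules of thumb translate directly into a screening check. The thresholds are the ones quoted above; the metric names are placeholders for whatever your volume report provides:

```python
def sc1_candidate(avg_read_ops: float, avg_write_ops: float,
                  max_read_ops: float, max_write_ops: float) -> bool:
    """Rule of thumb: a volume is an sc1 candidate when the larger of its
    average read/write I/O is under 85 and the larger of its max
    read/write I/O is under 200."""
    return (max(avg_read_ops, avg_write_ops) < 85
            and max(max_read_ops, max_write_ops) < 200)


print(sc1_candidate(40, 20, 150, 90))    # True: light, steady I/O
print(sc1_candidate(40, 20, 450, 90))    # False: read spikes too high
```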
Volumes with high throughput are good candidates for st1, which has a high sustained baseline MB/s of I/O, with the ability to burst above baseline. To evaluate a transition to sc1 or st1, it is recommended to run volume metric reports for a month, and filter by average I/O levels.
When looking for good candidates to migrate to st1, start by sorting volumes by max read and write MB/h to find your high-throughput volumes. sc1 volumes can sustain up to 250 MB/s, or 900,000 MB/h; st1 adds headroom, scaling up to 500 MB/s, with a burst capability as described here.
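The unit conversion and the sorting step look like this, using hypothetical volume IDs and throughput figures for illustration:

```python
def mb_per_hour(mb_per_s: float) -> float:
    """Convert a MB/s throughput ceiling into the MB/h scale used in reports."""
    return mb_per_s * 3600


print(mb_per_hour(250))  # 900000.0 -- the sc1 ceiling in MB/h

# Hypothetical report rows: (volume_id, max observed MB/h).
# Sorting descending surfaces the high-throughput st1 candidates first.
volumes = [("vol-a", 120_000), ("vol-b", 850_000), ("vol-c", 40_000)]
for vol_id, max_mb_h in sorted(volumes, key=lambda v: v[1], reverse=True):
    print(vol_id, max_mb_h)
```

Volumes approaching or exceeding the sc1 ceiling (like vol-b above) are the ones where st1's extra headroom pays off.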
AWS charges for volumes at a GB/hour rate whether or not they are attached to an EC2 instance, and regardless of whether they are ever accessed. By monitoring for unattached and idle volumes, you can avoid paying for drives that are not necessary.
To handle unattached volumes use a policy similar to this:
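As an illustration only (CloudHealth policies do this in the platform itself), a minimal boto3 sketch that flags unattached volumes might look like the following. It assumes configured AWS credentials and relies on the EC2 DescribeVolumes response shape:

```python
def unattached(volume: dict) -> bool:
    """A volume from EC2 DescribeVolumes is unattached when its state is
    'available' and it has no attachments."""
    return volume.get("State") == "available" and not volume.get("Attachments")


if __name__ == "__main__":
    import boto3  # assumes AWS credentials and region are configured

    ec2 = boto3.client("ec2")
    for page in ec2.get_paginator("describe_volumes").paginate():
        for vol in page["Volumes"]:
            if unattached(vol):
                print("Unattached:", vol["VolumeId"], vol["Size"], "GB")
```

A real policy would typically snapshot the volume before deleting it, and allow a grace period in case it was detached deliberately.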
To find idle EBS volumes, run volume metrics for the previous month, and add the columns for sum read ops and sum write ops.
Sort by the sum of read and write ops, and the idle volumes will sort to the top.
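The sort-and-inspect step can be sketched with hypothetical report rows of (volume ID, sum of read ops, sum of write ops) over the month:

```python
def idle_volumes(rows):
    """rows: (volume_id, sum_read_ops, sum_write_ops) over the report period.
    Sorts by total ops ascending and returns the IDs with zero I/O."""
    ranked = sorted(rows, key=lambda r: r[1] + r[2])
    return [vol_id for vol_id, reads, writes in ranked if reads + writes == 0]


report = [
    ("vol-1", 5_000, 12_000),  # had write (and read) activity -- keep
    ("vol-2", 0, 0),           # no I/O all month -- termination candidate
    ("vol-3", 0, 0),           # no I/O all month -- termination candidate
]
print(idle_volumes(report))  # ['vol-2', 'vol-3']
```

As with unattached volumes, snapshot before terminating: a volume with zero I/O last month may still hold data someone needs.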
In the data above, the first volume had Write activity, but the other 3 had no reads or writes and are candidates for termination.
Optimizing EBS volumes can result in significant cost savings. High-cost io1 (PIOPS) volumes can often be migrated to gp2, and non-boot gp2 volumes can often be migrated to magnetic storage for 75% cost savings. CloudHealth provides the data you need to optimize your EBS volumes, and policies to keep unattached volumes under control.
Learn more about how CloudHealth can help you get your EBS volumes back under control. Schedule a demo today.