Best Practices For Optimizing Amazon EBS Volumes

02.14.18
Wilson MacDonald
Senior Professional Services Engineer


Amazon Elastic Block Store (EBS) is often the second or third highest cost when using AWS at scale. Unattended, EBS costs can become very large over time. You can use some simple rules to best manage your EBS costs, without sacrificing availability and response times.

A volume service such as EBS provides attached storage, and four metrics come into play. The first is the amount of available storage. The second is the latency of the disk, sometimes called ‘seek time’: how long it takes the volume to respond to a request. The third is I/O operations per second, or IOPS, which measures how many I/O requests per second are made to the volume. The fourth is throughput: how quickly large files can be read from or written to a volume. With EBS optimization, we focus on matching the EBS volume type to the latency, IOPS, and throughput required.

Pro Tip: By monitoring for unattached and idle volumes you can avoid paying for drives which are not necessary.

EBS is composed of volumes of various sizes and types which provide mounted file storage for AWS Instances:

  • gp2: This is the general purpose volume type: higher cost SSD storage with low latency and burstable performance, with sustained IOPS up to 10K. The baseline IOPS are determined by the size of the volume and have a 99% SLA for sustained delivery of baseline IOPS. Gp2 volumes can burst much higher; I’ve seen bursts above 20K IOPS in gp2.
  • io1: This is the highest cost SSD storage, with low latency volumes and provisioned I/O performance. These volumes have a surcharge for base storage and a surcharge for each provisioned IOP. Io1 volumes have a 99.9% SLA for sustained delivery of the provisioned level of IOPS.
  • st1: These are lower cost magnetic volumes optimized for large amounts of sequential data transfer.
  • sc1: These are the lowest cost magnetic volumes with higher latency, offering up to 200 IOPS performance.
  • Magnetic: The initial standard EBS volume type, with high latency and lower I/O performance. PIOPS (now io1) volumes were offered as well to provide for higher I/O needs. Prior to the introduction of gp2, PIOPS volumes were the only choice for applications performing high numbers of random I/O operations.

Understanding the costs associated with EBS

You can save a lot of money by matching the volume type to the load that it will receive and the needed speed of delivery. AWS by default provisions a relatively high cost, but very fast volume. Many I/O needs can be met with lower cost magnetic drives. There are large savings to be had moving less critical volumes to magnetic storage. Likewise, expensive io1 GB and PIOPS can often be replaced by high performing gp2 volumes, sized to meet the actual I/O required by the application.

EBS volumes are charged per GB per month. It’s worth noting that io1 is more expensive, both in the cost per GB and in the cost of additional I/O, in exchange for a slightly higher SLA for sustained I/O. Gp2 has the same seek time as io1, and gp2 is four times the cost of sc1 per GB, so choosing wisely is key!

EBS_IOP.png

How to optimize Amazon EBS Volumes

When optimizing EBS there are a few key questions to consider:

  1. Am I using the PIOPS I have provisioned in io1?
  2. Can I duplicate the PIOPS in gp2 for a slightly lower service guarantee, and a much lower price?
  3. Do all the non-boot gp2 volumes need the low latency and high IOPS of gp2, or can they be well served at much lower cost by sc1?
  4. Do I have high throughput volumes which could benefit from st1?
  5. Do I have idle or unattached volumes?

Assessing PIOPS usage

Over-provisioned PIOPS are a common occurrence, even with our most mature customers. Sometimes it is obvious, because maximum read or write operations never approach the provisioned level; other times it is not as clear. To get the best measure of the I/O for an io1 volume, use the metric Consumed Read/Write Ops. This is the most conservative metric AWS has for io1 performance. Consumed Read/Write Ops combines reads and writes, and compensates for larger reads and writes: I/O operations that are larger than 256K are counted in 256K capacity units. For example, a 1024K I/O would count as 4 consumed operations.
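The 256K accounting described above can be sketched in a few lines. This is only an illustration: the function name is made up for the example, and rounding a partial unit up to a whole one is an assumption rather than something the metric definition here spells out.

```python
import math

CAPACITY_UNIT_K = 256  # size of one capacity unit, as described in the text


def consumed_ops(io_size_k: float) -> int:
    """Return how many consumed operations one I/O of this size counts as.

    Assumption: a partial 256K unit is rounded up to a whole unit.
    """
    if io_size_k <= 0:
        raise ValueError("I/O size must be positive")
    # I/Os up to 256K count once; larger I/Os count once per 256K unit.
    return max(1, math.ceil(io_size_k / CAPACITY_UNIT_K))


print(consumed_ops(1024))  # 4, matching the 1024K example in the text
```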

As you can see below, Consumed Read/Write Ops trends higher, and gives you a better idea of the I/O requirements of the volume. (Request for AWS—please make this available for gp2 as well).

EBS_io1.png

Using Consumed Read/Write Ops you can get a good idea of the actual need for I/O, and also how many sustained periods of high I/O a volume encounters. This leads to the second question of what size gp2 volume you should use to deliver the IOPS you need.
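One way to turn those readings into a yes/no answer on over-provisioning is to compare the busiest period against the provisioned level. The sketch below assumes the metric is reported as a total count of operations per sampling period (so dividing by the period length gives IOPS); the function names and sample numbers are hypothetical.

```python
def peak_iops(ops_per_period, period_seconds):
    """Convert per-period operation totals into the peak IOPS rate observed."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # The busiest period determines how much IOPS the volume really needs.
    return max(ops_per_period, default=0) / period_seconds


def piops_utilization(ops_per_period, period_seconds, provisioned_iops):
    """Fraction of the provisioned IOPS that the busiest period actually used."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    return peak_iops(ops_per_period, period_seconds) / provisioned_iops


# Hypothetical 5-minute totals for an io1 volume provisioned at 1000 IOPS.
samples = [60_000, 90_000, 45_000]
print(piops_utilization(samples, period_seconds=300, provisioned_iops=1000))
```

A volume whose peak utilization stays well below 1.0 over a representative period is paying for IOPS it does not use.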

Configuring equivalent baseline IOPS in gp2

In most cases it’s not economical to use io1 unless a volume has absolutely critical I/O needs, or sustains a load above 10K IOPS. Gp2 can deliver the same IOPS for a lot less.

Gp2 provisions baseline IOPS at a rate of 3 per GB, up to a limit of 10K. To figure out the baseline IOPS for a gp2 volume, simply multiply the size in GB by 3. For example, a 400 GB gp2 volume will deliver a baseline of 1200 IOPS at much lower cost than a 400 GB io1 volume with 1000 PIOPS. The volume will be able to burst to higher levels until its ‘Burst Balance’ is exhausted and the performance reverts to baseline, but it will provide the baseline performance for as long as needed.
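The sizing rule works in both directions: from a volume size to its baseline, and from a target IOPS figure to the smallest gp2 volume that delivers it. A minimal sketch using only the 3 IOPS per GB rate and the 10K cap quoted above (the function names are illustrative, and any minimum baseline that AWS applies to very small volumes is ignored):

```python
import math

GP2_IOPS_PER_GB = 3             # baseline rate quoted in the text
GP2_MAX_BASELINE_IOPS = 10_000  # baseline cap quoted in the text


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return min(size_gb * GP2_IOPS_PER_GB, GP2_MAX_BASELINE_IOPS)


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 size in GB whose baseline meets the target IOPS."""
    if target_iops > GP2_MAX_BASELINE_IOPS:
        raise ValueError("target exceeds the gp2 baseline cap")
    return math.ceil(target_iops / GP2_IOPS_PER_GB)


print(gp2_baseline_iops(400))   # 1200, as in the example above
print(gp2_size_for_iops(1000))  # 334 GB is enough to replace 1000 PIOPS
```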

AWS has made it fairly painless to migrate volumes to different types with the new Elastic Volumes feature described in this blog.

Finding candidates for migrating to magnetic volumes

You can think of sc1 and st1 as workhorses, which are much less expensive than gp2. They perform well in many applications which are not I/O intensive, or do not require single-digit millisecond latency. As a rule of thumb, a volume is a good candidate for sc1 when 1) its average I/O, taking the larger of reads or writes, is less than 85, and 2) its maximum I/O, taking the larger of reads or writes, is less than 200. Performance characteristics of volumes are described here.

Volumes with high throughput are good candidates for st1, which has a high sustained baseline MB/s of I/O, with the ability to burst above baseline. To evaluate a transition to sc1 or st1, it is recommended to run volume metric reports for a month, and filter by average I/O levels.
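The sc1 rules of thumb can be applied as a simple filter over a month of volume metrics. The sketch below takes the four figures as plain numbers; the function and parameter names are hypothetical, and the thresholds are the 85 and 200 quoted in the rules of thumb.

```python
SC1_MAX_AVG_OPS = 85    # threshold for the larger of average reads or writes
SC1_MAX_PEAK_OPS = 200  # threshold for the larger of max reads or writes


def is_sc1_candidate(avg_read, avg_write, max_read, max_write):
    """True when both rules of thumb for sc1 hold for a volume."""
    busiest_average = max(avg_read, avg_write)
    busiest_peak = max(max_read, max_write)
    return busiest_average < SC1_MAX_AVG_OPS and busiest_peak < SC1_MAX_PEAK_OPS


print(is_sc1_candidate(avg_read=12, avg_write=40, max_read=90, max_write=150))   # True
print(is_sc1_candidate(avg_read=12, avg_write=40, max_read=90, max_write=450))   # False
```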

EBS_3.png

EBS_IOPS.png

EBS_Ops.png

Finding candidates to migrate to st1

When looking for good candidates to migrate to st1, start by sorting volumes by max read and write MB/h to find your high throughput volumes. Sc1 volumes can sustain 250 MB/s, or 900,000 MB/h. St1 adds headroom up to 500 MB/s, with a burst capability as described here.
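Because the reports are in MB/h while the volume limits are quoted in MB/s, it helps to convert between the two before sorting. A small sketch, assuming each volume is a dict with hypothetical keys `max_read_mb_h` and `max_write_mb_h`:

```python
SECONDS_PER_HOUR = 3600


def mb_per_hour(mb_per_second):
    """Convert a sustained MB/s rate to MB/h."""
    return mb_per_second * SECONDS_PER_HOUR


def sort_by_peak_throughput(volumes):
    """Order volumes so the highest max read or write MB/h comes first."""
    return sorted(
        volumes,
        key=lambda v: max(v["max_read_mb_h"], v["max_write_mb_h"]),
        reverse=True,
    )


print(mb_per_hour(250))  # 900000 MB/h for a sustained 250 MB/s

# Hypothetical report rows; the volume IDs are placeholders.
report = [
    {"volume_id": "vol-a", "max_read_mb_h": 1_200, "max_write_mb_h": 300},
    {"volume_id": "vol-b", "max_read_mb_h": 40_000, "max_write_mb_h": 650_000},
]
print([v["volume_id"] for v in sort_by_peak_throughput(report)])  # ['vol-b', 'vol-a']
```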

EBS_MB:h.png

Hygiene — Terminating detached and idle volumes

AWS charges for volumes at a GB/hour rate, whether they are attached to an EC2 instance or not, and regardless of whether they were accessed. By monitoring for unattached and idle volumes you can avoid paying for drives which are not necessary.

To handle unattached volumes use a policy similar to this:

EBSpolicy.png
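Outside of a policy engine, the same check can be made directly against the EC2 API. The sketch below is not the policy shown above; it is a plain filter over a response shaped like the one EC2's DescribeVolumes call returns (a `Volumes` list whose entries carry `VolumeId`, `State`, `Size` and `Attachments`). The sample data and volume IDs are made up.

```python
def unattached_volumes(describe_volumes_response):
    """Return (volume_id, size_gb) for volumes that are not attached to anything."""
    return [
        (vol["VolumeId"], vol["Size"])
        for vol in describe_volumes_response.get("Volumes", [])
        # An unattached volume is in the 'available' state with no attachments.
        if vol.get("State") == "available" and not vol.get("Attachments")
    ]


# With boto3 installed and credentials configured, the response could come from:
#   boto3.client("ec2").describe_volumes()
# A hand-written sample keeps this example self-contained.
sample = {
    "Volumes": [
        {"VolumeId": "vol-0aaa", "State": "available", "Size": 100, "Attachments": []},
        {"VolumeId": "vol-0bbb", "State": "in-use", "Size": 50,
         "Attachments": [{"InstanceId": "i-0123"}]},
    ]
}
print(unattached_volumes(sample))  # [('vol-0aaa', 100)]
```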

To find idle EBS volumes, run volume metrics for the previous month, and add the columns for sum read ops and sum write ops.

EBSmetrics.png

Sort by the sum of read or write ops and the idle volumes will sort to the top.

EBS_Ops_MB:h.png

In the data above, the first volume had write activity, but the other three had no reads or writes and are candidates for termination.
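The same sort-and-inspect step can be scripted once the metrics are exported. This sketch assumes each row is a dict with hypothetical keys `volume_id`, `sum_read_ops` and `sum_write_ops` covering the previous month, and treats a volume as idle only when both sums are zero.

```python
def total_ops(row):
    """Combined read and write operations for one volume over the period."""
    return row["sum_read_ops"] + row["sum_write_ops"]


def idle_volumes(rows):
    """Return IDs of volumes with no reads and no writes, least active first."""
    ranked = sorted(rows, key=total_ops)
    return [row["volume_id"] for row in ranked if total_ops(row) == 0]


# Hypothetical month of metrics: one volume with writes, three with no activity.
metrics = [
    {"volume_id": "vol-1", "sum_read_ops": 0, "sum_write_ops": 5_400},
    {"volume_id": "vol-2", "sum_read_ops": 0, "sum_write_ops": 0},
    {"volume_id": "vol-3", "sum_read_ops": 0, "sum_write_ops": 0},
    {"volume_id": "vol-4", "sum_read_ops": 0, "sum_write_ops": 0},
]
print(idle_volumes(metrics))  # ['vol-2', 'vol-3', 'vol-4']
```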

Summary

Optimizing EBS Volumes can result in significant cost savings. High cost io1 (PIOPS) volumes can often be migrated to gp2. Non-boot gp2 volumes can often be migrated to magnetic storage with 75% cost savings. CloudHealth provides the data you need to optimize your EBS Volumes, and policies to keep unattached volumes under control.
