Kubernetes has quickly emerged as one of the leading cloud technologies, and for good reason. Kubernetes makes it possible to deploy modern applications that are scalable, modular, and fault-tolerant. You declare the state you need your environment to be in, and Kubernetes continuously works to maintain that state, freeing developers from manual infrastructure management tasks.
With all the benefits of Kubernetes come challenges. As more and more teams adopt containers and Kubernetes to develop and deliver their applications, your landscape can quickly become crowded and fragmented.
This is where a Cloud Center of Excellence (CCoE) comes in. A CCoE brings together key stakeholders from development, finance, and operations for a unified approach to cloud management across your organization. And when it comes to understanding, optimizing, and reporting on your cloud costs with Kubernetes, we’ve identified five key questions your CCoE should be asking.
1. What are our key KPIs?
Especially as an organization scales, teams can quickly start working in silos and reporting on different things. As a first step to optimizing your Kubernetes cloud financial management, it’s important that everyone is working towards the same KPIs and reporting on the same metrics.
For example, suppose an organization has just built new microservices in Kubernetes, and the applications team is reporting on the cost of those microservices. This is all well and good, but at some point comes the question: what is the business producing from these cloud services specifically, and how does the cost of the services align with the business’s output?
A CCoE can help identify the KPIs most important to the business, and then align different business units to ensure everyone tracks the same metrics and uses cloud resources toward the same bottom line.
2. How do we connect the dots between our application constructs and our KPIs?
Let’s say your organization aligned on a KPI: cost per product line. If you’re lucky, you know exactly which application relates to which product line, and this is already delineated within your infrastructure. For example, you’ve set up a Kubernetes cluster per product line, or you have a label across pods that delineates the product line. Often, however, that structure isn’t in place until it’s too late.
Below is an example of how easy it is for two different teams to use labels and namespaces in different ways, with different interpretations of their value. Team A creates a namespace per developer, while Team B creates one per microservice. It’s important to standardize definitions and labels in your Kubernetes cloud infrastructure to ensure consistency and governance. As a best practice, you can set up automated governance checks to enforce this.
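A shared convention might look like the sketch below. The label keys (`product-line`, `team`) and all names here are hypothetical examples, not a prescribed taxonomy; what matters is that every team applies the same keys with the same meaning.

```yaml
# Hypothetical org-wide labeling convention: every workload carries
# product-line and team labels so costs can be grouped consistently,
# and namespaces map to business domains rather than individuals.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: payments          # namespace = business domain, not developer
  labels:
    product-line: payments     # assumed shared label key
    team: checkout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
        product-line: payments
        team: checkout
    spec:
      containers:
      - name: api
        image: example.com/checkout-api:1.0
```

An automated governance check could then be as simple as an admission policy that rejects workloads missing the agreed label keys.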
Once you have defined labels and definitions, you can start to think about how to group and allocate costs. With modern apps and Kubernetes, however, it isn’t always a straight line. You have to contend with thousands to millions of lines of billing data per day, along with services shared across teams. If you have worker nodes (VMs) running multiple pods, and each pod serves a different product line, you can’t simply group these costs. You need to split those shared costs proportionally, based on how much each product line actually consumed. This leads us to our next question.
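The proportional split can be sketched in a few lines of Python. This is a minimal illustration under invented numbers, not a billing implementation: a real pipeline would also weight memory, amortize discounts, and process far more data.

```python
# Sketch: split one worker node's hourly cost across product lines in
# proportion to each pod's CPU request. All figures are illustrative.

def split_node_cost(node_cost, pods):
    """pods: list of (product_line, cpu_request_cores) tuples."""
    total = sum(cpu for _, cpu in pods)
    shares = {}
    for line, cpu in pods:
        shares[line] = shares.get(line, 0.0) + node_cost * cpu / total
    return shares

# A $0.40/hour node running three pods from two product lines:
allocation = split_node_cost(0.40, [("payments", 2.0),
                                    ("search", 1.0),
                                    ("payments", 1.0)])
# payments carries 3/4 of the node cost, search carries 1/4
```

The same shape of calculation applies to any shared resource once you have a consistent key (here, a hypothetical product-line label) to group by.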
3. How do we allocate the costs of our cloud services?
In Kubernetes, you can use resource requests, where developers control the amount of CPU and memory resources per pod or container by setting the resource request field in the configuration file. With resource requests, there are two different ways to think about allocating costs—by the amount that was provisioned, or by the amount that was used. We’ll outline some key considerations for either model below.
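For reference, resource requests are set per container in the pod spec; the values below are purely illustrative.

```yaml
# A container's resource requests: the scheduler reserves this much
# CPU and memory for the pod. Limits cap what it may actually consume.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: example.com/app:1.0
    resources:
      requests:
        cpu: "500m"       # half a CPU core
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
```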
Allocating cloud service costs by resource requests
If you decide to allocate cloud service costs by the amount that was provisioned, then the team requesting those resources pays for them, regardless of whether they use all of them. The benefits of this model are that all costs of cloud services are always allocated and accounted for, and teams are incentivized to provision only what they will actually use.
A challenge is that not all organizations have visibility or accurate predictability into the resources they’ll need. As a result, teams may under-specify requirements in an effort to save costs, and their app might not have what it needs to run effectively, or at all.
Allocating cloud service costs by usage
On the other hand, you can allocate costs by usage so teams only pay for what they actually use. Makes sense, right? However, it’s rare that teams use 100% of what’s been provisioned, which raises the question: who pays for the idle time and unused resources? Additionally, paying by usage can incentivize teams to provision more than they need, since they won’t pay for the excess.
In working closely with our customers, we’ve found organizations opting for the first option, allocating costs by resource requests, which settles the question of who pays for underutilized resources and discourages overprovisioning. Which model you choose depends on the organization, but in either case the principle holds: if your usage is lower than what you request, you’re wasting money; if it’s higher, you risk development or performance issues. That’s why it’s important to have the right tools in place and visibility into your environment.
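A small sketch makes the trade-off concrete. The node cost, capacity, requests, and usage below are invented: under request-based allocation the node’s bill is always fully covered, while under usage-based allocation the idle capacity is left unallocated and someone still has to pay for it.

```python
# Comparing the two allocation models for a single shared worker node.
# All figures (node cost, capacity, requests, usage) are illustrative.

node_cost = 1.00      # $ per hour for the node
node_capacity = 4.0   # CPU cores on the node

teams = {
    "team-a": {"request": 2.0, "usage": 1.0},  # requested 2 cores, used 1
    "team-b": {"request": 2.0, "usage": 2.0},
}

# Model 1: allocate by requests -- the full node cost is always covered.
total_request = sum(t["request"] for t in teams.values())
by_request = {name: node_cost * t["request"] / total_request
              for name, t in teams.items()}

# Model 2: allocate by usage at a flat core-hour rate -- the idle core
# goes unallocated, and its cost must land somewhere.
rate = node_cost / node_capacity
by_usage = {name: rate * t["usage"] for name, t in teams.items()}
idle_cost = node_cost - sum(by_usage.values())
```

Here team-a pays $0.50 by requests but only $0.25 by usage, leaving $0.25 of idle capacity unbilled, which is exactly the gap a CCoE needs a policy for.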
4. Who pays for common services?
We’ve covered individual cloud services and requests that directly support a single application, but you can’t forget about common services. In a Kubernetes cluster, you have core services such as the control plane, a log service, or a service mesh (which scales as apps scale) that various teams benefit from. Should those costs be split evenly across all product lines? Should they be split proportionally based on usage? Or does a central cost center cover these platform costs? Different teams will have different answers, but it’s important to consider these questions for consistency and clarity in your cloud financial management practice.
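The first two approaches can be sketched side by side; the shared platform bill and per-product-line direct costs below are hypothetical figures, with direct spend standing in as the usage proxy.

```python
# Two ways to split a shared platform bill (e.g. logging, service mesh)
# across product lines. All figures are illustrative.

shared_cost = 300.0
direct_costs = {"payments": 1000.0, "search": 500.0, "ads": 1500.0}

# Option 1: even split across product lines.
even_split = {line: shared_cost / len(direct_costs) for line in direct_costs}

# Option 2: split in proportion to each line's direct spend.
total_direct = sum(direct_costs.values())
proportional = {line: shared_cost * cost / total_direct
                for line, cost in direct_costs.items()}
```

Under the even split each line pays $100; proportionally, ads pays $150 while search pays only $50, so the choice of model meaningfully changes each product line's reported cost.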
5. What are my optimization opportunities?
Now for the question everyone’s been waiting for—how do we optimize our cloud costs? The good news is that Kubernetes is flexible. The bad news is that Kubernetes is flexible, so there are multiple optimization components to consider. There are four primary ways to optimize your Kubernetes cloud costs:
- Pod rightsizing (tuning the resource requests discussed earlier)
- Node rightsizing (VM rightsizing)
- Autoscaling (horizontal pod autoscaling, cluster autoscaling)
- Leveraging cloud discounts (reserved instances, spot, savings plans)
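As a taste of the first item, a pod-rightsizing check can be as simple as comparing a container's CPU request to a high percentile of its observed usage. This is a rough sketch: the samples, percentile, and headroom factor are arbitrary choices for illustration, not a recommendation engine.

```python
# Sketch: flag over-requested pods by comparing the CPU request to
# p95 observed usage plus headroom. All numbers are illustrative.

def suggest_cpu_request(samples_millicores, headroom=1.2):
    """Return (p95 usage, suggested request) in millicores."""
    ordered = sorted(samples_millicores)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return p95, int(p95 * headroom)

# Observed usage samples in mCPU; the 900 spike is a transient peak
# that the p95 deliberately ignores.
samples = [120, 140, 145, 150, 150, 155, 160, 165, 170, 900]
p95, suggestion = suggest_cpu_request(samples)
# With a current request of 1000m and a suggestion of ~204m, this pod
# is a strong rightsizing candidate.
```

Node rightsizing, autoscaling, and discount planning each build on the same idea: measure actual consumption, then shrink the gap between what you pay for and what you use.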
In part two of this blog series, we’ll go into more detail about each of these four methods, but to learn more now, we encourage you to watch the recording of our session during the Angelbeat Virtual Seminar focused on Kubernetes, cloud-native design, app modernization, and migration.
For more best practices on managing cloud costs with Kubernetes, see our in-depth whitepaper: FinOps for Kubernetes: Unpacking Container Cost Allocation and Optimization