Change. It’s the only constant in IT. If you manage a dynamic cloud environment, you already know that keeping track of your assets and dealing with continuous change is extremely time-consuming. Something I hear a lot from IT leaders is that the volume and rate of change are beyond what their teams can manage. In other words, the complexity is growing beyond human scale.
One thing we can learn from the pioneers of cloud computing is that the only way to manage these dynamic environments is to set up guardrails that let you manage by exception. Netflix’s Simian Army is a prime example of this school of thought: bots, or “monkeys,” roam the environment, monitoring for anomalies, testing security and resiliency, sometimes intentionally wreaking havoc to make sure systems are ready to respond, and generally taking care of day-to-day operational tasks. This approach lets you manage your environment in a relatively hands-off manner, freeing staff to focus on solving complex problems.
Not everyone is ready to build and deploy Janitor Monkeys, but with some key alerts and policy actions set up within your cloud service management platform, you can drive automation and free up your team’s time so they can spend it innovating.
Let’s break down the types of alerts and policies and get some examples of each.
Financial Management Policies
Financial management policies are all about tracking and understanding your cloud spend so you can keep costs under control. Set budgets by department, then set up financial management policies that alert you when costs spike unexpectedly or when a group is projected to exceed its budget.
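One simple way to express a “projected to exceed its budget” alert is a straight-line run-rate forecast: divide month-to-date spend by the number of days elapsed, then multiply by the days in the month. The sketch below assumes you already have month-to-date spend per group (for example, summarized from your billing data); the function names and sample figures are hypothetical, and real forecasting tools may use more sophisticated models.

```python
import calendar
from datetime import date


def projected_month_end_spend(month_to_date: float, today: date) -> float:
    """Straight-line forecast: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = month_to_date / today.day
    return daily_rate * days_in_month


def budget_alerts(spend_by_group: dict[str, float],
                  budgets: dict[str, float],
                  today: date) -> list[tuple[str, float, float]]:
    """Return (group, projected spend, budget) for every group forecast to overspend."""
    alerts = []
    for group, mtd in spend_by_group.items():
        budget = budgets.get(group)
        if budget is None:
            continue  # no budget set for this group, nothing to compare against
        projected = projected_month_end_spend(mtd, today)
        if projected > budget:
            alerts.append((group, round(projected, 2), budget))
    return alerts


# Hypothetical month-to-date figures as of November 10.
spend = {"engineering": 4000.0, "marketing": 1000.0}
budgets = {"engineering": 10000.0, "marketing": 5000.0}
print(budget_alerts(spend, budgets, date(2016, 11, 10)))
# [('engineering', 12000.0, 10000.0)]
```

A straight-line forecast overreacts to one-off charges early in the month (such as an upfront RI payment), so you may want to exclude those before projecting.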
An example of a financial management policy you should be tracking is “If total cost increased by more than 20% in one week, alert me.” This will ensure you don’t end the month with an unpleasant surprise.
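The week-over-week rule in that example reduces to a percentage-change check. This minimal sketch assumes you already have total cost for the previous and current seven-day windows; the function name and its behavior when there is no prior spend are illustrative choices, not part of any particular product.

```python
# Threshold taken from the example policy: alert on a >20% week-over-week increase.
SPIKE_THRESHOLD = 0.20


def cost_spike(last_week: float, this_week: float,
               threshold: float = SPIKE_THRESHOLD) -> bool:
    """Return True if total cost rose by more than `threshold` versus the prior week."""
    if last_week <= 0:
        # No baseline to compare against; flag any new spend for review.
        return this_week > 0
    return (this_week - last_week) / last_week > threshold


print(cost_spike(1000.0, 1250.0))  # True: a 25% increase
print(cost_spike(1000.0, 1150.0))  # False: a 15% increase
```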
Cost Optimization Policies
While financial management policies are critical for keeping pace with budgets and trends, on their own they don’t help you optimize and reduce costs. In AWS, one of the most effective ways to reduce costs is to purchase Reserved Instances (RIs), so many organizations focus their cost optimization policies on simplifying and automating the purchase and modification of RIs.
An example of a cost optimization policy you should be using is “If an instance is averaging more than 450 On-Demand hours each month for 3 months, send email alert, potential RI purchase.” This will help you identify potential Reserved Instance purchases, which can save up to 75% compared to On-Demand costs.
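Here is a sketch of that RI rule, assuming you have per-instance On-Demand hours for recent months (for example, summarized from your billing data). It reads “averaging more than 450 hours each month for 3 months” as the mean of the last three months exceeding 450; requiring every individual month to clear 450 would be an equally valid, stricter reading. For context, 450 hours is roughly 62% of the ~730 hours in an average month. The instance IDs are hypothetical.

```python
from statistics import mean

RI_HOURS_THRESHOLD = 450  # average On-Demand hours per month, from the example policy
LOOKBACK_MONTHS = 3


def ri_candidates(monthly_hours: dict[str, list[float]]) -> list[str]:
    """Return instances whose mean monthly On-Demand hours over the lookback exceed the threshold."""
    candidates = []
    for instance_id, hours in monthly_hours.items():
        recent = hours[-LOOKBACK_MONTHS:]  # most recent months, oldest first
        if len(recent) < LOOKBACK_MONTHS:
            continue  # not enough history to judge steady usage
        if mean(recent) > RI_HOURS_THRESHOLD:
            candidates.append(instance_id)
    return candidates


usage = {
    "i-0aaa": [720, 700, 690],  # nearly always on
    "i-0bbb": [300, 500, 400],  # averages 400 hours
    "i-0ccc": [720, 720],       # only two months of history
}
print(ri_candidates(usage))  # ['i-0aaa']
```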
Operational Governance Policies
Automating basic operational tasks is one of the best ways to free up your IT staff’s time and let them focus on more strategic and innovative initiatives. This could include automatically detecting and eliminating zombie infrastructure, flagging older instance types, or scheduling environments to turn off and on again.
An example of an operational governance policy you should be using is “Stop development EC2 instances at 7pm on Friday, start them again at 6am on Monday.” The majority of your infrastructure does not need to be running 24/7; the most cost-efficient environments dynamically stop and start instances on a set schedule.
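The weekend schedule in that policy can be modeled as a weekly window measured in minutes since Monday 00:00. The sketch below only decides whether a development instance should be running at a given moment; it assumes times are already in the team’s local time zone, and the step that actually stops or starts instances (for example through the EC2 StopInstances and StartInstances APIs) is left out. Function and constant names are hypothetical.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60
# datetime.weekday(): Monday == 0 ... Sunday == 6.
START_AT = 0 * MINUTES_PER_DAY + 6 * 60   # Monday 06:00
STOP_AT = 4 * MINUTES_PER_DAY + 19 * 60   # Friday 19:00


def minute_of_week(now: datetime) -> int:
    """Minutes elapsed since Monday 00:00 for the given (local) time."""
    return now.weekday() * MINUTES_PER_DAY + now.hour * 60 + now.minute


def dev_should_be_running(now: datetime) -> bool:
    """True between Monday 06:00 and Friday 19:00; False over the weekend window."""
    return START_AT <= minute_of_week(now) < STOP_AT


print(dev_should_be_running(datetime(2016, 11, 4, 18, 59)))  # Friday 18:59 -> True
print(dev_should_be_running(datetime(2016, 11, 5, 12, 0)))   # Saturday noon -> False
print(dev_should_be_running(datetime(2016, 11, 7, 6, 0)))    # Monday 06:00 -> True
```

A scheduler that runs every few minutes can compare this desired state with each instance’s current state and act only on the difference, which keeps the job safe to rerun.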
Performance Management Policies
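Performance management policies can help you identify over- and underutilized infrastructure in your environment. In AWS, some of this information can be gathered from CloudWatch, and it’s important to consider CPU, memory, disk, and network in/out utilization. Keep in mind that memory utilization is not among the default EC2 metrics; it has to be published to CloudWatch as a custom metric, typically by an agent running on the instance.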
An example of a performance management policy you should be using is “If average CPU usage < 20% AND memory usage < 35% AND disk throughput < 35% for over two weeks, then send email notification - potential downsize.” This will find underutilized instances that you can downsize for cost savings.
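Here is a minimal sketch of that downsize rule. It assumes you have already collected one daily average per metric, expressed as a percentage (memory typically comes from a custom metric rather than the default EC2 metrics, and disk throughput is taken as a percentage of what the volume can sustain), and it treats “for over two weeks” as at least 15 daily samples averaged together. The thresholds mirror the example policy; the metric keys and function name are hypothetical.

```python
from statistics import mean

# Thresholds (percent) from the example policy.
THRESHOLDS = {"cpu": 20.0, "memory": 35.0, "disk": 35.0}
MIN_DAYS = 15  # "over two weeks" of daily samples


def is_downsize_candidate(daily: list[dict[str, float]]) -> bool:
    """True if every metric's average across the window is below its threshold."""
    if len(daily) < MIN_DAYS:
        return False  # not enough history yet
    return all(
        mean(day[metric] for day in daily) < limit
        for metric, limit in THRESHOLDS.items()
    )


quiet = [{"cpu": 8.0, "memory": 30.0, "disk": 5.0}] * 15
busy_memory = [{"cpu": 8.0, "memory": 60.0, "disk": 5.0}] * 15
print(is_downsize_candidate(quiet))        # True
print(is_downsize_candidate(busy_memory))  # False
print(is_downsize_candidate(quiet[:14]))   # False: only 14 days
```

Requiring all three metrics to be low, rather than CPU alone, avoids flagging memory-bound workloads that merely look idle on CPU.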
Asset & Configuration Management Policies
In the cloud, where virtually any user with a credit card can provision infrastructure in a few clicks, keeping tabs on assets and configurations is a nightmare. To bring asset and configuration management back under control, advanced IT shops realized they needed to manage their environments by exception: set up rules that define non-approved configurations and assets, then monitor closely for them.
An example of an asset and configuration management policy you should be using is “If any asset is missing the tag ‘Environment’, then send email notification.” Because tagging is the central tactic in many asset management strategies, it’s critical that assets are always properly and consistently tagged.
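A sketch of the missing-tag check follows. AWS APIs generally return tags as a list of Key/Value pairs, and the resource records here assume that shape plus a hypothetical "Id" field; in practice you would build them from your inventory or describe calls. Because tag keys are case-sensitive, “environment” does not satisfy a rule for “Environment”, which is exactly the kind of inconsistency this policy should surface. Treating an empty value as missing is an assumption you may want to adjust.

```python
REQUIRED_TAG = "Environment"


def missing_required_tag(resources: list[dict], tag_key: str = REQUIRED_TAG) -> list[str]:
    """Return IDs of resources that lack a non-empty value for `tag_key` (case-sensitive)."""
    flagged = []
    for res in resources:
        present = {
            tag["Key"]
            for tag in res.get("Tags", [])
            if tag.get("Value", "").strip()  # treat blank values as missing
        }
        if tag_key not in present:
            flagged.append(res["Id"])
    return flagged


inventory = [
    {"Id": "i-0aaa", "Tags": [{"Key": "Environment", "Value": "production"}]},
    {"Id": "vol-0bbb", "Tags": [{"Key": "environment", "Value": "dev"}]},  # wrong case
    {"Id": "i-0ccc"},                                                       # no tags at all
    {"Id": "i-0ddd", "Tags": [{"Key": "Environment", "Value": ""}]},       # blank value
]
print(missing_required_tag(inventory))  # ['vol-0bbb', 'i-0ccc', 'i-0ddd']
```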
Security and Incident Management Policies
In a rapidly evolving cloud environment, it is important to keep up with changes that might impact your security posture. The best way to do this is with automated security policies, which can monitor for issues and flag them before they become catastrophic. There are many different types of security policies to set across access control, network security, application security, data security, log management, and resiliency.
An example of a security and incident management policy you should be using is “If any EC2 Security Group has an ingress rule of 0.0.0.0/0, then send email notification, modify rules.” This ensures that Security Groups don’t have overly permissive rules that could result in a network security breach.
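The open-ingress check can be sketched against the structure returned by boto3’s EC2 describe_security_groups call, where each group carries IpPermissions entries with IpRanges (CidrIp) and Ipv6Ranges (CidrIpv6). This sketch takes that list as plain data so it runs without AWS credentials, and it also flags ::/0, the IPv6 equivalent of 0.0.0.0/0, which goes slightly beyond the example policy. The group IDs are hypothetical, and the “modify rules” step (for example with the EC2 client’s revoke_security_group_ingress) is left out.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # "anywhere" for IPv4 and IPv6


def open_ingress_rules(security_groups: list[dict]) -> list[tuple]:
    """Return (GroupId, protocol, from_port, to_port, cidr) for every world-open ingress rule."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append((sg["GroupId"], perm.get("IpProtocol"),
                                     perm.get("FromPort"), perm.get("ToPort"), cidr))
    return findings


groups = [
    {"GroupId": "sg-0aaa", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []}]},
    {"GroupId": "sg-0bbb", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]},
]
print(open_ingress_rules(groups))  # [('sg-0aaa', 'tcp', 22, 22, '0.0.0.0/0')]
```

With boto3 installed and credentials configured, boto3.client("ec2").describe_security_groups()["SecurityGroups"] would supply real input in the same shape.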
These are just a few examples of the best-practice policies you should consider to improve operational efficiency in your environment. Learn about these policies, and more, in our new eBook: “Six Essential Types of Policies For Governing your AWS Environment.”
If you’ll be at re:Invent 2016, stop by booth #1218 to learn more about how CloudHealth Technologies can help you automate governance policies across multiple cloud environments.