Having explained what cloud governance is, the difference between governing an on-premises environment and a cloud environment, and how organizations can take back control of a decentralized cloud environment, we provide sample cloud governance policies and discuss how best to enforce them.
In part 1 of our guide to cloud governance, we explained why cloud governance is important and how organizations experiencing cost, performance, and/or security issues can take steps to address the issues by creating a Cloud Center of Excellence with the task of establishing guardrails. We also explained the process wasn’t easy because there is no “one-size-fits-all” template for cloud governance.
In this article, we provide sample governance policies that can help maintain control of cost, performance, and security, and we explain how policies can be enforced seamlessly by taking advantage of policy-driven automation.
Example cloud cost governance policies
Many organizations look at cost governance policies as a means of controlling cost, but that is not their only purpose. Although policies can limit how much is spent by a department, team, project, etc. per month—after which they can be locked out of their accounts—a more appropriate use of cost governance policies is to identify cost drivers and then address the cause of the issue.
Cost governance policies can also be applied to reduce costs by identifying unused or underused resources, and opportunities to take advantage of committed use discounts (e.g. AWS Reserved Instances and Savings Plans). When a cloud tagging strategy is applied, it is also possible to use governance policies for showback or chargeback in order to increase transparency and accountability around costs.
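As a rough illustration of tag-based showback, the sketch below sums spend per tag value. The line-item layout (`resource`, `cost`, `tags`) and the `team` tag key are assumptions made for the example and do not reflect any particular provider's billing export.

```python
from collections import defaultdict


def showback_by_tag(line_items, tag_key="team"):
    """Sum cost per value of tag_key; untagged spend is grouped under 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing line items, for illustration only.
line_items = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"team": "data"}},
    {"resource": "vm-2", "cost": 80.0, "tags": {"team": "web"}},
    {"resource": "vol-9", "cost": 15.5, "tags": {}},
    {"resource": "vm-3", "cost": 40.0, "tags": {"team": "data"}},
]

report = showback_by_tag(line_items)
for owner, cost in sorted(report.items()):
    print(f"{owner}: {cost:.2f}")
```

Grouping untagged spend under its own heading keeps it visible in the report, which can help highlight gaps in the tagging strategy.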
Cloud governance solutions can typically be configured to send notifications to budget owners and/or system administrators when events occur that trigger a policy alert. Beyond a spending limit being reached, events that might trigger a policy alert include:
- When costs per [tag/user/department/team/etc.] increase by more than a predefined percentage over a [day/week/month/etc.]. This type of cost trend policy can be particularly useful for identifying misconfigurations or malicious insiders.
- When the CPU usage, memory usage, and disk throughput of a Virtual Machine fall below 50% of the Virtual Machine’s capacity over a user-defined period of time. This indicates the Virtual Machine is over-provisioned for the current level of demand.
- When a block storage volume, IP address, or other resource has been unattached for a user-defined period of time. Until these resources are terminated, the organization is still paying for them, so this type of policy can save a considerable amount in costs.
- When a Virtual Machine is running on Pay-as-You-Go pricing for more than 500 hours in a month or averaging more than 400 hours per month over a three-month period. This type of policy identifies possibilities to take advantage of committed use discounts.
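The cost policies above could be expressed as simple checks along the following lines. This is a minimal sketch: the `VmUsage` fields and function names are hypothetical, the usage figures are assumed to be averages over the user-defined period, and the "500 hours in a month" rule is assumed to apply to the most recent month.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class VmUsage:
    name: str
    cpu_pct: float      # average CPU usage as % of capacity
    memory_pct: float   # average memory usage as % of capacity
    disk_pct: float     # average disk throughput as % of capacity
    on_demand_hours: list = field(default_factory=list)  # per month, newest last


def cost_trend_alert(previous_cost, current_cost, max_increase_pct):
    """True when cost grew by more than max_increase_pct between two periods."""
    if previous_cost <= 0:
        return current_cost > 0  # assumption: any new spend counts as a trend alert
    return (current_cost - previous_cost) / previous_cost * 100 > max_increase_pct


def is_over_provisioned(vm, threshold_pct=50.0):
    """True when CPU, memory, and disk throughput are all below the threshold."""
    return max(vm.cpu_pct, vm.memory_pct, vm.disk_pct) < threshold_pct


def is_commitment_candidate(vm, monthly_limit=500, average_limit=400):
    """True when Pay-as-You-Go hours suggest a committed use discount."""
    recent = vm.on_demand_hours[-3:]
    if not recent:
        return False
    if recent[-1] > monthly_limit:
        return True
    return len(recent) == 3 and mean(recent) > average_limit


vm = VmUsage("vm-1", cpu_pct=22, memory_pct=35, disk_pct=10,
             on_demand_hours=[410, 395, 420])
print(is_over_provisioned(vm), is_commitment_candidate(vm))
```

The thresholds are parameters rather than constants because, as noted earlier, there is no one-size-fits-all template; each organization would set its own values.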
Example cloud performance governance policies
It is important to remember when developing performance governance policies that the role of the Cloud Center of Excellence is to govern rather than control. Therefore, the team tasked with developing the cloud governance framework should clearly define which operations are performed in the cloud and which on-premises, and aim to ensure resource consistency between the two environments.
One of the benefits of creating a collaborative Cloud Center of Excellence from multiple departments is that it eliminates duplication and incompatibilities, for example two departments using different resources to do the same job. However, for this benefit to realize its true potential, the resource put in place to serve both departments should be available and running smoothly at all times.
Therefore, in the same way that resources need to be monitored for over-provisioning to save costs, they also need to be monitored for under-provisioning to avoid over-utilization. When over-utilization occurs and services are unavailable, there is a stronger likelihood of teams deploying unsanctioned resources—so there need to be guardrails in place to prevent this happening without authorization.
- System administrators should set up cloud monitoring tools to alert them when (for example) CPU utilization, memory utilization, disk throughput, or network throughput exceeds 80% for more than a week, so that the resource can be upgraded for more capacity.
- System administrators should also set up monitoring tools to identify the deployment of any resource not within the organization's performance governance policies. This could be an indication of Line of Business IT or something more sinister.
- With regard to sinister events, system administrators should also create policies for allowable development configurations. These policies should be applied to the staging and testing phase of new deployments in order to prevent exploitable vulnerabilities.
- It is also a good idea to create policies for ensuring the continued compliance of resources once deployed. Tools such as Amazon Inspector can be used to help detect “configuration drift” caused by changes being made to a resource after deployment.
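The sustained over-utilization alert described in the first policy above might be sketched as follows. The input is assumed to be one utilization reading per day per metric (newest last), and "more than a week" is read as more than seven consecutive days; both are assumptions for illustration, as are the function and metric names.

```python
def sustained_over_utilization(daily_values, threshold_pct=80.0, min_days=7):
    """True if the most recent unbroken run above the threshold exceeds min_days."""
    streak = 0
    for value in reversed(daily_values):
        if value <= threshold_pct:
            break
        streak += 1
    return streak > min_days


def metrics_needing_upgrade(metrics, threshold_pct=80.0, min_days=7):
    """Return the names of metrics that have been over the threshold too long."""
    return sorted(
        name
        for name, series in metrics.items()
        if sustained_over_utilization(series, threshold_pct, min_days)
    )


# Hypothetical daily readings (% of capacity), newest last.
metrics = {
    "cpu": [70, 85, 88, 91, 86, 90, 84, 87, 93],
    "memory": [60, 62, 85, 86, 61, 63, 64, 60, 59],
    "network": [82, 83, 85, 84, 86, 81, 90],
}

flagged = metrics_needing_upgrade(metrics)
print(flagged)
```

Counting only the most recent unbroken run means a resource that has already dropped back below the threshold is not flagged, which avoids alerts for spikes that have resolved themselves.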
Example cloud security governance policies
Our example cost governance policies and performance governance policies have already hinted that cost, performance, and security are intertwined; and it is certainly the case that cloud security should not be siloed from cloud governance planning as it often is on-premises. Indeed, some security governance policies can contribute to saving costs in the cloud and enhancing performance.
However, whereas the cost and performance governance policies are primarily concerned with what resources can be used and how they can be used, security governance policies are more focused on who can use resources and access the data they store. Typically, security governance policies address access control, network security, application security, data security, log management, and resiliency.
Although this may seem like many balls to juggle simultaneously, there are several shortcuts that can be taken to reduce the management overhead. For example, organizations can apply access control policies to resource groups in order to standardize access to resources and simplify the monitoring process. Other example security governance policies include:
- Enable multi-factor authentication on as many accounts as is practically possible, with a strong password policy applied to all other accounts. Passwords should be rotated or changed frequently to avoid unauthorized access via compromised credentials.
- Specify a range of IP addresses from which users can log into accounts and block attempts to log in from other locations. It may be necessary to whitelist specific IP addresses for personnel working remotely, but this can be done according to specific circumstances (for example, temporarily).
- Encryption is generally recommended for data at rest and in transit, but it can negatively affect performance. If the organization has performance concerns, policies should be developed that determine which data should be encrypted.
- Security governance policies should also be developed to support business continuity in the event of an outage. These will depend on the nature of an organization’s recovery time objectives, recovery point objectives, and availability SLAs.
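The IP range policy above, including temporary exceptions for remote personnel, could be checked with Python's standard `ipaddress` module. The network range, exception address, and expiry dates below are placeholder values from documentation address ranges, and the function name is hypothetical.

```python
import ipaddress
from datetime import date


def is_login_allowed(source_ip, allowed_networks, temporary_exceptions, today):
    """Allow logins from approved networks or from unexpired temporary exceptions."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in network for network in allowed_networks):
        return True
    # temporary_exceptions maps an IP address string to its last valid day.
    expiry = temporary_exceptions.get(str(addr))
    return expiry is not None and today <= expiry


# Placeholder values for illustration only.
allowed_networks = [ipaddress.ip_network("203.0.113.0/24")]
temporary_exceptions = {"198.51.100.7": date(2024, 6, 30)}

print(is_login_allowed("203.0.113.25", allowed_networks,
                       temporary_exceptions, date(2024, 6, 1)))
print(is_login_allowed("198.51.100.7", allowed_networks,
                       temporary_exceptions, date(2024, 7, 1)))
```

Storing an expiry date with each exception means that temporary access lapses on its own rather than relying on someone remembering to remove it.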
Enforcing cloud governance policies with automation
Because of the speed at which resources can be deployed in the cloud, it is impractical to manually monitor compliance with cloud governance policies. Multiple cloud-native tools exist to automate the monitoring process, but these are not necessarily ideal in every circumstance. For example, it is not possible to customize cloud governance policies with Amazon Inspector.
It is also the case that, if a policy is violated, a notification may arrive too late to be of much help. For example, if multi-factor authentication has been disabled on a root account and the log-in credentials for that account are compromised, a malicious actor may already have access to the network by the time the notification is read. Consequently, it is better to enforce cloud governance policies with a solution that can prevent policy violations.
Cloud management platforms with policy-driven automation capabilities are ideal in this respect. These platforms can enforce cloud governance policies in whichever way the organization believes is most appropriate for the event. Using some of the above cost, performance, and security governance policies as examples, system administrators have the option to:
- Automatically terminate unused block storage volumes, IP addresses, or other resources after a user-defined period of time.
- Initiate an approval workflow when an attempt is made to deploy a resource not sanctioned by a cloud governance policy.
- Automatically enable multi-factor authentication on high-privilege accounts, and encrypt publicly-accessible storage volumes.
Other examples might include automatically migrating infrequently-accessed data to lower cost storage tiers, preventing users from logging into accounts outside office hours, and blocking the deployment of sanctioned resources in unsanctioned regions (or initiating an approval workflow). Usually the Cloud Center of Excellence will determine which option is most appropriate for each type of policy violation.
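One way to picture policy-driven automation is as a lookup from violation type to enforcement action, with notification as the fallback. The sketch below is a simplified model: the violation type names, the `Action` values, and the rule table are all hypothetical and do not represent the interface of any specific cloud management platform.

```python
from enum import Enum


class Action(Enum):
    NOTIFY = "notify"
    TERMINATE = "terminate"
    REQUEST_APPROVAL = "request_approval"
    REMEDIATE = "remediate"


# Hypothetical rule table, as a Cloud Center of Excellence might define it.
ENFORCEMENT_RULES = {
    "unattached_resource": Action.TERMINATE,
    "unsanctioned_resource": Action.REQUEST_APPROVAL,
    "unsanctioned_region": Action.REQUEST_APPROVAL,
    "mfa_disabled": Action.REMEDIATE,
}


def choose_action(violation_type, rules=ENFORCEMENT_RULES):
    """Pick the configured action, falling back to a notification."""
    return rules.get(violation_type, Action.NOTIFY)


def plan_enforcement(violations, rules=ENFORCEMENT_RULES):
    """Return (resource, action) pairs without performing any changes."""
    return [(v["resource"], choose_action(v["type"], rules)) for v in violations]


violations = [
    {"resource": "vol-42", "type": "unattached_resource"},
    {"resource": "vm-7", "type": "unsanctioned_region"},
    {"resource": "root-account", "type": "mfa_disabled"},
    {"resource": "vm-9", "type": "cost_trend"},
]

plan = plan_enforcement(violations)
for resource, action in plan:
    print(resource, action.value)
```

Separating the decision (which action applies) from the execution keeps the rule table easy to review and change as the organization decides which option suits each type of policy violation.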
Integrating cloud governance with business objectives
The ultimate goal of the cloud governance operating model is to reach a point where there is unity between cloud strategy, business systems, and the organization’s business objectives. The integration of cloud governance with business objectives should help an organization achieve operational excellence, business competitiveness, cost/delivery effectiveness, and innovation enablement.
Building on the automated enforcement of cloud governance policies to reach this point will be discussed in part 3, along with the obstacles some organizations may encounter. As mentioned in part 1, taking back control of a decentralized cloud environment is not going to happen overnight, and it remains important that the Cloud Center of Excellence continues to benchmark performance to demonstrate improvements.