All popular cloud providers offer resources on a “pay per use” model. It’s great to pay only for what you really need. Startups and other companies benefit from having no upfront costs to set up whatever they need for their applications. Every month you get a bill for the resources you have created. The problem: you also get billed for resources you forgot about and don’t actually use. The message of this article is clear: don’t waste your money on unused cloud resources. But how do you know which resources are unused? We’ll dive into a couple of the biggest ‘hidden’ costs of cloud computing so you can find them within your own organization!
Deleting unused resources is a matter of finding them…
Load balancers act as an “entry point” for an application hosted in the cloud. Cloud providers have different types of load balancers, each with a different purpose and a different price. Roughly speaking, the costs are based on the type of load balancer and the amount of data it processes.
Load balancers can, for example, be configured to load balance Virtual Machines. But if the VMs attached to this load balancer are deleted, the load balancer will still be there. No one notifies you that you have a “dangling” load balancer, which still costs money. You need to manually delete it.
And this is a recurring theme: creating resources is easy. Knowing which ones to delete is much harder (by design). This requires keeping track of dependencies between different resources, as well as tracking ‘state’ (like ‘in production’, ‘decommissioned’, etc.).
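Tracking such a dependency can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a provider API: the inventory format (`name`, `state`, `targets`), the sample data and the function name `find_dangling_load_balancers` are all made up for this example. In practice the inventory would come from your cloud provider's API or an export.

```python
# Hypothetical inventory: each load balancer lists the VM ids it targets.
LOAD_BALANCERS = [
    {"name": "lb-shop", "state": "in production", "targets": ["vm-1", "vm-2"]},
    {"name": "lb-old-api", "state": "decommissioned", "targets": ["vm-9"]},
    {"name": "lb-test", "state": "in production", "targets": []},
]

# Hypothetical set of VM ids that still exist.
EXISTING_VMS = {"vm-1", "vm-2"}


def find_dangling_load_balancers(load_balancers, existing_vms):
    """Return the names of load balancers without any target that still exists."""
    dangling = []
    for lb in load_balancers:
        live_targets = [t for t in lb["targets"] if t in existing_vms]
        if not live_targets:
            dangling.append(lb["name"])
    return dangling


if __name__ == "__main__":
    print(find_dangling_load_balancers(LOAD_BALANCERS, EXISTING_VMS))
```

Note that the tracked ‘state’ alone is not enough here: `lb-test` is still marked as ‘in production’ but has no targets at all, so it is reported as well.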
Infra-as-code is the solution?
What about Infrastructure as Code? Can this way of working prevent unused cloud resources? As so often, the answer is “it depends”. With popular tools like Terraform, the recorded state and the actual resources can become “out of sync”, which makes it harder to control the resources you created earlier.
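Being “out of sync” boils down to a difference between two sets: the resources your IaC tool believes it manages and the resources that actually exist. The sketch below assumes you can export both as lists of resource ids. The ids and function names are hypothetical, and this is not how any specific tool detects drift.

```python
def find_unmanaged(state_ids, cloud_ids):
    """Resources that exist in the cloud but are unknown to the IaC state."""
    return sorted(set(cloud_ids) - set(state_ids))


def find_missing(state_ids, cloud_ids):
    """Resources the IaC state still tracks but that no longer exist."""
    return sorted(set(state_ids) - set(cloud_ids))


if __name__ == "__main__":
    # Hypothetical exports of the IaC state and of the real cloud inventory.
    iac_state = ["vm-1", "vm-2", "lb-1"]
    in_cloud = ["vm-1", "lb-1", "lb-2", "eip-7"]

    print("unmanaged:", find_unmanaged(iac_state, in_cloud))
    print("missing:", find_missing(iac_state, in_cloud))
```

The “unmanaged” list is the interesting one for cost control: nothing in your code base will ever delete those resources.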
CloudFormation templates (the IaC solution from Amazon) cannot always delete all resources. Two examples: when deleting a Kubernetes cluster, the worker nodes (Virtual Machines) have to be deleted first, and some security groups depend on others, which makes it harder to delete both of them.
Kubernetes gives developers the possibility to create load balancers “on the fly” when they deploy an application. However, when the same application is deleted, the load balancer is not always deleted automatically. Developers need to track down the “orphaned” load balancers outside the scope of Kubernetes.
Kubernetes creates load balancers with a “random name” so the names are unique. This makes it harder to detect a load balancer which is no longer attached to an actual application.
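Since the random names carry no meaning, one way to find orphans is to rely on tags instead. The sketch below assumes that every load balancer created by Kubernetes carries a tag naming the Service it belongs to. The tag key `kubernetes.io/service-name`, the inventory format and the sample data are assumptions for illustration, so verify which tags your cluster's cloud integration actually sets.

```python
# Assumed tag key; check what your cluster's cloud integration really uses.
SERVICE_TAG = "kubernetes.io/service-name"


def find_orphaned_k8s_load_balancers(load_balancers, live_services):
    """Return load balancers created for a Service that no longer exists."""
    orphans = []
    for lb in load_balancers:
        service = lb.get("tags", {}).get(SERVICE_TAG)
        # Skip load balancers without the tag: Kubernetes did not create them.
        if service is not None and service not in live_services:
            orphans.append(lb["name"])
    return orphans


if __name__ == "__main__":
    # Hypothetical inventory with "random" names, as described above.
    load_balancers = [
        {"name": "a1f3c9", "tags": {SERVICE_TAG: "shop/frontend"}},
        {"name": "b77e02", "tags": {SERVICE_TAG: "shop/old-api"}},
        {"name": "manual-lb", "tags": {}},
    ]
    # Services that currently exist in the cluster, as "namespace/name".
    live_services = {"shop/frontend"}

    print(find_orphaned_k8s_load_balancers(load_balancers, live_services))
```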
Elastic IP addresses
Many users like to assign elastic IP addresses to their Virtual Machines. Those IP addresses are “reserved” for you, so you can use, release and reuse them again and again. A pitfall here is forgetting to release the IP address when you don’t need it anymore; merely un-associating it from a Virtual Machine is not enough, because the address stays allocated to your account. Any IP address allocated to your account costs money. These resources are forgotten quite easily, since you don’t get a warning when deleting a Virtual Machine that has an elastic IP address attached to it.
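Finding such forgotten addresses is a simple filter. The dictionary below mimics the shape of an EC2 DescribeAddresses response, in which an address that is in use carries an `AssociationId`. All sample values are invented and the function name `unassociated_addresses` is hypothetical. In practice the response would come from the AWS API rather than from a literal.

```python
# Shaped like an EC2 DescribeAddresses response; all values are invented.
SAMPLE_RESPONSE = {
    "Addresses": [
        {
            "PublicIp": "203.0.113.10",
            "AllocationId": "eipalloc-aaa",
            "AssociationId": "eipassoc-111",
            "InstanceId": "i-0123",
        },
        {"PublicIp": "203.0.113.11", "AllocationId": "eipalloc-bbb"},
    ]
}


def unassociated_addresses(response):
    """Return the public IPs that are allocated but not associated."""
    return [
        addr["PublicIp"]
        for addr in response["Addresses"]
        if not addr.get("AssociationId")
    ]


if __name__ == "__main__":
    print(unassociated_addresses(SAMPLE_RESPONSE))
```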
Every Virtual Machine in the cloud costs money. Among all cloud resources, these tend to be the most expensive. Virtual Machines come in different flavors, each optimized for a specific use case. Choosing the right instance type is not easy, and choosing the wrong one can have a significant impact on cost. Luckily, there are tools available (like CloudHealth) to help with choosing the right type, as well as with optimizing the fleet of existing VMs.
It’s also possible to create custom machine images. Those machine images act like a “template” to launch the actual Virtual Machine. As soon as the machine image is created, it also creates a “snapshot” to be used when you launch the Virtual Machine. Once a Virtual Machine is deleted, the snapshot is still there, waiting for another Virtual Machine to be created from it. If you don’t need it anymore, it’s best to delete it along with your Virtual Machine. Unused snapshots are not cleaned up for you automatically.
All cloud providers offer storage solutions, and these also come in different flavors. Think of Google Cloud Storage, Azure disks or AWS S3 (different types) and EBS storage volumes.
It’s easy to forget data storage resources since they are used in multiple places (Virtual Machines, virtual disks, backups, etc.). Some examples of (orphaned) use cases which make your cloud bill go up:
- Using the wrong storage type in S3. Files which can be (re)created very easily do not need highly redundant and reliable storage types; a cheaper option could be good enough. If you use a relatively expensive storage type, make sure you regularly clean up unneeded files and compress files if possible.
- When creating snapshots of Virtual Machines, make sure you use incremental backups if possible to avoid creating (almost duplicate) snapshots of the same Virtual Machine. Snapshots in AWS and Google Cloud are incremental by default; for Azure managed disks, incremental snapshots have to be selected explicitly. Also make sure to delete any snapshots you don’t need anymore.
- In case you have storage volumes (EBS in AWS, managed disks in Azure, persistent disks in Google Cloud) that are no longer in use, be sure to create a snapshot and then delete the original volume/disk it came from.
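Deleting snapshots you don’t need anymore requires a rule for what “needed” means. The sketch below assumes a simple retention policy that keeps only the newest snapshots per Virtual Machine. The policy, the value of `KEEP_PER_VM`, the record format and the sample data are all assumptions for illustration, so pick a rule that matches your own backup requirements.

```python
from collections import defaultdict
from datetime import date

KEEP_PER_VM = 2  # assumed retention policy: keep the 2 newest snapshots per VM


def snapshots_to_delete(snapshots, keep=KEEP_PER_VM):
    """Return ids of all snapshots except the `keep` newest ones per VM."""
    by_vm = defaultdict(list)
    for snap in snapshots:
        by_vm[snap["vm"]].append(snap)

    doomed = []
    for vm_snaps in by_vm.values():
        newest_first = sorted(vm_snaps, key=lambda s: s["created"], reverse=True)
        doomed.extend(s["id"] for s in newest_first[keep:])
    return sorted(doomed)


# Hypothetical snapshot inventory.
SNAPSHOTS = [
    {"id": "snap-1", "vm": "vm-a", "created": date(2024, 1, 1)},
    {"id": "snap-2", "vm": "vm-a", "created": date(2024, 2, 1)},
    {"id": "snap-3", "vm": "vm-a", "created": date(2024, 3, 1)},
    {"id": "snap-4", "vm": "vm-b", "created": date(2024, 3, 5)},
]

if __name__ == "__main__":
    print(snapshots_to_delete(SNAPSHOTS))
```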
The limitation of dashboards
Many cloud providers provide a nice billing dashboard. The dashboard shows useful information about your current spending. Some also show predicted future costs based upon the actual resources you have created and their predicted usage (e.g. data transfer, expected storage needs). Users of the dashboard can set an alarm for when costs exceed a certain threshold. For example: if the monthly cost limit is set to 1000 euros, the dashboard can send a warning when the current costs have already reached 800 euros.
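The alarm logic itself is trivial, which is part of the point: it only looks at the total amount. A minimal sketch, using the 1000 euro limit and the 800 euro warning level from the example above (the function name `budget_status` and the 80 percent default are assumptions, not any provider's API):

```python
def budget_status(current_cost, monthly_limit, warn_percent=80):
    """Classify spending as 'ok', 'warning' or 'over budget'.

    Amounts are whole euros; integer math avoids rounding surprises.
    """
    if current_cost >= monthly_limit:
        return "over budget"
    if current_cost * 100 >= warn_percent * monthly_limit:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for cost in (500, 800, 1000):
        print(cost, budget_status(cost, monthly_limit=1000))
```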
This feature is nice, but when it comes to preventing unused resources, it’s too late. It would be much more beneficial to receive a warning when resources have not been used for a certain period or are not used at all. For example, an elastic IP address which is not associated with any Virtual Machine should be flagged and deleted after a number of days (all to be defined by the cloud users). Or a load balancer which has not received any traffic in the last 14 days.
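Such a warning could be sketched as follows. The example assumes you can obtain a “last used” date per resource, for instance from traffic metrics. The record format, the resource names and the function name `flag_idle` are hypothetical; the 14-day period is taken from the example above.

```python
from datetime import date, timedelta


def flag_idle(resources, today, max_idle_days=14):
    """Return names of resources never used or idle for max_idle_days or more."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        res["name"]
        for res in resources
        if res["last_used"] is None or res["last_used"] <= cutoff
    ]


if __name__ == "__main__":
    # Hypothetical inventory; last_used=None means "never used".
    resources = [
        {"name": "lb-active", "last_used": date(2024, 6, 29)},
        {"name": "lb-quiet", "last_used": date(2024, 6, 1)},
        {"name": "eip-unattached", "last_used": None},
    ]
    print(flag_idle(resources, today=date(2024, 6, 30)))
```

Flagging and deleting are deliberately kept separate here: the list is meant as input for a human decision or a later clean-up step.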
As of now, the cloud user has full responsibility for all of this. You need to write scripts yourself to check for those scenarios. Luckily, a regular clean-up process can help to keep your costs under control. And again, this is where 3rd party software, like the aforementioned CloudHealth, can help. Unlike the cloud vendor’s billing portal, independent software like CloudHealth has the sole goal of helping you spend less on cloud resources.
Do some calculations
As indicated in the previous sections, there is a lot of money to save. Business managers want to know how much, so they can build their business case.
Check out the following calculation based upon a fictitious scenario:
- A Scrum team of 5 persons.
- The team needs 4 shared environments for their applications: development, test, acceptance, production.
- The cloud provider of choice is AWS.
- Region is EU-WEST-1, prices in euros, ex. VAT
On average, every month the following unused resources are forgotten by this team:
- Five Virtual Machines (EC2 instances) of medium size. Every EC2 instance has an EBS volume which is capable of high speed data processing (high IOPS). Also, 5 snapshots are created every month.
- The total number of hours of elastic IPs which are not attached to an EC2 instance is: 24 (hours a day) * 30 days a month * 5 instances: 3600 hours.
- A total of 250 GB of S3 storage is not utilized.
- Five load balancers which are not attached to any Virtual Machine and thus unused.
- 100 GB of “standard storage” and also 100 GB of “Infrequent Access Storage” is not used. This storage has a “provisioned throughput” (fast processing of storage).
Given this (rough) scenario, this would cost around € 515 a month for a single team. Every year this is € 6180. Imagine your organization has 20 teams. All unused resources would cost you more than € 123.000 a year. That is a lot of money! There is probably more, since this article only covered a small number of use cases.
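The arithmetic behind these figures is easy to reproduce and to adapt to your own numbers. The sketch below only uses values stated in the scenario (the € 515 monthly estimate is taken as given, not recomputed from individual prices); the function names are made up for this example. For 20 teams it yields € 123.600 a year before rounding.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12


def idle_ip_hours(unattached_ips):
    """Billable hours per month for elastic IPs that are not attached."""
    return HOURS_PER_DAY * DAYS_PER_MONTH * unattached_ips


def yearly_waste(monthly_waste_per_team, teams=1):
    """Yearly cost of unused resources, in euros."""
    return monthly_waste_per_team * MONTHS_PER_YEAR * teams


if __name__ == "__main__":
    print("idle elastic IP hours per month:", idle_ip_hours(5))
    print("per team, per year (EUR):", yearly_waste(515))
    print("20 teams, per year (EUR):", yearly_waste(515, teams=20))
```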
Depending on the salaries of the developers this is about 1 – 2 FTE a year. So what would be your choice: save this money and hire at least 1 extra developer or happily pay the bill to Amazon?
Conclusion & takeaways
Cloud providers offer great services to help your organization make use of the cloud in a very efficient way. But it’s also easy to forget the resources that have been created, even when using Infrastructure as Code. This article showed how fast your monthly bill can go up. The conclusion is clear: don’t waste your money on unused cloud resources.
Avoid unneeded costs by following these tips:
- Train your developer teams to regularly clean up your unused resources. It’s a mindset that requires some changes in the way teams work to be fully embedded into the workflow.
- When working with IaC, make sure to keep the IaC scripts and the actual resources in the cloud “in sync” so there will be no “dangling” resources left.
- Create a custom dashboard which detects and shows all unused resources.
- Create a feedback loop between the team dealing with the (detailed) invoice and the teams that create and use the resources to share context and information.
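A custom overview of unused resources does not have to start as a full dashboard. The sketch below groups already detected unused resources per owning team, which also gives the people handling the invoice a name to talk to. It assumes every resource carries a `team` tag. The tag, the record format and the sample data are hypothetical.

```python
from collections import defaultdict


def unused_by_team(unused_resources):
    """Group the names of unused resources by owning team."""
    report = defaultdict(list)
    for res in unused_resources:
        # Resources without a team tag end up under "unknown".
        report[res.get("team", "unknown")].append(res["name"])
    return dict(report)


if __name__ == "__main__":
    # Hypothetical result of an earlier detection step.
    unused = [
        {"name": "lb-old-api", "team": "checkout"},
        {"name": "eip-7", "team": "checkout"},
        {"name": "snap-1", "team": "search"},
        {"name": "vol-42"},
    ]
    for team, names in sorted(unused_by_team(unused).items()):
        print(f"{team}: {len(names)} unused ({', '.join(names)})")
```

Resources that land under “unknown” are a finding in themselves: nobody can be asked whether they are still needed.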
The tips presented here are just a starting point. In reality, these cost reduction processes for unused, underutilized or unoptimized cloud resources require specialized software to solve the problems adequately.