In a previous post, Raging with the Machine, I talked about the benefits organizations gain through automating processes: doing things better, doing more things and containing costs through continual optimization. In this post, I dig deeper into the ‘doing things better’ part, and specifically how IaaS automation can be designed to systematically uphold security measures.
IT EXPERTISE IS CRUCIAL
When a provider puts a service into operation, be it a car-rental facility, mobile-phone repair, or a digital streaming capability, the processes to build and provision the hardware to support the service, and the security measures needed to protect the service, are crucial elements of the overall solution. Building and maintaining the underlying architecture for consumer-facing services requires investment in terms of time, IT expertise and security knowledge. For many providers, Infrastructure-as-a-Service (IaaS) is an ideal hosting solution, as it relieves many architecture headaches, such as aging hardware, the cost of idle machinery, and the lack of storage and processing flexibility to cope with peaks in demand.
In my opinion, the challenge IaaS presents is not running a service, but developing it. By providing the means to spin up diverse types of machines, IaaS enables concepts and code to be tested in a way that isn’t economically feasible with on-premise solutions – which in turn enables the organization to develop more interesting, revenue-generating services.
For live services, reliability, availability, performance, security and user experience are crucial success factors. In contrast, during development, the need for infrastructure is temporary and success is measured in the ability to deliver and test an outcome rapidly.
To ensure that machines are correctly provisioned, and to protect against the cost and security risks of zombie VMs and misconfigured devices, most organizations entrust responsibility for IaaS to their IT department, or to a handful of expert individuals. Unfortunately, this approach tends to be costly and inflexible. It is a sure-fire way of promoting shadow IT and one of the root causes of the ever-widening Disruption Gap. Service desks are busy keeping operations running and often lack the bandwidth to deliver specialized infrastructure services.
What’s needed is a secure way to enable development teams to spin up and decommission VMs as and when they need them. But this is not as simple as it might seem. Creating a virtual machine, even via the portal of a top cloud service provider, is a complex, multi-step process that requires expertise and detailed information to complete correctly.
For example, the process to spin up a virtual machine using the official AWS guide to Launch a Windows Virtual Machine is described in five steps. Counting the sub-steps, there are 16 decision-making points in total – each one with the potential to expose your system if you lack access to the right information.
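To make the scale of the problem concrete, those decision points can be sketched as a parameter set that every launch request has to fill in correctly. This is an illustrative sketch, not the AWS API; the field names and validation rules are my own, but they show how a workflow can enforce answers that a manual process only hopes for:

```python
from dataclasses import dataclass, field

@dataclass
class LaunchRequest:
    """Illustrative subset of the decisions behind launching a VM."""
    image_id: str        # which machine image (OS, patch level)?
    instance_type: str   # which of the ~70 instance types?
    storage_type: str    # SSD, magnetic, or a shared file system?
    storage_size_gb: int # how much capacity?
    protocol: str        # which transport to open: "tcp" or "udp"?
    open_ports: list = field(default_factory=list)
    key_pair: str = ""   # who holds the private key?

    def validate(self) -> list:
        """Return the policy violations a workflow would reject."""
        errors = []
        if self.protocol not in ("tcp", "udp"):
            errors.append("unknown protocol")
        if 22 in self.open_ports or 3389 in self.open_ports:
            errors.append("SSH/RDP must not be open to the world")
        return errors
```

Every field is a place where an uninformed guess can expose the system; encoding the rules once means no individual requester has to get all 16 answers right by themselves.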
Figure 1 illustrates the process to create a VM. I’d like to stress that the complexity is in no way specific to Amazon Web Services (AWS), or any other cloud services provider: building an IT infrastructure requires expertise.
Figure 1: Walking through the steps to create a virtual machine
So, let’s take a closer look at some of those decisions. For me, one of the most annoying aspects of this type of decision-making is that there are no hard rules; answers are often empirical. Which communication protocol, for example, should you choose for your VM: TCP or UDP? Well, it depends. If you are testing a streaming service and you are not concerned with losing a bit of data here and there, but you are concerned with latency, you’ll probably want to go with UDP. However, if the purpose of the virtual machine is to gather user input, then the more data-reliable TCP protocol might be the best option. And if you are just testing a piece of software, it probably doesn’t matter which communication protocol you choose.
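That reasoning can be captured once in a workflow, so each requester doesn’t have to re-derive it. A minimal sketch, where the workload categories and the TCP default are my own illustrative choices, not a provider rule:

```python
def choose_protocol(workload: str) -> str:
    """Pick a transport protocol for common workload types.

    Streaming tolerates loss but not latency, so UDP; anything that
    must receive every byte intact gets TCP; for plain software
    testing it rarely matters, so default to the safer TCP.
    """
    if workload == "streaming":
        return "udp"
    if workload in ("user-input", "database", "file-transfer"):
        return "tcp"
    return "tcp"  # e.g. plain software testing
```

The point is not the two-line lookup; it is that the empirical answer now lives in one reviewed place instead of in every user’s head.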
Moving on. What instance type should you choose? Again, it depends. But for this decision alone, there are almost 70 options to choose from.
And whichever instance type you choose in turn impacts your choice of storage. What type of storage does your virtual machine require, and what can the instance type support? Again, it depends on what you are trying to do: SSD is probably the right choice if your application is input/output heavy, while you may need file-system storage if you want to share information among virtual machines.
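The storage decision reduces to a couple of questions the workflow can ask up front. In this sketch the storage class names are illustrative placeholders, not a provider-specific catalogue:

```python
def choose_storage(io_heavy: bool, shared: bool) -> str:
    """Map the two storage questions from the text to a storage class."""
    if shared:
        return "network-file-system"  # share data among VMs
    if io_heavy:
        return "ssd"                  # input/output-heavy applications
    return "standard-block-storage"   # everything else
```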
And the process continues until you’ve completed all the decision points.
Snow Automation Platform, together with the Automation Books for AWS, Azure, and Google Compute Engine (GCE), provides a systematic approach to programming the information required at each decision point into a workflow. Workflows can be made available directly to the business teams that need access to infrastructure, and multiple workflows can be configured to cater for different scenarios, such as basic testing, high-performance demos, training, and large storage.
But there’s more to be gained from such a systematic approach than freeing up IT resources, ensuring security protocols are upheld and empowering users.
To spin up a machine in a cloud environment, a user normally needs some sort of admin credentials. When VMs are spun up by a machine rather than an individual, the need to provide individuals with superuser credentials disappears – the credentials are handled programmatically. An extra check for improved security.
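One way to keep those programmatic credentials narrow is to attach a least-privilege policy to the automation platform’s service account, so the account can launch, tag, and tear down machines and nothing else. The statement below is a simplified sketch in AWS IAM policy format, not a production policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AutomationCanManageInstances",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeInstances",
        "ec2:CreateTags"
      ],
      "Resource": "*"
    }
  ]
}
```

Even if the account is misused, the blast radius is limited to instance lifecycle operations.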
To ensure that the right people in the organization have access to the infrastructure service, role-based control can be implemented, with business users (rather than IT) approving access for newcomers. Role-based access has the double effect of enabling users while ensuring that unauthorized personnel are kept out. An additional check for security, plus one for flexibility and one for cost control.
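At its core, that role-based check is a small lookup that the platform enforces while business owners maintain the grants. A minimal sketch, with role and workflow names invented for illustration:

```python
# Who may run which provisioning workflow; maintained by business
# owners, enforced by the automation platform (names illustrative).
ROLE_PERMISSIONS = {
    "developer": {"basic-test", "large-storage"},
    "sales":     {"high-performance-demo"},
    "trainer":   {"training"},
}

def can_run(role: str, workflow: str) -> bool:
    """Unknown roles get nothing; known roles get only their grants."""
    return workflow in ROLE_PERMISSIONS.get(role, set())
```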
With a systematic approach to spinning up VMs, automatic logging plays a key role in security and cost management. Automatic logging can track who, when, and for how long resources are being created. Logging may not sound that innovative, but without it there is no means to control your environment.
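The minimum useful log entry answers exactly the who/when/what questions above. A sketch with an illustrative schema:

```python
from datetime import datetime, timezone

def log_provisioning(user: str, workflow: str, instance_id: str) -> dict:
    """Record who created which VM, via which workflow, and when.

    With this record, a costly or misconfigured VM can always be
    traced back to an owner. (Field names are illustrative.)
    """
    return {
        "user": user,
        "workflow": workflow,
        "instance_id": instance_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```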
Imagine, for example, a large VM is running in your estate, costing you a lot of money. Without a log you have no idea who is responsible for it. If the VM is configured in breach of your security protocol, it needs to be updated without impacting business operations. Automatic logging provides instant access to the information needed to take the next step. An extra check for security and one for cost control.
For demo/event scenarios, VMs may need to be spun up with specialized images. Again, a template can be created once and used by multiple sales personnel, who need to focus on demonstrating product capability and not the technical aspects of how to manage VM deployment. An extra check for security, one for compliance, and one for revenue generation.
Sometimes, customers will allow developers to use their environment data to recreate a bug or test performance issues. It is vital to protect such customer data with appropriate security measures – not least for relationship management but also for GDPR compliance. A secure, customer-data template VM could be created to cover this use case. An extra check for security and one for compliance.
If you are worried that an attack on your system could reveal programmatically entered access codes, each workflow can be created with a different system account, so that even if one part of the system is compromised, your entire cloud infrastructure is not exposed.
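That compartmentalization amounts to a strict one-to-one mapping from workflow to system account, with no shared fallback credential. A sketch, with account and workflow names invented for illustration:

```python
# One dedicated system account per workflow; no shared default,
# so a leaked credential only ever exposes its own workflow.
WORKFLOW_ACCOUNTS = {
    "basic-test":            "svc-test",
    "high-performance-demo": "svc-demo",
    "training":              "svc-training",
}

def account_for(workflow: str) -> str:
    """Resolve a workflow to its own system account; never fall back."""
    try:
        return WORKFLOW_ACCOUNTS[workflow]
    except KeyError:
        raise ValueError(f"no system account configured for {workflow!r}")
```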