High availability deployments in AWS using Ansible and Packer

In this tutorial we'll take a look at building a machine for a simple app, packaging it up into an AMI for AWS and then deploying it onto an EC2 instance in a way that does not disrupt the application endpoint at all.

You can see all the source code for this tutorial over at GitHub.

A simple app to deploy

Let's start off with a simple Go app. It's not going to do much, but it needs a home so that the millions of people who want to access its content can do so:

# main.go
package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/gorilla/mux"
)

func main() {
    router := mux.NewRouter().StrictSlash(true)
    router.HandleFunc("/", Index)

    log.Fatal(http.ListenAndServe(":8080", router))
}

func Index(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello world")
}

I told you it would be simple! As you can see, the app is a web service that listens on port 8080. What we need to do next is build a Vagrant machine to house our app so we can build up a provisioning process for eventual deployment.
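
If you want to give the app a quick spin before we go any further, something like this should do it (assuming you have Go installed and a GOPATH set up):

go get github.com/gorilla/mux
go run main.go &
curl http://localhost:8080
# -> Hello world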

A Vagrant machine for the app

When we eventually push our app into AWS we are going to use an Ubuntu 14.04 based AMI, so let's create a Vagrant machine running the same OS. We need to provision the machine so that our app has everything it needs to run; I am going to use Ansible to handle provisioning. When we move on to using Packer to create an AMI we will use the same Ansible playbook to set up the image, which means we can test both our app and the build process in Vagrant, so we shouldn't have too many nasty surprises when deploying to AWS.

Here's the Vagrantfile:

# Vagrantfile
Vagrant.configure(2) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.synced_folder '.', '/tmp/hello-app'

  config.vm.network "forwarded_port", guest: 8080, host: 8010

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provision.yml"
  end

  config.vm.provider "virtualbox" do |vb|
    vb.memory = 512
    vb.cpus = 1
  end

end

Nothing too complicated here. I am syncing the contents of the current directory into /tmp/hello-app so our app code is available during provisioning. Let's take a look at provision.yml:

# provision.yml
---

- hosts: all
  tasks:
    - name: Install essential packages
      sudo: yes
      apt:
        update_cache: yes
        name: "{{ item }}"
      with_items:
        - golang
        - vim
        - build-essential
        - git
        - unzip
        - supervisor
        - python-setuptools

    - name: Download AWS cfn-bootstrap
      sudo: yes
      get_url:
        url: https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
        dest: /usr/local/src/aws-cfn-bootstrap-latest.tar.gz

    - name: Prepare AWS cfn-bootstrap folder
      sudo: yes
      file:
        name: /usr/local/src/aws-cfn-bootstrap-latest
        state: directory

    - name: Extract AWS cfn-bootstrap
      sudo: yes
      shell: tar xvfz /usr/local/src/aws-cfn-bootstrap-latest.tar.gz --strip-components=1 -C /usr/local/src/aws-cfn-bootstrap-latest
      args:
        creates: /usr/local/src/aws-cfn-bootstrap-latest/setup.py

    - name: Install AWS cfn-bootstrap
      sudo: yes
      shell: easy_install /usr/local/src/aws-cfn-bootstrap-latest
      args:
        creates: /usr/local/bin/cfn-init

    - name: Ensure hello-app user
      sudo: yes
      user:
        name: hello-app
        shell: /bin/bash

    - name: Ensure hello-app directory
      sudo: yes
      file:
        path: /home/hello-app/go/src/github.com/lobsterdore
        state: directory
        recurse: yes
        owner: hello-app
        group: hello-app

    - name: Ensure hello-app log directory
      sudo: yes
      file:
        path: /home/hello-app/logs
        state: directory
        recurse: yes
        owner: hello-app
        group: hello-app

    - name: Copy hello-app from /tmp
      sudo: yes
      shell: "cp -R /tmp/hello-app/ /home/hello-app/go/src/github.com/lobsterdore/; chown -R hello-app:hello-app /home/hello-app/go/src/github.com/lobsterdore/"
      args:
        creates: /home/hello-app/go/src/github.com/lobsterdore/hello-app

    - name: Build hello-app
      sudo: yes
      sudo_user: hello-app
      command: "make build"
      environment:
        GOPATH: /home/hello-app/go
      args:
        chdir: /home/hello-app/go/src/github.com/lobsterdore/hello-app
        creates: /home/hello-app/go/bin/hello-app

    - name: Create supervisor config
      sudo: yes
      shell: "cp /tmp/hello-app/supervisor.conf /etc/supervisor/conf.d/hello-app.conf"
      args:
        creates: /etc/supervisor/conf.d/hello-app.conf
      notify:
        - Reread supervisor
        - Update supervisor

  handlers:
    - name: Reread supervisor
      sudo: yes
      command: supervisorctl reread

    - name: Update supervisor
      sudo: yes
      command: supervisorctl update

As you can see, most of this playbook deals with setting up the app and getting it running under supervisor. There are also a few tasks that install cfn-bootstrap, the AWS CloudFormation helper scripts package. These scripts will be used during deployment to signal that our app is up and running; more on this later.
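
Once you've run vagrant up (covered below) you can check from the host that the helper scripts actually made it onto the box; the paths here assume the easy_install step above:

vagrant ssh -c 'which cfn-init cfn-signal'
# /usr/local/bin/cfn-init
# /usr/local/bin/cfn-signal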

Before moving on, there are a couple of files referenced in provision.yml that you haven't seen yet. Here is the supervisor config file:

# supervisor.conf
[program:hello-app]
command=/home/hello-app/go/bin/hello-app
directory=/home/hello-app

autostart=true
autorestart=true
startretries=10
startsecs=5

user=hello-app
redirect_stderr=true
stdout_logfile=/home/hello-app/logs/hello-app.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10

And here is the Makefile that provides the build command for the app:

# Makefile
deps:
    go get

build: deps
    go build -o ${GOPATH}/bin/hello-app

After running vagrant up the app should be accessible on the host via localhost:8010, for example:
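
vagrant up
curl http://localhost:8010
# -> Hello world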

Creating an image

Now that we have a working machine, the next step is to create an AMI that we can use to spin up an EC2 instance; this image will contain the app and everything needed to run it. Packer makes creating images pretty straightforward, so let's use it. Here's the template for creating an AMI for the app:

# packer.json
{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "eu-west-1",
    "source_ami": "ami-47a23a30",
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "hello-app {{timestamp}}"
  }],
  "provisioners": [
    {
      "type": "file",
      "source": ".",
      "destination": "/tmp/hello-app"
    },
    {
      "type": "shell",
      "inline": [
        "sleep 30",
        "sudo apt-add-repository ppa:rquillo/ansible",
        "sudo /usr/bin/apt-get update",
        "sudo /usr/bin/apt-get -y install ansible"
      ]
    },
    {
      "type": "ansible-local",
      "playbook_file": "provision.yml"
    }
  ]
}

One of the quirks of using Ansible with Packer is that the ansible-local provisioner executes playbooks on the builder machine, rather than on your local machine, so Ansible has to be installed on the build box before we can provision. Apart from that, the provisioning process runs in much the same fashion as the Vagrant one. The source_ami here is a plain Ubuntu 14.04 image, the same OS as our dev box.
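
Before kicking off a build it's worth sanity-checking the template with Packer's built-in validation:

packer validate packer.json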

Let's add a new Makefile task that will kick off packer:

# Makefile
deps:
    go get

build: deps
    go build -o ${GOPATH}/bin/hello-app

build-ami:
    packer build -var 'aws_access_key=${AWS_ACCESS_KEY}' -var 'aws_secret_key=${AWS_SECRET_KEY}' packer.json

You will need to create environment variables for AWS_ACCESS_KEY and AWS_SECRET_KEY; add them to your profile before you run make build-ami, and don't forget to source your profile to get the new vars in place. When you run this command Packer should eventually give you an AMI ID that can be used to build a machine in AWS.
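
The whole sequence looks roughly like this (the key values are placeholders, and the exact output format varies between Packer versions):

# in your profile, e.g. ~/.profile
export AWS_ACCESS_KEY=AKIA...
export AWS_SECRET_KEY=your-secret-key

source ~/.profile
make build-ami
# ...
# ==> Builds finished. The artifacts of successful builds are:
# --> amazon-ebs: AMIs were created:
# eu-west-1: ami-xxxxxxxx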

High availability deployments in AWS

Now that we have an image we are ready to push everything into AWS and show off our cutting-edge hello world microservice.

Like any service it needs to be highly available; we need to guarantee minimal downtime since people will be looking at its top-notch output 24/7. To facilitate this we are going to create an autoscaling group, which will allow us to control what happens when the launch config is updated with a new AMI. So if you've just added some extra awesomeness to your app (you've made it 5000 times faster, for instance) and built a new AMI, you can roll out the new image without disrupting your shiny customer-facing endpoint.

The default behaviour when launching a new EC2 instance into an autoscaling group is to terminate any existing ones, so if you deploy a new AMI the old instances will be shut down and removed straight away. This is a problem, since instances can take several minutes to start up, especially if they have some kind of bootstrapping process that runs migrations or other time-intensive tasks.

So what can be done about this? Remember those cfn-bootstrap scripts we installed earlier? We can use one of them, namely cfn-signal, to tell CloudFormation that an instance is up and ready to go once bootstrapping has finished. We can add a call to cfn-signal to the UserData for our app's EC2 instances; in the example below I've added a 1m sleep before calling cfn-signal just to demonstrate existing instances staying in service whilst new ones become ready.

In order to make an autoscaling group wait for signals we have to add a CreationPolicy and an UpdatePolicy to the group's config. If you take a look at the policies in the stack template below you can see that 1 signal is required for a new instance to be considered ready (this is in the ResourceSignal block of the CreationPolicy) and that signals must be received before an update is considered complete (check the WaitOnResourceSignals field of the UpdatePolicy).

The Timeout and PauseTime fields denote a grace period within which the required number of signals must be received, so in the example below if a new instance isn't ready within 10 minutes it will be terminated and any existing instances will stay in service.

Here's the full stack template for our app, complete with a VPC and other networking bits:

# cloudformation.json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Hello-app Example",
  "Parameters": {
    "InstanceType": {
      "Description": "WebServer EC2 instance type",
      "Type": "String",
      "Default": "t2.micro",
      "AllowedValues": [ "t1.micro", "t2.micro", "t2.small", "t2.medium", "m1.small", "m1.medium", "m1.large", "m1.xlarge", "m2.xlarge", "m2.2xlarge", "m2.4xlarge", "m3.medium", "m3.large", "m3.xlarge", "m3.2xlarge", "c1.medium", "c1.xlarge", "c3.large", "c3.xlarge", "c3.2xlarge", "c3.4xlarge", "c3.8xlarge", "c4.large", "c4.xlarge", "c4.2xlarge", "c4.4xlarge", "c4.8xlarge", "g2.2xlarge", "r3.large", "r3.xlarge", "r3.2xlarge", "r3.4xlarge", "r3.8xlarge", "i2.xlarge", "i2.2xlarge", "i2.4xlarge", "i2.8xlarge", "d2.xlarge", "d2.2xlarge", "d2.4xlarge", "d2.8xlarge", "hi1.4xlarge", "hs1.8xlarge", "cr1.8xlarge", "cc2.8xlarge", "cg1.4xlarge"]
,
      "ConstraintDescription" : "must be a valid EC2 instance type."
    },
    "KeyName": {
      "Description": "Name of an existing EC2 KeyPair to enable SSH access to the instance",
      "Type": "AWS::EC2::KeyPair::KeyName",
      "ConstraintDescription": "must be the name of an existing EC2 KeyPair."
    },
    "InstanceCount": {
      "Description": "Number of EC2 instances to launch",
      "Type": "Number",
      "Default": "1"
    },
    "InstanceImageId": {
      "Description": "Image ID for EC2 instances",
      "Type": "String"
    }
  },
  "Mappings" : {
  },
  "Resources" : {
    "InternetGateway" : {
      "Type" : "AWS::EC2::InternetGateway",
      "Properties" : {
        "Tags" : [ {"Key" : "Application", "Value" : { "Ref" : "AWS::StackId"} } ]
      }
    },
    "VPC" : {
      "Type" : "AWS::EC2::VPC",
      "Properties" : {
        "CidrBlock" : "10.0.0.0/16",
        "Tags" : [ {"Key" : "Application", "Value" : { "Ref" : "AWS::StackId"} } ]
      }
    },
    "AttachGateway" : {
       "Type" : "AWS::EC2::VPCGatewayAttachment",
       "Properties" : {
         "VpcId" : { "Ref" : "VPC" },
         "InternetGatewayId" : { "Ref" : "InternetGateway" }
       }
    },
    "PublicSubnet" : {
      "Type" : "AWS::EC2::Subnet",
      "Properties" : {
        "VpcId" : { "Ref" : "VPC" },
        "CidrBlock" : "10.0.1.0/24",
        "Tags" : [ {"Key" : "Application", "Value" : { "Ref" : "AWS::StackId"} } ]
      }
    },
    "RouteTable" : {
      "Type" : "AWS::EC2::RouteTable",
      "Properties" : {
        "VpcId" : {"Ref" : "VPC"},
        "Tags" : [ {"Key" : "Application", "Value" : { "Ref" : "AWS::StackId"} } ]
      }
    },
    "Route" : {
      "Type" : "AWS::EC2::Route",
      "DependsOn" : "AttachGateway",
      "Properties" : {
        "RouteTableId" : { "Ref" : "RouteTable" },
        "DestinationCidrBlock" : "0.0.0.0/0",
        "GatewayId" : { "Ref" : "InternetGateway" }
      }
    },
    "PublicSubnetRouteTableAssociation" : {
      "Type" : "AWS::EC2::SubnetRouteTableAssociation",
      "Properties" : {
        "SubnetId" : { "Ref" : "PublicSubnet" },
        "RouteTableId" : { "Ref" : "RouteTable" }
      }
    },
    "PublicSshSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "Enable external SSH access",
        "VpcId": { "Ref": "VPC" },
        "SecurityGroupIngress": [ {
          "IpProtocol": "tcp",
          "FromPort": "22",
          "ToPort": "22",
          "CidrIp": "0.0.0.0/0"
        } ]
      }
    },
    "PublicWebSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "Enable external web access",
        "VpcId": { "Ref": "VPC" },
        "SecurityGroupIngress": [
          {
            "IpProtocol": "tcp",
            "FromPort": "80",
            "ToPort": "8080",
            "CidrIp": "0.0.0.0/0"
          }
        ]
      }
    },
    "WebServerGroup" : {
      "Type" : "AWS::AutoScaling::AutoScalingGroup",
      "Properties" : {
        "AvailabilityZones" : [{ "Fn::GetAtt" : [ "PublicSubnet", "AvailabilityZone" ] }],
        "VPCZoneIdentifier" : [{ "Ref" : "PublicSubnet" }],
        "LaunchConfigurationName" : { "Ref" : "WebLaunchConfig" },
        "MinSize" : "1",
        "MaxSize" : "3",
        "DesiredCapacity" : { "Ref" : "InstanceCount" },
        "LoadBalancerNames" : [ { "Ref" : "WebElasticLoadBalancer" } ],
        "HealthCheckGracePeriod": "600",
        "HealthCheckType": "ELB"
      },
      "CreationPolicy": {
        "ResourceSignal": {
          "Count": "1",
          "Timeout": "PT10M"
        }
      },
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MinInstancesInService": "1",
          "MaxBatchSize": "1",
          "PauseTime": "PT10M",
          "WaitOnResourceSignals": "true"
        }
      }
    },
    "WebLaunchConfig": {
      "Type": "AWS::AutoScaling::LaunchConfiguration",
      "Properties": {
        "AssociatePublicIpAddress": "true",
        "ImageId":{
            "Ref":"InstanceImageId"
        },
        "InstanceType":{
           "Ref":"InstanceType"
        },
        "SecurityGroups": [
          { "Ref": "PublicSshSecurityGroup" },
          { "Ref": "PublicWebSecurityGroup" }
        ],
        "KeyName": { "Ref": "KeyName" },
        "UserData":{
          "Fn::Base64":{
            "Fn::Join":[
              "",
              [
                "#!/bin/bash\n",
                "cfn-init ",
                "  --resource WebLaunchConfig ",
                "  --stack ", { "Ref": "AWS::StackName" },
                "  --region ", { "Ref" : "AWS::Region" }, "\n",
                "sleep 1m\n",
                "cfn-signal -e $? ",
                "  --stack ", { "Ref": "AWS::StackName" },
                "  --resource WebServerGroup ",
                "  --region ", { "Ref" : "AWS::Region" }, "\n"
              ]
            ]
          }
        }
      },
      "Metadata" : {
        "Comment" : "Startup hello-app",
        "AWS::CloudFormation::Init" : {
          "config" : {
            "services" : {
              "sysvinit" : {
                "supervisor": {
                  "enabled": "true",
                  "ensureRunning": "true"
                }
              }
            }
          }
        }
      }
    },
    "WebElasticLoadBalancer": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties": {
        "CrossZone": "true",
        "Scheme": "internet-facing",
        "SecurityGroups": [
          { "Ref": "PublicWebSecurityGroup" }
        ],
        "Subnets": [
          { "Ref": "PublicSubnet" }
        ],
        "Listeners": [
          {
            "LoadBalancerPort": "80",
            "InstancePort": "8080",
            "Protocol": "HTTP"
          }
        ],
        "HealthCheck": {
          "Target": "HTTP:8080/",
          "HealthyThreshold": "2",
          "UnhealthyThreshold": "5",
          "Interval": "30",
          "Timeout": "2"
        }
      }
    }
  },
  "Outputs" : {
  }
}

To deploy the stack let's use Ansible's cloudformation module. Here is the deployment playbook:

# deploy.yml
---

- hosts: all
  tasks:
    - name: Push hello-app cloudformation stack
      cloudformation:
        aws_access_key: "{{ aws_access_key }}"
        aws_secret_key: "{{ aws_secret_key }}"
        stack_name: "hello-app"
        state: present
        region: eu-west-1
        template: cloudformation.json
        template_parameters:
          KeyName: "{{ aws_key_name }}"
          InstanceImageId: "{{ aws_ami_id }}"
        tags:
          Stack: "hello-app"
          Ami: "{{ aws_ami_id }}"

Let's update the Makefile to include a deployment task:

# Makefile
deps:
    go get

build: deps
    go build -o ${GOPATH}/bin/hello-app

build-ami:
    packer build -var 'aws_access_key=${AWS_ACCESS_KEY}' -var 'aws_secret_key=${AWS_SECRET_KEY}' packer.json

deploy-ami:
	ansible-playbook -i "localhost," -c local -e aws_access_key=${AWS_ACCESS_KEY} -e aws_secret_key=${AWS_SECRET_KEY} -e aws_key_name=${AWS_KEY_NAME} -e aws_ami_id=${AWS_AMI_ID} deploy.yml

The deploy playbook is run against localhost since we are deploying the stack from our local machine. Before running this step run make build-ami, which will output an AMI ID once it has completed.

You can then deploy the ami like so:

make deploy-ami AWS_AMI_ID={AMI_ID}

As previously mentioned, you should add environment vars to your profile for the AWS_ACCESS_KEY and AWS_SECRET_KEY arguments; you will now also need AWS_KEY_NAME, which should be the name of your SSH key pair in AWS.
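
Once the stack has gone green you can grab the load balancer's public DNS from the AWS Console, or via the AWS CLI if you have it installed and configured, and hit the endpoint:

aws elb describe-load-balancers --region eu-west-1 \
  --query 'LoadBalancerDescriptions[*].DNSName' --output text
curl http://<elb-dns-name>/
# -> Hello world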

Testing high availability-ness

Let's test what happens when you deploy a new AMI. Each time you run the build-ami command you will get a new AMI ID, so it's easy to simulate rolling out a new build.

Bring up the whole stack and leave it running, then generate a new AMI and deploy it, checking the AWS Console to confirm that the old machine is indeed kept in service whilst the new one becomes ready. Before deploying, grab the public DNS of the load balancer attached to the autoscaling group; you can keep hitting this URL whilst the deployment is taking place and it should keep responding with its output throughout, for example with a loop like this:
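
# poll the endpoint once a second; replace the hostname with
# your load balancer's public DNS
while true; do
  curl -s -o /dev/null -w "%{http_code}\n" http://<elb-dns-name>/
  sleep 1
done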

Next let's see what happens when a deployment fails. This will occur if a machine doesn't reach the cfn-signal part of its UserData script, which would indicate either a failure in its bootstrap process or that it took too long to become ready. Ensure that the stack is already running, generate a new AMI, but before deploying remove the cfn-signal part of the UserData script, then deploy the stack. Since we have given new instances 10 minutes to become ready it will take some time for the deployment to fail, but when it does you should see that the existing instance stays in service and the stack rolls back, ending up in a nice green rollback status so it can still be updated in the future.
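
You can watch all of this unfold in the AWS Console, or tail the stack events with the AWS CLI (again, assuming you have it installed and configured):

aws cloudformation describe-stack-events \
  --stack-name hello-app --region eu-west-1 \
  --query 'StackEvents[*].[Timestamp,ResourceStatus,LogicalResourceId]' \
  --output table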

Self-bootstrapping instances

All of this is important because, whilst you can bake most of what you need into an image using tools like Ansible and Packer, there are still likely to be some environment-specific settings that need to be brought in at runtime. You want to avoid creating different images for each of your environments (dev, stage, production etc.) and instead use a bootstrap process that is executed on the machine itself when it starts up. High availability deployments are crucial here, as the bootstrap process may take several minutes (or fail) and you don't want downtime every time you deploy.
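
As a rough sketch of what such a bootstrap might look like, the UserData could pull environment-specific settings at boot before signalling. Everything here (the bucket, file and environment names) is hypothetical, and it assumes the instance has the AWS CLI available plus an instance profile that can read the bucket:

#!/bin/bash
# hypothetical bootstrap: fetch per-environment config, restart
# the app, then signal the autoscaling group as before
ENVIRONMENT=production
aws s3 cp "s3://hello-app-config-${ENVIRONMENT}/hello-app.env" /home/hello-app/hello-app.env
supervisorctl restart hello-app
cfn-signal -e $? --stack hello-app --resource WebServerGroup --region eu-west-1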

That's it! Now go and try this out for yourself.

You can see all the source code for this tutorial over at GitHub.