How to perform high availability deployments of stateful applications in AWS - Zookeeper edition

Deploying stateless applications in AWS is pretty straightforward using AMIs and ASGs; stateful applications, however, are much trickier to deploy in an automated fashion, and most people resort to hard-coded instances that are updated in place. In this article we will look at a way to use ASGs and AMIs to achieve highly available rolling deployments of stateful applications, using Zookeeper as an example.

In previous articles such as High availability deployments in AWS using Ansible and Packer and How to use the will replace feature of AWS Auto Scaling Groups I described deployment techniques for stateless applications. Using those techniques you can easily incorporate any such application into an automated release process with confidence and have developers pushing code to their hearts' content. Ideally we would like to achieve this level of automation with all of the applications that we manage, but stateful applications such as ElasticSearch, Kafka, Zookeeper and MySQL have always presented a challenge: you need to maintain a cluster of instances that know about each other, and you need to preserve data between deployments. Many people simply hard-code EC2 instances and update them in place, which is difficult to automate and can lead to "push the button and hold your breath" style releases, not for the faint of heart! By leveraging a few out-of-the-box AWS services we can perform rolling AMI-based updates, which allows us to fold those pesky stateful services into a release automation process.

In this article we are going to use Cloudformation for orchestration and Zookeeper as the stateful application; you can apply the same principles to other orchestration tools (eg. Terraform and Ansible) and other stateful applications (Kafka etc.).

Components of a stateful deployment

Generally speaking, stateful applications, unlike stateless applications, need to maintain cluster members and cluster data at all times; if we are going to take the approach of using AMIs to bring up new instances to apply updates then we need to maintain these extra pieces of the puzzle as instances come and go. Luckily there are two handy AWS services we can employ to make this possible: Elastic Network Interfaces (ENIs) can maintain the cluster member list, and Elastic Block Store (EBS) volumes can maintain cluster data.

Elastic Network Interfaces allow us to keep the same private or public IP when we replace instances. For an application's cluster member list we can either use this constant list of IPs directly, or assign a Route53 entry to each ENI's IP and use DNS instead; either option is fine. When an instance boots it attaches a predefined ENI, which provides a persistent IP. The ENIs are defined separately from the ASGs that control the instances, so they persist between updates and remain in place unless you delete the application's CF stack; you can of course define the ENIs in a separate CF stack so they persist even if the application stack is deleted.
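
To illustrate the DNS alternative, here is a hedged sketch using the AWS CLI: it looks up an ENI's persistent private IP and upserts an A record in a private hosted zone. The ENI ID, hosted zone ID and record name are placeholders, not values from the template below.

# Sketch only: point a per-node DNS record at an ENI's persistent private IP.
ENI_ID=eni-0123456789abcdef0        # hypothetical ENI ID
HOSTED_ZONE_ID=Z0000000000EXAMPLE   # hypothetical private hosted zone
ENI_IP=$( aws ec2 describe-network-interfaces \
  --network-interface-ids "$ENI_ID" \
  --query "NetworkInterfaces[0].PrivateIpAddress" \
  --output text )
aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"zookeeper-one.internal\",\"Type\":\"A\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"$ENI_IP\"}]}}]}"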

In a similar fashion, Elastic Block Store volumes preserve data during replacement: an instance attaches a volume during boot and mounts it at the application's data folder. As an added bonus you can snapshot EBS volumes at regular intervals, allowing backups to be taken of applications that do not offer a native backup solution.
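
As a rough sketch of that backup idea (the volume ID here is a placeholder, and in practice you would run this from cron, a scheduled Lambda or Data Lifecycle Manager):

# Sketch only: take a point-in-time snapshot of a Zookeeper data volume.
VOLUME_ID=vol-0123456789abcdef0   # hypothetical volume ID
aws ec2 create-snapshot \
  --volume-id "$VOLUME_ID" \
  --description "zookeeper data backup $(date -u +%Y-%m-%dT%H:%M:%SZ)"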

Before diving into an example of all of this in action it's worth examining how Auto Scaling Groups are going to be employed. For stateless applications a single ASG is used to control all instances, however for stateful applications we are going to use an ASG per instance. Using multiple ASGs gives us a few advantages: we can specify exactly which ENI and EBS volume we want an instance to use by adding their IDs as tags to the ASG, and we can also assign an instance index via tags; many services such as Zookeeper and Kafka require an instance ID of some kind, so we can use this index for those applications. Multiple ASGs also give us flexibility over how many instances we can have. ENIs and EBS volumes are Availability Zone specific, so if you are using all AZs (which you should be) then instances can be launched in any one of them; in a single ASG setup you would therefore need an ENI and EBS volume for each AZ that the ASG spans, and hence you would need to launch instances in multiples of the number of AZs (eg. 3, 6, 9), which is not ideal at all, especially for applications that form quorums. When using multiple ASGs you can specify an exact AZ for each ASG, ENI and EBS volume, allowing for quorum-friendly numbers; a templating language of some kind lets you cycle through each AZ until you have the desired number of ASGs, although in the example in this article we are going to add each ASG statically.

The stateful deployment in action

Let's take a look at a Cloudformation template that uses the process defined above for a Zookeeper cluster:

---
# ha_stateful.zookeeper.template.yaml

AWSTemplateFormatVersion: '2010-09-09'
Description: HA Stateful Zookeeper Example

Parameters:
  InstanceType:
    Description: Stateful Zookeeper EC2 instance type
    Type: String
    Default: t2.nano
    AllowedValues:
    - t2.nano
    - t2.micro
    ConstraintDescription: must be a valid EC2 instance type.
  KeyName:
    Description: Name of an existing EC2 KeyPair to enable SSH access to the instance
    Type: AWS::EC2::KeyPair::KeyName
    ConstraintDescription: must be the name of an existing EC2 KeyPair.
  InstanceImageId:
    Description: Image ID for EC2 instances
    Type: String

Resources:
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackId

  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/22
      EnableDnsSupport: 'true'
      EnableDnsHostnames: 'true'
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackId

  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Sub ${AWS::Region}a
      CidrBlock: 10.0.0.0/24
      MapPublicIpOnLaunch: 'false'
      VpcId: !Ref VPC
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackId

  PublicSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Sub ${AWS::Region}b
      CidrBlock: 10.0.1.0/24
      MapPublicIpOnLaunch: 'false'
      VpcId: !Ref VPC
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackId

  PublicSubnetC:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Sub ${AWS::Region}c
      CidrBlock: 10.0.2.0/24
      MapPublicIpOnLaunch: 'false'
      VpcId: !Ref VPC
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackId

  RouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Stack
          Value: !Ref AWS::StackId

  Route:
    Type: AWS::EC2::Route
    DependsOn: AttachGateway
    Properties:
      RouteTableId: !Ref RouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PublicSubnetARouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetA
      RouteTableId: !Ref RouteTable

  PublicSubnetBRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetB
      RouteTableId: !Ref RouteTable

  PublicSubnetCRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetC
      RouteTableId: !Ref RouteTable

  HostedZonePrivate:
    Type: AWS::Route53::HostedZone
    Properties:
      Name: internal
      VPCs:
        - VPCId: !Ref VPC
          VPCRegion: !Sub ${AWS::Region}

  PublicSshSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable external SSH access
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: 0.0.0.0/0

  StatefulZookeeperSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable external access
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '2181'
          ToPort: '2181'
          CidrIp: 10.0.0.0/22
        - IpProtocol: tcp
          FromPort: '2888'
          ToPort: '2888'
          CidrIp: 10.0.0.0/22
        - IpProtocol: tcp
          FromPort: '3888'
          ToPort: '3888'
          CidrIp: 10.0.0.0/22

  StatefulZookeeperElbSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable external ELB access
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '2181'
          ToPort: '2181'
          CidrIp: 10.0.0.0/22

  StatefulZookeeperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - ec2.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StatefulZookeeperPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Action:
                - ec2:AttachNetworkInterface
                - ec2:AttachVolume
                - ec2:DescribeInstances
                - ec2:DescribeNetworkInterfaces
                - ec2:DescribeTags
                - ec2:DescribeVolumes
              Resource: "*"
              Effect: Allow

  StatefulZookeeperInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
      - Ref: StatefulZookeeperRole

  StatefulZookeeperLoadBalancer:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      CrossZone: 'true'
      Scheme: internal
      SecurityGroups:
        - !Ref StatefulZookeeperElbSecurityGroup
      Subnets:
        - !Ref PublicSubnetA
        - !Ref PublicSubnetB
        - !Ref PublicSubnetC
      Listeners:
        - LoadBalancerPort: '2181'
          InstancePort: '2181'
          Protocol: TCP

  StatefulZookeeperLoadBalancerDns:
    Type: AWS::Route53::RecordSet
    Properties:
      AliasTarget:
        DNSName:  !GetAtt StatefulZookeeperLoadBalancer.DNSName
        HostedZoneId:  !GetAtt StatefulZookeeperLoadBalancer.CanonicalHostedZoneNameID
      HostedZoneId:  !Ref HostedZonePrivate
      Name: zookeeper.internal
      Type: A

  StatefulZookeeperLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    DependsOn:
      - StatefulZookeeperEniOne
      - StatefulZookeeperEniTwo
      - StatefulZookeeperEniThree
    Metadata:
      AWS::CloudFormation::Init:
        config:
          files:
            /etc/zookeeper/conf/environment:
              content: |
                NAME=zookeeper
                ZOOCFGDIR=/etc/$NAME/conf
                CLASSPATH="$ZOOCFGDIR:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar"
                ZOOCFG="$ZOOCFGDIR/zoo.cfg"
                ZOO_LOG_DIR=/var/log/$NAME
                USER=$NAME
                GROUP=$NAME
                PIDDIR=/var/run/$NAME
                PIDFILE=$PIDDIR/$NAME.pid
                SCRIPTNAME=/etc/init.d/$NAME
                JAVA=/usr/bin/java
                ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
                ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
                JMXLOCALONLY=false
                JAVA_OPTS="-Xmx128m -Xms128m"
              mode: "000644"
              owner: "root"
              group: "root"
            /etc/zookeeper/conf/zoo.cfg:
              content: !Sub |
                tickTime=1000
                initLimit=1200
                syncLimit=300
                dataDir=/var/lib/zookeeper
                clientPort=2181
                server.1=${StatefulZookeeperEniOne.PrimaryPrivateIpAddress}:2888:3888
                server.2=${StatefulZookeeperEniTwo.PrimaryPrivateIpAddress}:2888:3888
                server.3=${StatefulZookeeperEniThree.PrimaryPrivateIpAddress}:2888:3888
                leaderServes=yes
              mode: "000644"
              owner: "root"
              group: "root"
    Properties:
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref StatefulZookeeperInstanceProfile
      ImageId: !Ref InstanceImageId
      InstanceType: !Ref InstanceType
      SecurityGroups:
        - !Ref PublicSshSecurityGroup
        - !Ref StatefulZookeeperSecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          set -euo pipefail
          addgroup \
            -gid 5000 \
            zookeeper
          adduser \
            --gid 5000 \
            --uid 5000 \
            --no-create-home \
            --disabled-password \
            --disabled-login \
            --system \
            zookeeper
          apt-get update
          apt-get install -y zookeeper zookeeperd python2.7 python-pip curl jq ntp
          service zookeeper stop
          pip install awscli https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
          cfn-init \
            --resource StatefulZookeeperLaunchConfig \
            --stack ${AWS::StackName} \
            --region ${AWS::Region}
          INSTANCE_ID=$( curl -s http://169.254.169.254/latest/meta-data/instance-id )
          INSTANCE_TAGS=$( aws ec2 describe-tags \
            --filters "Name=resource-id,Values=$INSTANCE_ID" \
            --region ${AWS::Region} \
            --query "{ instanceeniid:Tags[?Key=='instanceeniid'].Value, instanceindex:Tags[?Key=='instanceindex'].Value, instancevolume:Tags[?Key=='instancevolume'].Value }" )
          INSTANCE_ENI_ID=$( echo "$INSTANCE_TAGS" | jq -r '.instanceeniid | .[0]' )
          aws ec2 attach-network-interface \
            --network-interface-id $INSTANCE_ENI_ID \
            --instance-id $INSTANCE_ID \
            --region ${AWS::Region} \
            --device-index 1
          INSTANCE_ENI_IP=$( aws ec2 describe-network-interfaces \
            --network-interface-ids $INSTANCE_ENI_ID \
            --region ${AWS::Region} \
            --query "NetworkInterfaces[0].PrivateIpAddress" \
            --output text )
          GATEWAY_IP=$( /sbin/ip route | awk '/default/ { print $3 }' )
          echo -e "auto eth1\niface eth1 inet dhcp\n  post-up ip route add default via $GATEWAY_IP dev eth1 tab 2\n  post-up ip rule add from $INSTANCE_ENI_IP/32 tab 2 priority 600" \
            > /etc/network/interfaces.d/eth1.cfg
          sleep 15s
          service networking restart
          INSTANCE_VOLUME_ID=$( echo "$INSTANCE_TAGS" | jq -r '.instancevolume | .[0]' )
          aws ec2 attach-volume \
            --volume-id $INSTANCE_VOLUME_ID \
            --instance-id $INSTANCE_ID \
            --region ${AWS::Region} \
            --device /dev/xvdb
          sleep 15s
          set +e
          blkid -L zookeeperdata
          FILESYSTEM_EXISTS=$?
          set -e
          if [[ $FILESYSTEM_EXISTS -ne 0 ]]; then
            mkfs -t ext4 -L zookeeperdata /dev/xvdb
          fi
          mkdir -p /var/lib/zookeeper
          mount /dev/xvdb /var/lib/zookeeper
          echo "LABEL=zookeeperdata /var/lib/zookeeper ext4 nofail 0 0" >> /etc/fstab
          chown -R zookeeper:zookeeper /var/lib/zookeeper
          INSTANCE_ASG=$( aws ec2 describe-instances \
            --instance-id $INSTANCE_ID \
            --region ${AWS::Region} \
            --query "Reservations[0].Instances[0].Tags[?Key=='aws:cloudformation:logical-id'].Value" \
            --output text )
          echo "$INSTANCE_TAGS" \
            | jq -r '.instanceindex | .[0]' \
            > /var/lib/zookeeper/myid
          service zookeeper start
          set +e
          ps auxw | grep -P '\b'zookeeper'(?!-)\b'
          ZOOKEEPER_RUNNING=$?
          set -e
          cfn-signal \
            -e $ZOOKEEPER_RUNNING \
            --stack ${AWS::StackName} \
            --resource $INSTANCE_ASG \
            --region ${AWS::Region}

  StatefulZookeeperEniOne:
    Type: "AWS::EC2::NetworkInterface"
    Properties:
      Description: StatefulZookeeperGroupOne ENI
      GroupSet:
        - !Ref PublicSshSecurityGroup
        - !Ref StatefulZookeeperSecurityGroup
      SubnetId: !Ref PublicSubnetA

  StatefulZookeeperVolumeOne:
    Type: "AWS::EC2::Volume"
    Properties:
      AvailabilityZone: !Sub ${AWS::Region}a
      Size: 8
      VolumeType: gp2

  StatefulZookeeperGroupOne:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      - StatefulZookeeperEniOne
      - StatefulZookeeperVolumeOne
    Properties:
      VPCZoneIdentifier:
        - !Ref PublicSubnetA
      LaunchConfigurationName: !Ref  StatefulZookeeperLaunchConfig
      LoadBalancerNames:
        - !Ref StatefulZookeeperLoadBalancer
      DesiredCapacity: 1
      MinSize: 0
      MaxSize: 1
      HealthCheckGracePeriod: '300'
      HealthCheckType: EC2
      Tags:
        - Key: instanceeniid
          Value: !Ref StatefulZookeeperEniOne
          PropagateAtLaunch: 'true'
        - Key: instanceindex
          Value: '1'
          PropagateAtLaunch: 'true'
        - Key: instancevolume
          Value: !Ref StatefulZookeeperVolumeOne
          PropagateAtLaunch: 'true'
        - Key: Name
          Value: 'zookeeper-one'
          PropagateAtLaunch: 'true'
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT10M
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: 1
        MinInstancesInService: 0
        PauseTime: PT10M
        WaitOnResourceSignals: 'true'

  StatefulZookeeperEniTwo:
    Type: "AWS::EC2::NetworkInterface"
    Properties:
      Description: StatefulZookeeperGroupTwo ENI
      GroupSet:
        - !Ref PublicSshSecurityGroup
        - !Ref StatefulZookeeperSecurityGroup
      SubnetId: !Ref PublicSubnetB

  StatefulZookeeperVolumeTwo:
    Type: "AWS::EC2::Volume"
    Properties:
      AvailabilityZone: !Sub ${AWS::Region}b
      Size: 8
      VolumeType: gp2

  StatefulZookeeperGroupTwo:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      - StatefulZookeeperEniTwo
      - StatefulZookeeperVolumeTwo
      - StatefulZookeeperGroupOne
    Properties:
      VPCZoneIdentifier:
        - !Ref PublicSubnetB
      LaunchConfigurationName: !Ref  StatefulZookeeperLaunchConfig
      LoadBalancerNames:
        - !Ref StatefulZookeeperLoadBalancer
      DesiredCapacity: 1
      MinSize: 0
      MaxSize: 1
      HealthCheckGracePeriod: '300'
      HealthCheckType: EC2
      Tags:
        - Key: instanceeniid
          Value: !Ref StatefulZookeeperEniTwo
          PropagateAtLaunch: 'true'
        - Key: instanceindex
          Value: '2'
          PropagateAtLaunch: 'true'
        - Key: instancevolume
          Value: !Ref StatefulZookeeperVolumeTwo
          PropagateAtLaunch: 'true'
        - Key: Name
          Value: 'zookeeper-two'
          PropagateAtLaunch: 'true'
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT10M
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: 1
        MinInstancesInService: 0
        PauseTime: PT10M
        WaitOnResourceSignals: 'true'

  StatefulZookeeperEniThree:
    Type: "AWS::EC2::NetworkInterface"
    Properties:
      Description: StatefulZookeeperGroupThree ENI
      GroupSet:
        - !Ref PublicSshSecurityGroup
        - !Ref StatefulZookeeperSecurityGroup
      SubnetId: !Ref PublicSubnetC

  StatefulZookeeperVolumeThree:
    Type: "AWS::EC2::Volume"
    Properties:
      AvailabilityZone: !Sub ${AWS::Region}c
      Size: 8
      VolumeType: gp2

  StatefulZookeeperGroupThree:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      - StatefulZookeeperEniThree
      - StatefulZookeeperVolumeThree
      - StatefulZookeeperGroupTwo
    Properties:
      VPCZoneIdentifier:
        - !Ref PublicSubnetC
      LaunchConfigurationName: !Ref  StatefulZookeeperLaunchConfig
      LoadBalancerNames:
        - !Ref StatefulZookeeperLoadBalancer
      DesiredCapacity: 1
      MinSize: 0
      MaxSize: 1
      HealthCheckGracePeriod: '300'
      HealthCheckType: EC2
      Tags:
        - Key: instanceeniid
          Value: !Ref StatefulZookeeperEniThree
          PropagateAtLaunch: 'true'
        - Key: instanceindex
          Value: '3'
          PropagateAtLaunch: 'true'
        - Key: instancevolume
          Value: !Ref StatefulZookeeperVolumeThree
          PropagateAtLaunch: 'true'
        - Key: Name
          Value: 'zookeeper-three'
          PropagateAtLaunch: 'true'
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT10M
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: 1
        MinInstancesInService: 0
        PauseTime: PT10M
        WaitOnResourceSignals: 'true'

Here is a bash script to create or update the stack; you will need to pass in an AMI ID and an SSH key pair name:

#!/bin/bash

set -euo pipefail

# ha_stateful_stack.sh
# Usage: ha_stateful_stack.sh [AMI_ID] [SSH_KEY_PAIR_NAME]

function main {
    local AMI_ID
    local SSH_KEY_NAME
    local STACK_EXISTS

    AMI_ID=${1-}
    SSH_KEY_NAME=${2-}

    if [[ -z ${AMI_ID} ]] || [[ -z ${SSH_KEY_NAME} ]]; then
        echo "Missing required arguments" >&2
        exit 1
    fi

    set +e
    aws cloudformation describe-stacks \
        --stack-name ha-stateful-zookeeper
    STACK_EXISTS=$?
    set -e

    if [[ $STACK_EXISTS -ne 0 ]]; then
        echo "Creating stack"

        aws cloudformation create-stack \
            --stack-name ha-stateful-zookeeper \
            --template-body file://ha_stateful.zookeeper.template.yaml \
            --capabilities CAPABILITY_IAM \
            --parameters \
                ParameterKey=KeyName,ParameterValue="${SSH_KEY_NAME}" \
                ParameterKey=InstanceImageId,ParameterValue="${AMI_ID}"

        aws cloudformation wait stack-create-complete \
            --stack-name ha-stateful-zookeeper

        echo "Stack created"
    else
        echo "Updating stack"

        aws cloudformation update-stack \
            --stack-name ha-stateful-zookeeper \
            --template-body file://ha_stateful.zookeeper.template.yaml \
            --capabilities CAPABILITY_IAM \
            --parameters \
                ParameterKey=KeyName,ParameterValue="${SSH_KEY_NAME}" \
                ParameterKey=InstanceImageId,ParameterValue="${AMI_ID}"

        aws cloudformation wait stack-update-complete \
            --stack-name ha-stateful-zookeeper

        echo "Stack updated"
    fi
}

main "$@"

You can grab a list of Ubuntu Xenial AMI IDs like so:

aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=*ubuntu-xenial-16.04-amd64*" \
            "Name=virtualization-type,Values=hvm" \
            "Name=root-device-type,Values=ebs" \
            "Name=hypervisor,Values=xen" \
  --output text \
  --query "reverse(sort_by(Images, &CreationDate))|[].ImageId" \
  | tr '\t' '\n'

Note that launching this stack will cost some money.

Normally you wouldn't use a base Ubuntu AMI here; you should bake an AMI with everything you need via a tool like Packer. I am using a base AMI here by way of example.

This is a pretty verbose template, but as you can see we have three ASGs, each with their own ENI and EBS volume, the IDs of which are assigned as tags to the instances. During updates each ASG terminates its existing instance in order to release its ENI, EBS volume and cluster index, and each successive ASG depends on the previous one, so only one instance is updated at a time. An internal ELB is provided so that services that want to use Zookeeper can reach it via DNS pointing at the ELB; connecting services just need one ZK instance to work, and it doesn't matter which one.
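
For example, a client inside the VPC only needs the private DNS name defined in the template; a quick sanity check from any instance in the VPC might look like this (assuming nc is installed):

# The ELB listener forwards 2181 to the registered Zookeeper instances.
echo ruok | nc zookeeper.internal 2181   # should print "imok"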

Most of the heavy lifting takes place in the shared LaunchConfig, so let's take a closer look. The two files in the AWS::CloudFormation::Init section are just pieces of Zookeeper config that we will examine later on; for now let's look at the UserData:

#!/bin/bash
set -euo pipefail
addgroup \
  -gid 5000 \
  zookeeper
adduser \
  --gid 5000 \
  --uid 5000 \
  --no-create-home \
  --disabled-password \
  --disabled-login \
  --system \
  zookeeper
apt-get update
apt-get install -y zookeeper zookeeperd python2.7 python-pip curl jq ntp
service zookeeper stop
pip install awscli https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz
cfn-init \
  --resource StatefulZookeeperLaunchConfig \
  --stack ${AWS::StackName} \
  --region ${AWS::Region}
INSTANCE_ID=$( curl -s http://169.254.169.254/latest/meta-data/instance-id )
INSTANCE_TAGS=$( aws ec2 describe-tags \
  --filters "Name=resource-id,Values=$INSTANCE_ID" \
  --region ${AWS::Region} \
  --query "{ instanceeniid:Tags[?Key=='instanceeniid'].Value, instanceindex:Tags[?Key=='instanceindex'].Value, instancevolume:Tags[?Key=='instancevolume'].Value }" )
INSTANCE_ENI_ID=$( echo "$INSTANCE_TAGS" | jq -r '.instanceeniid | .[0]' )
aws ec2 attach-network-interface \
  --network-interface-id $INSTANCE_ENI_ID \
  --instance-id $INSTANCE_ID \
  --region ${AWS::Region} \
  --device-index 1
INSTANCE_ENI_IP=$( aws ec2 describe-network-interfaces \
  --network-interface-ids $INSTANCE_ENI_ID \
  --region ${AWS::Region} \
  --query "NetworkInterfaces[0].PrivateIpAddress" \
  --output text )
GATEWAY_IP=$( /sbin/ip route | awk '/default/ { print $3 }' )
echo -e "auto eth1\niface eth1 inet dhcp\n  post-up ip route add default via $GATEWAY_IP dev eth1 tab 2\n  post-up ip rule add from $INSTANCE_ENI_IP/32 tab 2 priority 600" \
  > /etc/network/interfaces.d/eth1.cfg
sleep 15s
service networking restart
INSTANCE_VOLUME_ID=$( echo "$INSTANCE_TAGS" | jq -r '.instancevolume | .[0]' )
aws ec2 attach-volume \
  --volume-id $INSTANCE_VOLUME_ID \
  --instance-id $INSTANCE_ID \
  --region ${AWS::Region} \
  --device /dev/xvdb
sleep 15s
set +e
blkid -L zookeeperdata
FILESYSTEM_EXISTS=$?
set -e
if [[ $FILESYSTEM_EXISTS -ne 0 ]]; then
  mkfs -t ext4 -L zookeeperdata /dev/xvdb
fi
mkdir -p /var/lib/zookeeper
mount /dev/xvdb /var/lib/zookeeper
echo "LABEL=zookeeperdata /var/lib/zookeeper ext4 nofail 0 0" >> /etc/fstab
chown -R zookeeper:zookeeper /var/lib/zookeeper
INSTANCE_ASG=$( aws ec2 describe-instances \
  --instance-id $INSTANCE_ID \
  --region ${AWS::Region} \
  --query "Reservations[0].Instances[0].Tags[?Key=='aws:cloudformation:logical-id'].Value" \
  --output text )
echo "$INSTANCE_TAGS" \
  | jq -r '.instanceindex | .[0]' \
  > /var/lib/zookeeper/myid
service zookeeper start
set +e
ps auxw | grep -P '\b'zookeeper'(?!-)\b'
ZOOKEEPER_RUNNING=$?
set -e
cfn-signal \
  -e $ZOOKEEPER_RUNNING \
  --stack ${AWS::StackName} \
  --resource $INSTANCE_ASG \
  --region ${AWS::Region}

We don't need to go through all of this in detail so let's skim through the main bits:

  • First off we set up Zookeeper's user and group with fixed UID and GID values, so that file ownership on the EBS volumes is always consistent between instances
  • Zookeeper and some useful utilities are installed; ntp is needed to ensure that all ZK servers keep accurate time
  • The instance's ENI, EBS and index values are gathered from the instance's tags and stored for later use; these were assigned by the instance's dedicated ASG
  • With these IDs in hand we can attach the ENI and set it up; note that we have given each instance an IAM role that allows the attachment of ENIs and EBS volumes
  • For the EBS volume we attach it, check for a filesystem, initialise the volume if necessary and mount it at Zookeeper's data dir
  • Finally we can start Zookeeper and send a signal to the ASG to denote that the instance has started up

The use of "set -euo pipefail" at the top of the script makes the whole script quit if anything goes wrong; no signal is then sent to the ASG, which eventually times out the update and fails/rolls back.
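
An optional variation on this fail-fast behaviour, not used in the template above, is to trap errors and signal failure explicitly once the ASG logical ID is known, so the rollback starts immediately instead of waiting out the PT10M timeout:

# Sketch only: STACK_NAME, REGION and INSTANCE_ASG would already be set by this
# point in the UserData script; on any error, send a failure signal to the ASG.
trap 'cfn-signal -e 1 --stack "$STACK_NAME" --resource "$INSTANCE_ASG" --region "$REGION"' ERR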

Try updating the stack with different AMI IDs; you should see everything come back with the same secondary private IPs and volumes.
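
One way to verify this, assuming the Name tags used in the template above, is to list each instance's attached private IPs and EBS volume IDs before and after the update and compare:

aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=zookeeper-one,zookeeper-two,zookeeper-three" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].[Tags[?Key=='Name']|[0].Value, NetworkInterfaces[].PrivateIpAddress, BlockDeviceMappings[].Ebs.VolumeId]" \
  --output json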

Zookeeper config and testing

In order to test that everything is working you will need a few Zookeeper commands to check the status of the cluster and verify that data is carried over between deployments. It's worth taking a look at the Zookeeper admin guide, but I will show you the basics here.

If you are on an instance you can issue the following commands (known as the four letter words) to get information about the cluster state:

# check general server state
echo ruok | nc [private-ip] 2181
# lists brief details for the server and connected clients
echo stat | nc [private-ip] 2181
# outputs a list of variables that could be used for monitoring the health of the cluster.
echo mntr | nc [private-ip] 2181

You can issue these commands to each instance in the cluster; one of them should identify itself as the leader and the others as followers. If you get the dreaded "This ZooKeeper instance is currently not serving requests" message then something has gone wrong: check the ZK config file (detailed below), check that all IPs are correct and that all instances can contact each other.
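
A quick way to sweep the whole cluster, assuming you have the three ENI private IPs to hand (for example from aws ec2 describe-network-interfaces), is something like the following; the IPs shown are placeholders:

# Sketch only: report liveness and role for each Zookeeper server.
for ip in 10.0.0.10 10.0.1.10 10.0.2.10; do   # hypothetical ENI private IPs
  echo "--- $ip"
  echo ruok | nc "$ip" 2181; echo
  echo stat | nc "$ip" 2181 | grep Mode   # "Mode: leader" or "Mode: follower"
done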

You can also store arbitrary data in the cluster to test whether it is preserved across updates; each server comes with a CLI tool that lets you interact with Zookeeper directly:

# activate cli
/usr/share/zookeeper/bin/zkCli.sh
# list all data via Cli
ls /
# store some data via Cli
create /testing spam
# get some data via Cli
get /testing

The main Zookeeper config file is zoo.cfg, stored in /etc/zookeeper/conf on our instances. Most of this config is pretty straightforward, and the Zookeeper admin guide can provide more detail. It's worth taking a closer look at these settings however:

tickTime=1000
initLimit=1200
syncLimit=300

The tickTime denotes the base unit of time in milliseconds; here our base tick is 1s. The initLimit is the number of tickTime units that an instance is allowed in order to start up and join a cluster (ie. elect a leader); here the value works out to 1200s, or 20 minutes, which is twice the signal timeout of an ASG. Why is this number so high? For a new cluster created from scratch you will only have one machine for a while, and it takes some time for a second machine to come up and form a quorum, so we need to allow enough time for two machines to start; otherwise every time you start a new cluster it will come up dead and need manual intervention, no good for automation at all! We can of course bring this down as needed with some experimentation. The syncLimit is the time allowed for an instance to sync its data with the rest of the cluster; it is a multiple of tickTime once again, so we've set it to 5 minutes here. Since we are using EBS volumes, instances should sync pretty much instantly, but we've given some wiggle room.
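
As a quick sanity check of those numbers:

# tickTime is in milliseconds; initLimit and syncLimit are counted in ticks
tickTime_ms=1000
init_limit_ticks=1200
sync_limit_ticks=300
echo "initLimit = $(( init_limit_ticks * tickTime_ms / 1000 / 60 )) minutes"   # 20
echo "syncLimit = $(( sync_limit_ticks * tickTime_ms / 1000 / 60 )) minutes"   # 5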

Stateful wrap up

So that's it. There is a lot of Zookeeper-specific detail here, but this technique can be applied to many stateful applications that can form a cluster, such as ElasticSearch, Consul and Kafka. It should also be possible to apply this method to applications that require a manual master/slave setup, such as MySQL and Postgres; the instance index can be used to assign these roles, however something to automate slave-to-master promotion will be required. There are some applications that cannot form clusters without extra cost/licensing or a custom load balancer of some kind, such as Graphite/Carbon and InfluxDB; these services need their data to be exactly the same across all instances. It may be possible to use EFS for these, and I will hopefully have an article on that at some point in the future.

In this article I haven't used any orchestration or config management tools such as Terraform or Ansible; you should really consider using one of these instead of hand-coding as we have here. Instead of hand-coding three different ASGs you could easily use a templating language to write out n ASGs based on a variable.
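
The idea is simply to cycle instance indexes across the region's AZ suffixes so that each index gets a fixed AZ for its ENI, volume and ASG; a rough sketch of that loop in bash (in practice you would do this in your templating tool of choice):

AZS=(a b c)   # AZ suffixes available in the region
N=5           # desired cluster size
for i in $(seq 1 "$N"); do
  az=${AZS[$(( (i - 1) % ${#AZS[@]} ))]}
  echo "instance index $i -> availability zone suffix $az"
done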

You can see the examples shown in this article, along with other AWS examples, in my Cloudformation examples repo. Please post any questions as comments below.