Avoid downtime during the auto-scaling scale-out event


Recently, I configured time-based auto-scaling for one of our production applications running in an Elastic Beanstalk environment. Our use case is to scale out EC2 instances every Monday morning and scale in every Friday night.

One such Monday morning, the number of 4XX responses exceeded the threshold we had set for monitoring. This triggered an investigation to see what had gone wrong. We found out that as soon as a scale-out event is triggered, a new instance is created and put behind the load balancer. Elastic Beanstalk then runs the container commands defined in the configuration on that instance.

In this case, it took 3–5 minutes to complete these container commands. Therefore, even though a new instance had been created and put behind the load balancer, it was not ready to serve requests for a short period of time. Moreover, we had a few users trying to access the application during this interval. Some requests were served by already healthy instances and returned 2XX responses, while others were routed to the newly created instances and failed with 4XX and 5XX responses.

In this blog post, I am going to share how I fixed this problem for our production system.


Create a lifecycle hook for the auto-scaling group

Whenever the auto-scaling group launches or terminates an instance, it emits events that we can act on by configuring a lifecycle hook.

According to the official AWS documentation on lifecycle hooks:

When Amazon EC2 Auto Scaling responds to a scale-out event, it launches one or more instances. These instances start in the Pending state. If you added an autoscaling:EC2_INSTANCE_LAUNCHING lifecycle hook to your Auto Scaling group, the instances move from the Pending state to the Pending:Wait state. After you complete the lifecycle action, the instances enter the Pending:Proceed state. When the instances are fully configured, they are attached to the Auto Scaling group and they enter the InService state.

When Amazon EC2 Auto Scaling responds to a scale-in event, it terminates one or more instances. These instances are detached from the Auto Scaling group and enter the Terminating state. If you added an autoscaling:EC2_INSTANCE_TERMINATING lifecycle hook to your Auto Scaling group, the instances move from the Terminating state to the Terminating:Wait state. After you complete the lifecycle action, the instances enter the Terminating:Proceed state. When the instances are fully terminated, they enter the Terminated state.

We can create a lifecycle hook for an auto-scaling group via the console or the AWS CLI by following the steps in the AWS documentation. However, for applications running on Elastic Beanstalk, it is advised not to directly modify any resource created by EB. Therefore, I added the following config file to the .ebextensions folder.
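A minimal sketch of what such a config file might look like is shown here. It assumes the environment's auto-scaling group is exposed under the logical name AWSEBAutoScalingGroup (the name Elastic Beanstalk uses for it), and the file name, hook name, timeout, and default result are illustrative assumptions rather than the values used in our production setup.

```yaml
# .ebextensions/lifecycle-hook.config  (hypothetical file name)
Resources:
  LaunchLifecycleHook:                      # hypothetical logical resource name
    Type: AWS::AutoScaling::LifecycleHook
    Properties:
      # Attach the hook to the auto-scaling group created by Elastic Beanstalk
      AutoScalingGroupName: { "Ref": "AWSEBAutoScalingGroup" }
      LifecycleHookName: my-launch-hook     # assumed name, reused when completing the action
      # Hold newly launched instances in Pending:Wait
      LifecycleTransition: autoscaling:EC2_INSTANCE_LAUNCHING
      # Assumed values: wait up to 10 minutes, then give up on the instance
      HeartbeatTimeout: 600
      DefaultResult: ABANDON
```

The heartbeat timeout should comfortably exceed the time the deployment steps take (3–5 minutes in our case). With DefaultResult set to ABANDON, an instance that never completes the lifecycle action is terminated instead of being put into service half-configured.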

Once I deployed a new version to the Elastic Beanstalk environment with this config file, it created a lifecycle hook for the auto-scaling group attached to the environment. To see it in the console, open EC2 > Auto Scaling groups, select the group, and switch to the Instance management tab.

Instance management tab for an auto-scaling group

When a scale-out event is triggered for this auto-scaling group, the new instance now stays in the Pending:Wait state. Because we created the lifecycle hook with the transition autoscaling:EC2_INSTANCE_LAUNCHING, the instance is not put behind the load balancer until we complete the lifecycle action.

Instance management tab after a scale-out event

Ideally, we should be putting an instance behind the load balancer only after finishing all deployment steps, including container commands or deployment scripts, so that the instance is ready to service requests.

Complete lifecycle action once an instance is healthy

Elastic Beanstalk has a standard directory structure for hooks. These hooks are scripts that run during lifecycle events and in response to management operations. In this case, we want to run a command after the deployment is done on a new instance. For that purpose, we need to put a script under the directory /opt/elasticbeanstalk/hooks/appdeploy/post/. This script should contain the command that completes the lifecycle action for that new instance.

To run the complete-lifecycle-action command, we need either the lifecycle action token or the instance ID.

# Option 1: identify the lifecycle action by its token
aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --lifecycle-hook-name my-launch-hook --auto-scaling-group-name my-asg --lifecycle-action-token bcd2f1b8-9a78-44d3-8a7a-4dd07d7cf635
# Option 2: identify the lifecycle action by the instance ID
aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id i-1a2b3c4d --lifecycle-hook-name my-launch-hook --auto-scaling-group-name my-asg

Since the script runs on the instance itself as part of the EB post-deployment hook, we can easily get the instance ID. Hence, there is no need to subscribe to the lifecycle hook's notifications just to obtain the lifecycle action token.

To add a new file to the instances of an application running on Elastic Beanstalk, we have to add a new config file, which creates the post-deployment hook on all instances. Below is the code snippet showing how to do the same:
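A minimal sketch of such a config file follows. It makes several assumptions:

- The platform is Amazon Linux AMI based, where post-deployment hooks live under /opt/elasticbeanstalk/hooks/appdeploy/post/.
- The AWS CLI and curl are available on the instance.
- The instance profile allows autoscaling:DescribeAutoScalingInstances and autoscaling:CompleteLifecycleAction.
- The lifecycle hook is named my-launch-hook.
- The file name and script name are hypothetical.

```yaml
# .ebextensions/complete-lifecycle-action.config  (hypothetical file name)
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_complete_lifecycle_action.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      set -euo pipefail

      # Read the instance ID and availability zone from the instance metadata service
      METADATA_URL="http://169.254.169.254/latest"
      TOKEN=$(curl -s -X PUT "$METADATA_URL/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
      INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
        "$METADATA_URL/meta-data/instance-id")
      AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
        "$METADATA_URL/meta-data/placement/availability-zone")
      # Assumes a standard zone name such as us-east-1a; drop the trailing letter
      REGION="${AZ%?}"

      # Look up the auto-scaling group and lifecycle state of this instance
      ASG_NAME=$(aws autoscaling describe-auto-scaling-instances \
        --region "$REGION" --instance-ids "$INSTANCE_ID" \
        --query 'AutoScalingInstances[0].AutoScalingGroupName' --output text)
      STATE=$(aws autoscaling describe-auto-scaling-instances \
        --region "$REGION" --instance-ids "$INSTANCE_ID" \
        --query 'AutoScalingInstances[0].LifecycleState' --output text)

      # Only a freshly launched instance is waiting on the hook; regular
      # deployments to instances already in service skip this step
      if [ "$STATE" = "Pending:Wait" ]; then
        aws autoscaling complete-lifecycle-action \
          --region "$REGION" \
          --lifecycle-action-result CONTINUE \
          --instance-id "$INSTANCE_ID" \
          --lifecycle-hook-name my-launch-hook \
          --auto-scaling-group-name "$ASG_NAME"
      fi
```

The state check matters because the same post-deployment hook also runs on every ordinary application deployment, where no lifecycle action is pending and the complete-lifecycle-action call would otherwise fail.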

You need to add this config file under the .ebextensions folder in the root of your application code.

That’s about it. Following these steps, your auto-scaling should work smoothly without any downtime during a scale-out event. Thanks for reading.

author
Hiren Patel
My skills include full-stack web development, and I can also work on the deployment and monitoring of web applications on a server. The kind of work I do and have experience with: Back-end: 1. Python scripting 2. Django REST framework to create REST APIs 3. Writing unit test cases and using automation to test builds before deployment. Front-end: 1. Web application development using the Angular 2/4/6 framework 2. Challenging UI development using HTML5 and CSS3 3. CSS, jQuery and SVG animations. Others: 1. AWS EC2, S3, RDS, ECS, CI/CD with AWS 2. Jenkins