Sunday, June 17, 2018

Amazon ECS Daemon Scheduling Strategy

So this past week AWS launched a new daemon scheduling strategy for ECS (Elastic Container Service).



What is a Daemon? 

There are plenty of excellent definitions out there so I'm not going to attempt to reinvent the wheel.

"A daemon is a type of program on Unix-like operating systems that runs unobtrusively in the background, rather than under the direct control of a user, waiting to be activated by the occurrence of a specific event or condition"

What Wikipedia says: Daemon


What is a Daemon Scheduling Strategy?

Imagine you have a cluster of N instances. Let's say you want to ensure that each instance in your cluster runs a copy of a given container (or task in ECS parlance). Well, this is what the daemon scheduling strategy does for you.

But could I not just create a service that has the same number of tasks as there are instances in my cluster, and use a distinct instance placement constraint to ensure that each instance runs a copy of a given task? Well, yes, you could do that. But what happens when your cluster scales? You'd need to go and update the service to reflect the change in cluster size, which doesn't sound like much fun.
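To make that manual approach concrete, here is a rough sketch of what it might look like. The service name my-agent-replica and the task definition my-agent-task:1 are hypothetical placeholders, not something from my cluster. The desired count is only a snapshot of the cluster size at the moment you run it, which is exactly the problem.

```shell
# Sketch of the manual alternative: one replica per instance, pinned with
# the distinctInstance placement constraint.
# ASSUMPTION: "my-agent-replica" and "my-agent-task:1" are made-up names.

# create_replica_service_per_instance CLUSTER REGION
create_replica_service_per_instance() {
  local cluster="$1" region="$2" count

  # Count the container instances registered in the cluster right now.
  count=$(aws ecs list-container-instances \
    --cluster "$cluster" --region "$region" \
    --query 'length(containerInstanceArns)' --output json) || return 1

  # The desired count is fixed at creation time; it will not follow the
  # cluster as it scales out or in.
  aws ecs create-service \
    --service-name my-agent-replica \
    --cluster "$cluster" \
    --task-definition my-agent-task:1 \
    --desired-count "$count" \
    --placement-constraints type=distinctInstance \
    --region "$region"
}
```

Every time the cluster grows or shrinks, you would have to follow up with an update to the desired count yourself.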

This is where the daemon scheduling strategy really shines. It does not care how many nodes you have; it just makes sure that for *every node, there is a copy of a particular task running on it. Simples!

#Tasks == #Nodes


*It is possible to scope down the instances to which tasks within a daemon service are scheduled using placement constraints. See towards the bottom of this post for more details.

More information about this new capability can be found here:


What kind of things would I use a Daemon Scheduling Strategy for?

A common use case is agents: monitoring, log collection and security, to name a few. These are all things that we generally want running in the background, waiting to be activated by a specific event such as some metrics being emitted or a log file being appended to.

A lot of third-party monitoring solutions that require an agent ship that agent as a Docker container. Datadog is a good example. This is a great use case for the daemon scheduling strategy.


So how do I use this new Daemon Scheduling Strategy?

I'm going to deploy the Datadog agent to my small ECS cluster to give a practical example of how and why the Daemon scheduling strategy is so cool.

Let's start by creating a new task definition for the Datadog agent. The template task definition can be found here

The task definition can be created using the ECS management console, like so: 


Or using the AWS CLI, like so:

aws ecs register-task-definition \
--cli-input-json file://path/to/datadog-agent-ecs.json

You'll need to modify the task definition slightly to include your specific Datadog API key.
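If you'd rather not edit the file by hand, something like the following sketch could do the substitution. I'm assuming here that your copy of the template marks the key with the literal text <YOUR_DATADOG_API_KEY>. Check your file and adjust PLACEHOLDER if it uses something else. The function name is just mine.

```shell
# Sketch: fill in the API key placeholder in a local copy of the template.
# ASSUMPTION: the template contains the literal text below; verify this
# against your own copy before relying on it.
PLACEHOLDER='<YOUR_DATADOG_API_KEY>'

# render_task_definition TEMPLATE_FILE API_KEY
# Prints the template to stdout with every placeholder replaced by the key.
render_task_definition() {
  local template="$1" api_key="$2"
  # Assumes the key contains only letters and digits, so it is safe to
  # drop straight into a sed replacement.
  sed "s|${PLACEHOLDER}|${api_key}|g" "$template"
}
```

You could then run `render_task_definition datadog-agent-ecs.json "$DD_API_KEY" > rendered.json` and point `--cli-input-json` at the rendered file. The variable DD_API_KEY and the file name rendered.json are only examples.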

The next step is to use this task definition to create a new service. If you're using the management console, click on the Actions menu and choose Create Service:

Fill out the details on the service configuration screen as you like. But what's important here is that where we would typically choose a set number of replicas, we're instead going to choose DAEMON.

I'm then going to click through the remaining configuration screens because my daemon does not need to be load balanced and daemon services don't support auto scaling. 

The last step is for me to click Create Service. 

This whole process can be distilled down to a simple CLI call with a few arguments like this:

aws ecs create-service \
--service-name datadog-agent \
--cluster daemonset \
--scheduling-strategy DAEMON \
--task-definition datadog-agent-task:1 \
--region us-west-2 

You'll need to make sure you're running at least version 1.15.37 of the AWS CLI for this to work.
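A quick way to check this from a script is sketched below. It assumes `aws --version` prints something starting with "aws-cli/<version>" (older releases write it to stderr, hence the redirect) and that your `sort` supports the -V version ordering flag, as GNU sort does. The function names are my own.

```shell
# version_at_least INSTALLED REQUIRED
# Succeeds when INSTALLED is the same as, or newer than, REQUIRED.
version_at_least() {
  # sort -V puts the lower version first; if that is REQUIRED, we are fine.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# installed_cli_version
# Pulls the version number out of output such as "aws-cli/1.15.37 Python/...".
installed_cli_version() {
  aws --version 2>&1 | sed -n 's|^aws-cli/\([0-9.]*\).*|\1|p'
}
```

For example: `version_at_least "$(installed_cli_version)" 1.15.37 || echo "Please upgrade the AWS CLI"`.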


Did it work? 

Good question. Given what we now know about the job of the daemon scheduling strategy, we can hypothesize that if we have a five-instance cluster, then there should be five tasks running, one for each of the running instances.

First let's check how many container instances we have:

aws ecs list-container-instances \
--cluster daemonset \
--region us-west-2

Next, let's check how many tasks we have running:

aws ecs list-tasks \
--cluster daemonset \
--region us-west-2

If everything went according to plan, we should see the same number of tasks as we have running container instances.
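Rather than counting ARNs by eye, the comparison can be scripted. This is only a sketch: the function names are mine, and I'm scoping the task count to one service with --service-name so that tasks from other services in the cluster don't skew the number.

```shell
# count_instances CLUSTER REGION
# Prints the number of container instances registered in the cluster.
count_instances() {
  aws ecs list-container-instances --cluster "$1" --region "$2" \
    --query 'length(containerInstanceArns)' --output json
}

# count_service_tasks CLUSTER REGION SERVICE
# Prints the number of tasks listed for the given service.
count_service_tasks() {
  aws ecs list-tasks --cluster "$1" --region "$2" --service-name "$3" \
    --query 'length(taskArns)' --output json
}

# check_daemon_coverage CLUSTER REGION SERVICE
# Prints both counts and succeeds only when they match.
check_daemon_coverage() {
  local instances tasks
  instances=$(count_instances "$1" "$2") || return 2
  tasks=$(count_service_tasks "$1" "$2" "$3") || return 2
  echo "instances=${instances} tasks=${tasks}"
  [ "$instances" -eq "$tasks" ]
}
```

With the names used in this post, that would be `check_daemon_coverage daemonset us-west-2 datadog-agent`. Bear in mind the counts can differ for a short while after the cluster scales, while ECS is still placing tasks.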


Selective Daemon Placement

What if I don't want my daemon service to run on all of my container instances, only the ones which are used for a specific purpose? For example, we may want our Datadog agent to only run on container instances that are in a production fleet. No problemo!

Placement constraints and custom attributes are our friends.

We start by adding a custom attribute to a certain subset of container instances in our cluster. I'm going to add the attribute to a single container instance. A custom attribute is nothing more than metadata that we can attach to container instances.

aws ecs put-attributes \
--cluster daemonset \
--attributes "name=env,value=prod,targetType=container-instance,targetId=4914bc6c-9f21-4a28-bcd0-5b29e210ac79" \
--region us-west-2

This command adds a custom attribute with a name of "env" with a value of "prod" to the container instance "4914bc6c-9f21-4a28-bcd0-5b29e210ac79". 

Next, let's use some jq-fu to see if it worked as expected:

aws ecs describe-container-instances \
--container-instances 4914bc6c-9f21-4a28-bcd0-5b29e210ac79 \
--cluster daemonset \
--region us-west-2 \
| jq '.containerInstances[].attributes[] | select (.name == "env")'

We should see a custom attribute with the name of "env" and the value of "prod" returned.
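If jq isn't handy, another way to check is to ask ECS which container instances match the attribute, using the --filter option of list-container-instances. It takes the same kind of expression as a placement constraint. A small sketch, with a function name of my own choosing:

```shell
# instances_with_attribute CLUSTER REGION NAME VALUE
# Prints the ARNs of container instances whose attribute NAME equals VALUE.
instances_with_attribute() {
  aws ecs list-container-instances \
    --cluster "$1" --region "$2" \
    --filter "attribute:$3==$4" \
    --query 'containerInstanceArns' --output text
}
```

Running `instances_with_attribute daemonset us-west-2 env prod` should list only the instance we just tagged.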

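Now let's create a new daemon service that includes a placement constraint for container instances with an env attribute that has a value of prod. Service names are unique within a cluster, so if the datadog-agent service from earlier still exists, delete it first or pick a different name: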

aws ecs create-service \
--service-name datadog-agent \
--cluster daemonset \
--scheduling-strategy DAEMON \
--task-definition datadog-agent-task:1 \
--placement-constraints type=memberOf,expression=attribute:env==prod \
--region us-west-2

If we take a look at the tasks which are running, we should now only see one, and, at least in my case, it will have been scheduled on instance 4914bc6c-9f21-4a28-bcd0-5b29e210ac79:

aws ecs list-tasks \
--cluster daemonset \
--region us-west-2
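The list-tasks output only gives you task ARNs, so to confirm which instance a task actually landed on you need describe-tasks as well. Here is a sketch that chains the two. The function name is mine, and I'm assuming an empty result comes back from the CLI as either an empty string or the text "None".

```shell
# task_instances CLUSTER REGION SERVICE
# Prints the container instance ARN(s) hosting the service's tasks.
task_instances() {
  local cluster="$1" region="$2" service="$3" task_arns

  task_arns=$(aws ecs list-tasks --cluster "$cluster" --region "$region" \
    --service-name "$service" --query 'taskArns' --output text) || return 1

  # Bail out early if the service has no tasks to describe.
  case "$task_arns" in
    ''|None) echo "no tasks found for ${service}" >&2; return 1 ;;
  esac

  # $task_arns is left unquoted on purpose so each ARN becomes its own argument.
  aws ecs describe-tasks --cluster "$cluster" --region "$region" \
    --tasks $task_arns \
    --query 'tasks[].containerInstanceArn' --output text
}
```

For this post that would be `task_instances daemonset us-west-2 datadog-agent`, and the ARN printed should end with the ID of the instance carrying the env=prod attribute.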

And that is about all there is to it. This is a welcome addition to ECS and helps address a great many use cases. I'm really interested to see how this new capability gets used.



A little about Me

My name is Mitch Beaumont and I've been a technology professional since 1999. I began my career working as a desk-side support engineer for a medical devices company in a small town in the middle of England (Ashby De La Zouch). I then joined IBM Global Services, where I began specialising in customer projects based on and around Citrix technologies. Following a couple of very enjoyable years with IBM, I relocated to London to work as a system operations engineer for a large law firm, where I was responsible for the day-to-day operations and development of the firm's global Citrix infrastructure. In 2006 I was offered a position in Sydney, Australia. Since then I've had the privilege of working for and with a number of companies in various technology roles, including as a Solutions Architect and Technical team leader.