AWS ECS encountered error "AGENT"

Any fellow system administrators using Amazon’s container service (ECS) have probably hit the "Agent Connected: false" issue with the Amazon ECS agent, as we have.
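To see which instances are affected, something along these lines should work. It is only a rough sketch: `disconnected_agents` is a helper name made up for this post, and it relies on the `agentConnected` and `ec2InstanceId` fields that `aws ecs describe-container-instances` returns.

```shell
# Hypothetical helper: reads `aws ecs describe-container-instances` JSON on stdin
# and prints the EC2 instance ID of every container instance whose agent is disconnected.
disconnected_agents() {
	jq -r '.containerInstances[] | select(.agentConnected == false) | .ec2InstanceId'
}

# Usage sketch (assumes $cluster is set and the cluster has at least one instance):
# aws ecs describe-container-instances --cluster "$cluster" \
#	--container-instances $(aws ecs list-container-instances --cluster "$cluster" | jq -r '.containerInstanceArns[]') |
#	disconnected_agents
```

Keeping the filter as a function that reads stdin means you can try it against saved JSON before pointing it at a live cluster.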

AWS’s advice is to log in to your ECS EC2 instances and restart the agent manually!
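For reference, the manual restart looks roughly like this once you are on the instance. Treat it as a sketch under assumptions: `restart_ecs_agent` is a made-up wrapper, and it assumes an ECS-optimized Amazon Linux AMI where the agent runs as the `ecs` service (upstart on the older AMI, systemd on Amazon Linux 2).

```shell
# Hypothetical wrapper: restart the ECS agent service on the instance itself.
# Assumes the agent is managed as the "ecs" service by the init system.
restart_ecs_agent() {
	if command -v systemctl >/dev/null 2>&1; then
		# systemd-based AMIs (e.g. Amazon Linux 2)
		sudo systemctl restart ecs
	else
		# upstart-based AMIs (the older Amazon Linux ECS-optimized AMI)
		sudo stop ecs
		sudo start ecs
	fi
}

# Usage, after ssh-ing into the instance:
# restart_ecs_agent
```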

For us this is not an easy or scalable thing to do, so our current workaround is to scale the cluster up and then back down. For example:

Agent disconnected
ecs-cli scale --size 3 --capability-iam
ecs-cli compose service scale 3
ecs-cli compose service scale 2
ecs-cli scale --size 2 --capability-iam

This should effectively kill the errant ECS instance and its AWOL agent, and bootstrap a fresh one. Another tip: always make sure you are running the latest ecs-cli, since the ECS AMIs are annoyingly hard-coded into this client! You might also want to check that your EC2 instances are not old by inspecting their “Launch time”, and keep them fresh by terminating old ones.
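The “Launch time” check can be scripted too. Here is a minimal sketch, with assumptions: `old_instances` is a made-up helper that reads `aws ec2 describe-instances` JSON on stdin, and it relies on the UTC `LaunchTime` timestamps sorting correctly as plain strings. The `date -d` call in the usage comment is GNU-only.

```shell
# Hypothetical helper: print "instance-id<TAB>launch-time" for every instance
# launched before the cutoff given as $1 (UTC, e.g. 2016-09-11T00:00:00).
# ISO 8601 timestamps in the same timezone compare correctly as strings.
old_instances() {
	jq -r --arg cutoff "$1" '
		.Reservations[].Instances[]
		| select(.LaunchTime < $cutoff)
		| "\(.InstanceId)\t\(.LaunchTime)"'
}

# Usage sketch (GNU date assumed for the "30 days ago" arithmetic):
# aws ec2 describe-instances | old_instances "$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S)"
```

This only lists candidates; terminating them is left as a deliberate manual step.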

Hopefully AWS will fix this issue, as they have fixed the silly one where old images were not being cleared out. Or perhaps we have to start using Lambda? ;) For further tips on getting started with ECS, check out this guide.

If you need to ssh into your cluster, you might find this aws-cli/jq/shell script handy:

# Choose your cluster so you can figure out the IPs and ssh in to inspect
select cluster in $(aws ecs list-clusters | jq -r '.clusterArns[]')
do
	echo Cluster: "$cluster"
	# List the container instances registered with the chosen cluster
	aws ecs list-container-instances --cluster "$cluster" | jq -r '.containerInstanceArns[]' |
	while read -r instance
	do
		# Map each container instance to its EC2 instance ID
		aws ecs describe-container-instances --cluster "$cluster" --container-instances "$instance" |
		jq -r '.containerInstances[].ec2InstanceId' | while read -r instanceid
		do
			# Look up the private IP and print an ssh command (add your key file after -i)
			aws ec2 describe-instances --filters Name=instance-id,Values="$instanceid" |
			jq -r '.Reservations[].Instances[] | .PrivateIpAddress' | while read -r IP
			do
				echo ssh -i ec2-user@"$IP"
			done
		done
	done
done

Posted 2016-10-11
