I spoke too soon in the previous post about AWS having fixed the issue of Docker images exhausting all of an instance’s disk space.
This morning we had some downtime on a non-critical service, and it left me with a lot of questions!
ecs-cli scale doesn’t allow you to specify the instance type; only up does!?
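For context, the asymmetry looks roughly like this (a sketch; flag names are from the ecs-cli version we were running, so verify with ecs-cli up --help, and my-key is a placeholder):

```shell
# `up` creates the cluster and lets you pick the instance type...
ecs-cli up --capability-iam --size 2 --instance-type m4.large --keypair my-key

# ...but `scale` only takes a size, reusing whatever type `up` chose:
ecs-cli scale --capability-iam --size 3
```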
How can I avoid issues like this whilst trying to quickly free space?!
$ docker rmi efaaf58ff978
Error response from daemon: devmapper: Error saving transaction metadata: devmapper: Error writing metadata to /var/lib/docker/devicemapper/metadata/.tmp736910121: write /var/lib/docker/devicemapper/metadata/.tmp736910121: no space left on device
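One way out of that catch-22, sketched below: devicemapper needs free space just to write the metadata for an image deletion, so reclaim space somewhere outside Docker first (the /var/log/ecs path is an assumption about where the agent logs live on your AMI):

```shell
# Truncate (don't delete) the runaway log, so any process holding the
# file open keeps a valid handle while the blocks are freed:
sudo truncate -s 0 /var/log/ecs/ecs-agent.log*

# Per-container json logs are the other usual suspects:
sudo du -sh /var/lib/docker/containers/*/*-json.log 2>/dev/null | sort -h | tail

# With some headroom back, the removal should go through:
docker rmi efaaf58ff978
```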
Why did this happen at an odd time? There was no deployment at 6AM!
I’ll post the answers to the questions below when I find out. Note: We were running AWS ECS Agent 1.11.1. It’s very likely that new versions will fix all the issues logged here.
Before nuking the machine, I did manage to grab the logs. Unsurprisingly that took a while, because there were 3GB of logs! I have commented on the ECS agent project that it should be using systemd’s journalctl, or at least rotating its logs more aggressively with compression.
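Until that happens, a logrotate rule on the instance is a cheap stopgap. A minimal sketch, where the log path and the size/rotate thresholds are my assumptions:

```shell
# Cap the agent logs at ~50M each, keep 4 compressed generations.
# copytruncate avoids restarting the agent to reopen its log file.
sudo tee /etc/logrotate.d/ecs-agent <<'EOF'
/var/log/ecs/ecs-agent.log* {
    size 50M
    rotate 4
    compress
    missingok
    copytruncate
}
EOF
```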
Error retrieving stats for container LONG_HASH: dial unix /var/run/docker.sock: socket: too many open files
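If you want to confirm a descriptor leak like this before the agent falls over, you can count the process’s open fds against its limit. A sketch (the pgrep pattern is an assumption; adjust for however the agent shows up on your host):

```shell
# Find the agent process and count its open file descriptors:
pid=$(pgrep -f ecs-agent | head -n 1)
ls /proc/"$pid"/fd | wc -l

# Compare against the per-process limit:
grep 'open files' /proc/"$pid"/limits
```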
This is addressed in https://github.com/aws/amazon-ecs-agent/issues/488, i.e. fixed in 1.13.0!
Thus, if you would like the most up to date agent, you can either:
a) Bake the update commands as listed here into your instance: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-update.html#d0e7941
b) Update the underlying CloudFormation template to use the latest AMI whenever a new version becomes available.
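For option (a), the baked-in update could look roughly like this. It is a sketch based on the manual-update steps in the linked docs for the ECS-optimized AMI (upstart job names differ on other AMIs, so verify against yours):

```shell
# Stop the ecs-init upstart job, pull the newest agent image, restart:
sudo stop ecs
sudo docker pull amazon/amazon-ecs-agent:latest
sudo start ecs

# Verify the running version via the agent introspection endpoint:
curl -s http://localhost:51678/v1/metadata
```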
3/8. ~6AM just so happened to be when the repeating logs filled the disk on both instances at effectively the same time. Doh!
I need to change the “Launch configuration” of the “Autoscaling group”.
Check the launch configuration: I need to figure out how to specify the EBS volume size manually.
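Launch configurations are immutable, so the usual pattern is to create a new one with a larger root volume and point the Auto Scaling group at it. A hypothetical sketch (the names, AMI ID, and the /dev/xvda device name are placeholders; match them to your setup):

```shell
# New launch configuration with a 100GB gp2 root volume:
aws autoscaling create-launch-configuration \
  --launch-configuration-name ecs-lc-v2 \
  --image-id ami-xxxxxxxx \
  --instance-type m4.large \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":100,"VolumeType":"gp2"}}]'

# Swap it into the Auto Scaling group (existing instances keep their
# old volumes until replaced):
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name ecs-asg \
  --launch-configuration-name ecs-lc-v2
```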
I should have tried forcing it, like so: docker rmi -f <image_id>, and of course nuking the logs.
I do DevOps at Spuul. Any tips or suggestions? Reach out!