Trying Out an Image That Automatically Recovers Unhealthy Docker Containers

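Introduction

I wanted Docker to recover a container automatically when it ends up in a state such as Exited, and after some searching I found an excellent image called willfarrell/autoheal, so I am going to try it.
First I will look at how Docker's built-in HEALTHCHECK feature behaves, then add the willfarrell/autoheal image and check that automatic recovery works.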

Environment

Windows 10 Professional
Docker Desktop 4.17.1 (101757)
Docker Compose version v2.15.1

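Trying Docker's health check feature

Creating compose.yml

First, create the following compose.yml and start the stack. No separate build step is needed, because the service uses a prebuilt image.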

services:
  php:
    image: trafex/php-nginx 
    ports:
      - 80:8080
    healthcheck:
      test: curl --fail http://localhost:8080/ || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s

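Note: in short, the check runs curl against localhost:8080 inside the container every 5 seconds. Because of --fail, curl exits with a non-zero code when the HTTP status indicates an error (404, 503, and so on), and after 3 consecutive failures the container is marked unhealthy. Failures during the first 10 seconds (start_period) are not counted.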
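The status produced by this health check can also be read directly instead of scanning the `docker ps` output. The following is a minimal sketch: `health_status` is a helper name made up for this example, and the container name in the usage comment assumes the compose project is called `autoheal-test`, as in the output shown in this article.

```shell
# health_status is a hypothetical helper name, not part of Docker.
# It prints starting, healthy, or unhealthy for a container
# that defines a health check.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Usage (container name is an assumption based on the project name):
# health_status autoheal-test-php-1
```

Reading the single field this way is convenient when the status has to be used in a script rather than read by eye.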

Checking the healthy status

Once the container is running, a health status is added to the STATUS column: first (health: starting), then (healthy).

$ docker ps
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS                            PORTS                  NAMES
ca486c6761e3   trafex/php-nginx   "/usr/bin/supervisor…"   3 seconds ago   Up 2 seconds (health: starting)   0.0.0.0:80->8080/tcp   autoheal-test-php-1

$ docker ps
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS                   PORTS                  NAMES
ca486c6761e3   trafex/php-nginx   "/usr/bin/supervisor…"   8 seconds ago   Up 6 seconds (healthy)   0.0.0.0:80->8080/tcp   autoheal-test-php-1

The page looks like this in the browser.

Checking the unhealthy status

In this state, enter the container that runs nginx (the php service) and kill the php-fpm processes.
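The article does not show the exact kill command, so here is one way it might be done without opening an interactive shell. This is a sketch under assumptions: `kill_php_fpm` is a made-up helper name, `pkill` is assumed to be available inside the image, and the PHP-FPM process name is assumed to contain `php-fpm`.

```shell
# kill_php_fpm is a hypothetical helper name.
# Assumptions: pkill exists in the image and the PHP-FPM
# process name contains "php-fpm".
kill_php_fpm() {
  docker compose exec php pkill php-fpm
}

# Usage, run from the directory that contains compose.yml:
# kill_php_fpm
```

If `pkill` is not available, entering the container with `docker compose exec php sh` and killing the processes by PID achieves the same thing.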

The site then returns an error page. Hitting it with curl looks like this:

$ curl -I localhost
HTTP/1.1 502 Bad Gateway
Server: nginx
Date: Sat, 01 Apr 2023 06:36:20 GMT
Content-Type: text/html
Content-Length: 497
Connection: keep-alive
ETag: "63501ae6-1f1"

Checking the container state with docker compose ps then gives the following.
Note that the status is now unhealthy.

$ docker compose ps
NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                     PORTS
autoheal-test-php-1   trafex/php-nginx    "/usr/bin/supervisor…"   php                 7 minutes ago       Up 7 minutes (unhealthy)   0.0.0.0:80->8080/tcp

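I want it to recover automatically!

This is the main topic. Docker's health check only monitors the container and reports its status; it does not recover an unhealthy container by itself.
So I add an image that provides automatic recovery to compose.yml.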
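Conceptually, what such a recovery helper has to do on each pass can be sketched in a few lines of shell. This is not the actual source of any image, only an illustration of the idea: `restart_unhealthy` is a made-up name, and the `autoheal=true` label is the one used in the compose.yml later in this article.

```shell
# restart_unhealthy is a hypothetical name; this is NOT the real
# implementation of any image, only a sketch of the idea:
# find labelled containers that are unhealthy and restart them.
restart_unhealthy() {
  docker ps --quiet \
    --filter "health=unhealthy" \
    --filter "label=autoheal=true" |
  while read -r id; do
    echo "restarting $id"
    docker restart "$id" >/dev/null
  done
}

# Usage: run one pass by hand, or call it in a loop with sleep.
# restart_unhealthy
```

A real tool also has to handle timeouts, start-up grace periods, and access to the Docker socket, which is why using a ready-made image is attractive.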

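Using willfarrell/autoheal

Modify compose.yml as follows to enable automatic recovery. The autoheal=true label marks the php container as one to watch, and mounting /var/run/docker.sock lets the autoheal container talk to the Docker daemon.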

services:
  php:
    image: trafex/php-nginx 
    ports:
      - 80:8080
    healthcheck:
      test: curl --fail http://localhost:8080/ || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s
    labels:
      - "autoheal=true"
  autoheal:
    image: willfarrell/autoheal:latest
    tty: true
    container_name: autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

https://hub.docker.com/r/willfarrell/autoheal

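After changing compose.yml, start the stack again so that the changes take effect.

Checking automatic recovery

I follow the same steps as before to see whether the container comes back.

Enter the container that runs nginx and kill the php-fpm processes.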

docker compose exec php sh

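Killing the processes naturally results in a 502 error page, but after a few seconds the autoheal container restarts the php container and the site recovers.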
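The transition can also be watched from a second terminal by polling the health status once per second. The following is a sketch only: `wait_for_health` is a made-up helper name, and the container name in the usage comment assumes the compose project is called `autoheal-test`.

```shell
# wait_for_health is a hypothetical helper name.
# Arguments: container name, wanted status, number of 1-second tries.
wait_for_health() {
  name=$1
  want=$2
  tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    echo "$(date +%T) $status"
    # Stop as soon as the wanted status is reached.
    [ "$status" = "$want" ] && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage (container name is an assumption):
# wait_for_health autoheal-test-php-1 unhealthy 60 &&
#   wait_for_health autoheal-test-php-1 healthy 60
```

Run right after killing php-fpm, the first call should return once the container is marked unhealthy, and the second once it is healthy again after the restart.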

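I prepared a GIF of the run, so please have a look at it below.

So this image does make automatic recovery possible.

It appears to be a stand-in until Docker provides automatic recovery natively, so I also want to keep an eye on when official support arrives.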

Monitor and restart unhealthy docker containers. This functionality was proposed to be included with the addition of HEALTHCHECK, however didn’t make the cut. This container is a stand-in till there is native support for --exit-on-unhealthy https://github.com/docker/docker/pull/22719.

Quoted from https://hub.docker.com/r/willfarrell/autoheal

References

DockerのHEALTHCHECKの動きを理解する (Understanding how Docker's HEALTHCHECK works)
https://qiita.com/knjname/items/9c0a89af2d9e49749017
willfarrell/autoheal
https://hub.docker.com/r/willfarrell/autoheal
Auto-Restart Unhealthy Containers
https://sdr-enthusiasts.gitbook.io/ads-b/useful-extras/auto-restart-unhealthy-containers
Configuring HealthCheck in docker-compose
https://medium.com/@saklani1408/configuring-healthcheck-in-docker-compose-3fa6439ee280

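Closing

Every once in a while I run into situations where I want Docker to recover a container automatically, so I studied this area.
I would like to pick up more detailed knowledge about it as well.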