Kafka Connect Healthchecks

Imagine you have to run Kafka Connect behind a load balancer. Imagine further that the load balancer cannot send basic authentication credentials with its health checks, and that those health checks are required for monitoring. That is exactly the position we found ourselves in this week, and this blog post covers how we solved it.

We first considered accepting an HTTP 403 as a success, but that wouldn’t actually check anything other than the fact that Kafka Connect is currently running.

We then considered using the Python project kafka-connect-healthcheck, but that requires Python 3. And despite the fanfare of Python 2 finally reaching end of life, you still don’t get Python 3 out of the box on many Linux distros. And installing Python packages is still fiddly (cf. the recent meme of needing several years of Linux experience before Python package management becomes “simple”).

We considered installing Nginx, proxying Kafka Connect without the need for authentication, and then only exposing that to the load balancer. But we then realised that’s a terrible idea and a lot of work.

There is a nice clean alternative that is perfect for this use case. Goss is a single binary written in Go that can run a number of different health checks and serve a healthz page with the current health status. One of the more recent health checks it can perform is an authenticated HTTP request, ensuring an HTTP 200 is returned. This makes it really simple to provide unauthenticated healthchecks for not just Kafka Connect, but also Confluent REST Proxy and KSQL.

Here’s our setup for monitoring Kafka Connect, which we are deploying through Terraform as part of a Launch Configuration.

First we need to install Goss. The simplest way of doing this is with a curl | sh, but you can also download the releases from GitHub or build it yourself.

curl -fsSL https://goss.rocks/install | sh

Goss then needs a configuration file, written in YAML. The GitHub repository has all the documentation hidden away in the docs directory, but the simplest option for Kafka Connect looks something like the following (which we put in /etc/goss/goss.yaml):

http:
  http://localhost:8083/connectors:
    # required attributes
    status: 200
    ## optional attributes
    #allow-insecure: false
    #no-follow-redirects: false # Setting this to true will NOT follow redirects
    #timeout: 1000
    #request-headers: # Set request header values
    #   - "Content-Type: text/html"
    #headers: [] # Check http response headers for these patterns (e.g. "Content-Type: text/html")
    #body: [] # Check http response content for these patterns
    username: "admin" # username for basic auth
    password: password # password for basic auth
    skip: false
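
Because that file contains the Kafka Connect password, it may be worth generating it at provisioning time instead of committing it as-is. The following is only a sketch of one way to do that: write_gossfile and its three arguments are hypothetical names made up for this example, and it writes just the attributes we actually use.

```shell
#!/bin/sh
# Sketch only: write_gossfile and its arguments are hypothetical names.
# Writes a minimal gossfile that only its owner can read, since it holds a password.
write_gossfile() {
    dest="$1"   # e.g. /etc/goss/goss.yaml
    user="$2"   # basic auth username for Kafka Connect
    pass="$3"   # basic auth password for Kafka Connect

    # A new file is created owner-only thanks to the umask; the subshell
    # keeps the umask change from leaking into the caller.
    # Caveat: values containing double quotes or backslashes would need
    # YAML escaping, which this sketch does not attempt.
    (
        umask 077
        cat > "$dest" <<EOF
http:
  http://localhost:8083/connectors:
    status: 200
    username: "$user"
    password: "$pass"
EOF
    ) || return 1

    # Covers the case where the file already existed with looser permissions.
    chmod 600 "$dest"
}
```

A call would look something like write_gossfile /etc/goss/goss.yaml "$CONNECT_USER" "$CONNECT_PASSWORD", where the two variable names are again assumptions about how the credentials reach the instance.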

And finally we have a systemd service unit to run this, but you can also test it by hand with goss -g /etc/goss/goss.yaml s --listen-addr 0.0.0.0:18083 (s is short for serve).

[Unit]
Description=Systemd service to monitor Kafka Connect Health and provide ALB with the required status code of 200
After=network.target
StartLimitIntervalSec=0
[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=/usr/local/bin/goss -g /etc/goss/goss.yaml s --listen-addr 0.0.0.0:18083
[Install]
WantedBy=multi-user.target
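
To round this off, the unit has to be loaded and started. A minimal sketch of that step follows. The unit name goss-healthcheck.service is an assumption, since it depends on the filename the unit above is saved under, and enable_healthcheck is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: goss-healthcheck.service is a hypothetical unit name; use
# whatever filename the unit was saved under in /etc/systemd/system/.
enable_healthcheck() {
    unit="${1:-goss-healthcheck.service}"

    # 1. Make systemd pick up the new unit file.
    # 2. Start the service now and on every boot.
    # 3. Return 0 only if the service is currently running.
    systemctl daemon-reload &&
        systemctl enable --now "$unit" &&
        systemctl is-active --quiet "$unit"
}
```

Run as root, enable_healthcheck returns non-zero if any of the three steps fails, which makes it easy to fail the instance bootstrap early.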

The load balancer then needs configuring to hit http://<address>:18083/healthz, which we can test with curl http://localhost:18083/healthz -v. If we get a 200 OK status back, then everything is healthy, and the load balancer will send traffic to this server. If we don’t get a 200 OK status back, then we can raise alarms.
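
For scripting the same check, for example in a smoke test after provisioning, something like the following sketch may help. The function names healthz_code and is_healthy are made up for this example, and the five second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: healthz_code and is_healthy are hypothetical helper names.

# Print the HTTP status code returned by the goss healthz endpoint.
healthz_code() {
    url="${1:-http://localhost:18083/healthz}"
    # -s: no progress output, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the numeric status code.
    # curl prints 000 when it cannot connect at all.
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url"
}

# Succeed only on 200, the status the load balancer treats as healthy.
is_healthy() {
    [ "$(healthz_code "$1")" = "200" ]
}
```

Calling is_healthy with no argument checks the local endpoint. If it fails, running goss -g /etc/goss/goss.yaml validate on the server prints the result of each individual check, which is usually the quickest way to see what is wrong.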