A lightweight service health check tool written in Bash
Part of the Project Announcements series.
For websites like a2mi.social and other apps and services I run, I need to know when thereβs a problem. But for personal & crowdfunded projects with few (or 1) users, services like Pingdom are expensive and overkill.
So I wrote lightweight-healthcheck, a Bash script which I run periodically (typically every minute or two) via cron. When the service itβs checking goes down, I get an email and text message. When it recovers, I get another email & text.
It has some useful features, above & beyond a trivial one-off script one might write to achieve minimum viable monitoring:
- Written in Bash & trivially deployable on Linux or macOS.
- Email & SMS notifications, including date/time and the hostname that sent the alert.
- Mail/SMS alert rate limiting, to avoid blowing through your Twilio/Mailgun quota.
- Customizable delay between first detecting a down condition and sending alert.
- Logging of down/alert/clear events. (Logs are automatically written to
~/.lightweight-healthcheck
for the user running the checks.)
SMS delivery is acheived with Twilio. For emails, the script uses the mailx
tool, and your server will need to be configured correctly to deliver mail. I personally use Mailgun to ensure mail from my servers reaches me reliably. You can set this up for your Linux servers using this guide.
To deploy the script, you just make a copy, customize the variables on top, and write a check in Bash for your service.
Some tips on writing checks with curl
:
- Pass
-s
to silence its output. - Set
--connect-timeout
to a small number (I usually start with 5 seconds, and tweak it from that as needed). - See stackoverflow.com/a/42873372 for notes on
curl
βs retry options.
Here are a couple example checks I use for my own services:
# verify that my website is online & serving my home page:
curl -s --connect-timeout 5 --max-time 15 --retry 3 --retry-max-time 50 https://www.dzombak.com | grep -c "<title> # Chris Dzombak</title>"
# verify that the a2mi.social streaming API is online:
curl -s --connect-timeout 5 https://a2mi.social/api/v1/streaming/health | grep -c OK