This machine runs a lot of services and I don't use all of them. After breaking several of them and not noticing (again), I decided to finally set up service monitoring. After some research, I found that Monit was relatively easy to set up and seems to meet my needs. I figured other people might want some examples of how to use it, so this post describes how to set it up, and you can see my config file at the end.
My use-case is one server monitoring itself. The obvious question is "who monitors the monitor?", but my main concern is not noticing services I don't use. If the entire server is down I'll probably notice eventually. A bigger problem is that I'm not testing the firewall rules and routing. A separate server really would be ideal, but since I can't run it from home (ISP port filtering blocks SMTP), I'd have to pay for another VPS, and it doesn't seem worth it right now.
I've worked with sysadmins in the past who liked Nagios, so that was my first choice for this, but it's complicated to set up and extreme overkill for monitoring one server. I looked at Sensu too, and it seems nicer but still overkill. I chose Monit because it's easy to set up (see the next section) and the barebones UI doesn't matter to me (I just want a basic up/down status and emails).
To install Monit on Fedora, you run:
    sudo dnf install monit
    # edit config files
    sudo monit -t # check config syntax
    sudo systemctl start monit
    sudo systemctl enable monit
The "edit config files" step is by far the longest. First you'll want to edit /etc/monitrc and set your mail server and who to send alerts to. For me this was just:
    set mailserver localhost
    set alert firstname.lastname@example.org
If you're using a remote mail server, you probably need to configure a user name and password. I also uncommented the eventqueue lines so alerts won't be lost if the mail server goes down.
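As a rough sketch of what those two pieces might look like in /etc/monitrc: the host name, port, user name and password below are placeholders I made up, and the eventqueue values are the ones I remember from the commented-out defaults, so check them against your own file. How you turn on TLS for the mail connection depends on your Monit version, so I've left that out.

```
# Hypothetical remote relay; host, port and credentials are placeholders.
set mailserver smtp.example.org port 587
    username "monit" password "changeme"

# Keep undelivered alerts on disk until the mail server is reachable again.
set eventqueue
    basedir /var/monit  # directory where queued events are stored
    slots 100           # optional limit on the queue size
```

Running sudo monit -t after a change like this should catch any syntax mistakes before you restart the service.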
The only other thing I changed in this file is the set httpd port ... section, where I changed the admin password and removed the localhost restriction (so I can access it remotely at http://status.brendanlong.com).
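For illustration, an edited section along these lines might look like the following. This is a sketch, not my exact file: 2812 is Monit's default port, and the password is a placeholder. The stock section also has "use address localhost" and "allow localhost" lines, which are what I mean by the localhost restriction.

```
# Web UI reachable from other machines; the password is a placeholder.
set httpd port 2812
    allow admin:changeme  # require user 'admin' with this password
```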
I put the rest of my configuration in individual files in /etc/monit.d. For example, monitoring of things accessed at "brendanlong.com" is in a file named /etc/monit.d/brendanlong.com and http://etherealspring.com/ is in /etc/monit.d/etherealspring.com. This is just personal preference, but it will be easier to handle package updates this way, and I should be able to find things faster.
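Splitting things up like this only works if the main config file pulls in that directory. I believe the packaged /etc/monitrc already does, but if yours doesn't, a line like this (assuming the same directory) should do it:

```
# In /etc/monitrc: load every file in the per-site config directory.
include /etc/monit.d/*
```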
To write these config files, refer to the Monit documentation. Monit can check a lot of things like processes and system health, but I'm a believer in checking the thing you actually care about. I don't care what processes are running or what files exist; I just want to be sure that you can get the correct pages from each HTTP server and that the other servers are responding in reasonable ways. To do that, I used check host rules exclusively.
Host rules take the form:
    check host [unique name] with address [actual domain name]
        if failed [rules] then [alert / restart / etc.]
The unique name part is annoying, since Monit creates a rule for your server automatically, and you can't add to it (as far as I can tell; email me if this isn't true). I got around this by naming it "localhost" instead of "brendanlong.com".
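Filled in, the template might look like this minimal sketch. The name "example-site" and the address www.example.org are made up for illustration; the name only has to be unique within your Monit config, while the address is what actually gets tested.

```
# "example-site" is an arbitrary unique label; the address is the real target.
check host example-site with address www.example.org
    if failed port 80 protocol http then alert
```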
The part I had the most trouble with was figuring out the rules. Here's what I found:
- You'll always want a port = [num] rule. This works how you'd expect.
- If you're testing one of the supported protocols, add a protocol section. It supports all of the major protocols like HTTP(S), SMTP(S), IMAP(S), etc. If you tell it the protocol, it will ensure that the endpoint not only connects, but gives a reasonable response.
- For protocols that have logins, you can give a password and it will test if the login succeeds.
- For HTTP, you can give the expected status and text that should be in the response with content.
- Order matters. For example, protocol http status 200 content = "Brendan Long" is valid, but protocol http content = "Brendan Long" status 200 is not. See the syntax in the documentation for the correct order.
- If you don't set protocol, it will just test if a TCP connection succeeds. You can do more complicated checks with expect (send a text or binary message and check the response).
- You probably want to set fault tolerance for some rules. In my case, the connection to my SMTP server would randomly fail, but it doesn't really matter as long as a retry works. I made it quieter by adding for 3 cycles to that rule.
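To make those rules concrete, here's a sketch that uses each kind once. The host names and the "Example" text are placeholders, and the send/expect exchange is adapted from the SMTP example I remember from the Monit documentation, so treat the exact strings as assumptions and check them against the docs.

```
check host example-web with address www.example.org
    # Protocol-aware test: status first, then content (the order matters).
    if failed port 443 protocol https status = 200 content = "Example" then alert
    # No protocol given, so this only checks that a TCP connection succeeds.
    if failed port 22000 then alert

check host example-mail with address mail.example.org
    # Only alert once the test has failed three cycles in a row.
    if failed port 25 protocol smtp for 3 cycles then alert

check host example-mail-raw with address mail.example.org
    # Hand-written exchange instead of a built-in protocol test.
    if failed port 25
        expect "^220.*"
        send "QUIT\r\n"
        expect "^221.*"
    then alert
```

The last check does by hand roughly what protocol smtp does for you, which is why I'd only reach for expect when there's no built-in protocol test for the service.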
Here's the config file for brendanlong.com. At some point I'll make it do more extensive testing for Minecraft and SyncThing, but this gets me 90% of what I wanted:
    check host localhost with address brendanlong.com
        if failed port 22 protocol ssh then alert
        if failed port 443 protocol https status = 301 then alert
        if failed port 80 protocol http status = 301 then alert
        if failed port 25 protocol smtp for 3 cycles then alert
        if failed port 465 protocol smtps then alert
        if failed port 143 protocol imap then alert
        if failed port 993 protocol imaps then alert
        # Minecraft
        if failed port 25565 then alert
        # SyncThing
        if failed port 22000 then alert

    check host www.brendanlong.com with address www.brendanlong.com
        if failed port 443 protocol https status = 200 content = "Brendan Long" then alert
        if failed port 80 protocol http status = 301 then alert

    check host wiki.brendanlong.com with address wiki.brendanlong.com
        if failed port 80 protocol http status = 403 then alert