This machine runs a lot of services and I don't use all of them. After breaking several of them and not noticing (again), I decided to finally set up service monitoring. After some research, I settled on Monit, which was relatively easy to set up and seems to meet my needs. I figured other people might want some examples of how to use it, so this post describes how I set it up, with my config file at the end.

Why Monit

My use case is one server monitoring itself. The obvious question is "who monitors the monitor?", but my main concern is failing to notice when a service I don't use myself has broken. If the entire server is down I'll probably notice eventually. A bigger problem is that checks run from the server itself don't test the firewall rules and routing. A separate server really would be ideal, but since I can't run it from home (ISP port filtering blocks SMTP), I'd have to pay for another VPS, and it doesn't seem worth it right now.

I've worked with sysadmins in the past who liked Nagios, so that was my first choice for this, but it's complicated to set up and extreme overkill for monitoring one server. I looked at Sensu too, and it seems nicer but still overkill. I chose Monit because it's easy to set up (see the next section) and the barebones UI doesn't matter to me (I just want a basic up/down status and emails).

Setup

To install Monit on Fedora, you run:

sudo dnf install monit
# edit config files
sudo monit -t # check config syntax
sudo systemctl start monit
sudo systemctl enable monit

Configuration

The "edit config files" step is by far the longest. First you'll want to edit /etc/monitrc and set your mail server and who to send alerts to. For me this was just:

set mailserver localhost
set alert self@brendanlong.com

If you're using a remote mail server, you probably need to configure a user name and password. I also uncommented the eventqueue lines so alerts won't be lost if the mail server goes down.
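
For a remote mail server, the settings might look something like the sketch below. The host name, port, and credentials are placeholders I made up, and the keyword for enabling TLS has changed between Monit versions, so I've left it as a comment to check against the documentation for your version. The eventqueue lines mirror the ones that ship commented out in the default config:

# Hypothetical remote mail server; host, port, and credentials are placeholders
set mailserver smtp.example.com port 587
    username "monit@example.com" password "changeme"
    # add the TLS option for your Monit version here (e.g. "using tls")

# Store undelivered alerts on disk until the mail server is reachable again
set eventqueue
    basedir /var/monit
    slots 100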

The only other thing I changed in this file is the set httpd port ... section, where I changed the admin password and removed the localhost restriction (so I can access it remotely at http://status.brendanlong.com).
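
As a rough sketch, the resulting section could look like this. Port 2812 is Monit's default rather than necessarily what I use, and the credentials are placeholders:

# Sketch only: 2812 is Monit's default port; admin:changeme is a placeholder
set httpd port 2812
    # no "use address localhost" line, so Monit listens on all interfaces
    # no "allow localhost" line, so access depends on the login below
    allow admin:changeme

Without the localhost restriction, the password is the only thing protecting the web interface, so pick a strong one.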

I put the rest of my configuration in individual files in /etc/monit.d. For example, monitoring of things accessed at "brendanlong.com" is in a file named /etc/monit.d/brendanlong.com and http://etherealspring.com/ is in /etc/monit.d/etherealspring.com. This is just personal preference, but it will be easier to handle package updates this way, and I should be able to find things faster.
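
For this layout to work, the main config has to include that directory. The packaged /etc/monitrc may already have a line like the following (I'm assuming it does on Fedora, but check your own file):

# Load every file under /etc/monit.d
include /etc/monit.d/*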

To write these config files, refer to the Monit documentation. Monit can check a lot of things like processes and system health, but I'm a believer in checking the thing you actually care about. I don't care what processes are running or what files exist, I just want to be sure that you can get the correct pages from each HTTP server and that the other servers are responding in reasonable ways. To do that, I used check host rules exclusively.

Host rules take the form:

check host [unique name] with address [actual domain name]
    if failed
        [rules]
    then [alert / restart / etc.]

The unique name part is annoying, since Monit creates a rule for your server automatically, and you can't add to it (as far as I can tell; email me if this isn't true). I got around this by naming my check "localhost" instead of "brendanlong.com".
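
To make the template concrete, here is a minimal sketch with the blanks filled in. The name "example-web" and the domain www.example.com are placeholders, not part of my setup:

# "example-web" is the unique name shown in the UI and in alerts;
# www.example.com is the address Monit actually connects to
check host example-web with address www.example.com
    if failed
        port 80
        protocol http
    then alert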

The part I had the most trouble with was figuring out the rules. Here's what I found:

  • You'll always want a port [num] rule. This works how you'd expect.
  • If you're testing one of the supported protocols, add a protocol section. It supports all of the major protocols like HTTP(S), SMTP(S), IMAP(S), etc. If you tell it the protocol, it will ensure that the endpoint not only connects, but gives a reasonable response.
  • For protocols that have logins, you can give a username and password and it will test if the login succeeds.
  • For HTTP, you can give the expected status and text that should be in the response with content.
  • Order matters. For example, protocol http status 200 content = "Brendan Long" is valid, but protocol http content = "Brendan Long" status 200 is not. See the syntax in the documentation for the correct order.
  • If you don't set protocol, it will just test if a TCP connection succeeds. You can do more complicated checks with send and expect (send a text or binary message and check the response).
  • You probably want to set fault tolerance for some rules. In my case, the connection to my SMTP server would randomly fail, but it doesn't really matter as long as a retry works. I made it quieter by adding for 3 cycles to that rule.
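
The rules above can be sketched in one check. The host name, ports, and expected strings here are illustrative assumptions rather than my real config, and the send/expect exchange is a simplified SMTP-style conversation modeled on the pattern in the Monit documentation:

check host example-checks with address www.example.com
    # HTTP: status has to come before content
    if failed
        port 80
        protocol http
        status = 200
        content = "Example Domain"
    then alert

    # No protocol: TCP connect plus a hand-written send/expect exchange
    if failed
        port 25
        expect "^220.*"
        send "QUIT\r\n"
        expect "^221.*"
    then alert

    # Fault tolerance: alert only after 3 consecutive failed cycles
    if failed
        port 993
        protocol imaps
        for 3 cycles
    then alert

Running sudo monit -t after each change catches ordering mistakes before Monit is restarted.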

Examples

Here's the config file for brendanlong.com. At some point I'll make it do more extensive testing for Minecraft and SyncThing, but this gets me 90% of what I wanted:

check host localhost with address brendanlong.com
    if failed
        port 22
        protocol ssh
    then alert

    if failed
        port 443
        protocol https
        status = 301
    then alert

    if failed
        port 80
        protocol http
        status = 301
    then alert

    if failed
        port 25
        protocol smtp
        for 3 cycles
    then alert

    if failed
        port 465
        protocol smtps
    then alert

    if failed
        port 143
        protocol imap
    then alert

    if failed
        port 993
        protocol imaps
    then alert

    # Minecraft
    if failed
        port 25565
    then alert

    # SyncThing
    if failed
        port 22000
    then alert

check host www.brendanlong.com with address www.brendanlong.com
    if failed
        port 443
        protocol https
        status = 200
        content = "Brendan Long"
    then alert
    if failed
        port 80
        protocol http
        status = 301
    then alert

check host wiki.brendanlong.com with address wiki.brendanlong.com
    if failed
        port 80
        protocol http
        status = 403
    then alert