Bullet-Proofing your Znode with monit

What do we want? Bullet-proof Znodes!

When do we want it? Now!

“What do you even mean by bullet-proof?” Glad you asked!

In this tutorial we’re gonna secure your zcoind in such a way that it will automatically bounce back from failures to keep your place in that ever-growing queue. Sound good? Alright, roll up your sleeves and follow me to the engine room! (Mind your head)

zcoind’s little helper here will be monit, “a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system”.
Sounds exciting? Well, you don’t have to love it.

First we need to install monit. sudo apt install monit will do that for you. Easy-peasy.

You could run monit now, but it just wouldn’t know what to do. Before we actually start giving instructions, we need to know a few things. For one, we should know the path of our zcoin bin directory. The default is ~/zcoin-0.13.4/bin/. Next we need to take note of the username for the user we’re working with. This should be the non-root user you created during setup (if you followed the official setup guide) which is also running your zcoind service.

monit’s main config file is located at /etc/monit/monitrc, but it’s big and bloated. Luckily it contains some lines at the end that pull in every file from /etc/monit/conf.d/, which lets us keep our own, easier-to-read config there. So that’s what we’re gonna do.

Create a zcoind config file in /etc/monit/conf.d/ like so: sudo vi /etc/monit/conf.d/zcoind.conf. The name doesn’t matter at all. Call it “Henry” if you like. As usual, press the i key to insert text into the file. And text you will enter.

Namely this (each numbered line goes on one line in the file, though you can put the as uid ... statements on separate lines, resulting in 7 lines total):

    1. check process zcoind with pidfile /home/username/.zcoin/zcoind.pid
    2. start program = "/home/username/zcoin-0.13.4/bin/zcoind -conf=/home/username/.zcoin/zcoin.conf -datadir=/home/username/.zcoin/"
      as uid username and gid username
    3. stop program = "/home/username/zcoin-0.13.4/bin/zcoin-cli stop"
      as uid username and gid username
    4. if failed port 8168 then restart
    5. if 5 restarts within 5 cycles then unmonitor

Let’s break that down:

  1. Here we tell monit to check a process which we call zcoind. You could also call it “goldilocks” here – doesn’t matter. What matters though is the PID file. A PID file is a file that is created by a process and contains its process ID (or PID). When all goes well, the existence of a PID file shows us that the process is running. Here monit uses the file to identify the process. Edit the path to reflect your username and the path to your PID file.
  2. This is the start command. Replace the username and path. We’re being very explicit here (like pointing to the exact conf file and data directory locations) because nothing else worked for me. The uid (user ID) and gid (group ID) bits tell monit which user and group to run this command as. The words uid and gid are keywords and have to stay; only the names after them change. Edit those to reflect your username and group name (by default every user has a group with the same name, so they can be identical here).
  3. The stop command. Edit the path and user/group name to fit your system.
  4. monit will try to connect to the Znode port (8168) to determine whether zcoind went down. It verifies that the port accepts connections and that it’s possible to write to and read from the socket. If that check fails, monit restarts the process.
  5. If zcoind acts all crazy (like restarting five times within five cycles), monit will declare it unmonitorable (that’s a word, right?) and stop monitoring it. This line isn’t strictly needed, but it prevents monit from constantly trying what is futile and filling up the logs. You can play with the values in this line. “But what’s a cycle?!” you may ask. We’ll get to that.
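
If the PID file idea from step 1 feels abstract, here is a rough shell sketch of that kind of check. It only illustrates the concept and is not how monit implements it internally. The function name pid_alive is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): succeeds if the PID stored in
# the given file belongs to a process that currently exists.
pid_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1      # no readable PID file
    pid="$(cat "$pidfile")"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or not a plain number
    esac
    # Signal 0 sends nothing; it only asks whether we could signal the PID.
    # It also fails if we lack permission, so run this as the zcoind user
    # or as root.
    kill -0 "$pid" 2>/dev/null
}
```

You would call it like pid_alive /home/username/.zcoin/zcoind.pid && echo running. Note that a stale PID file (left over after a crash) can point at a PID that no longer exists, which is exactly the case this check catches.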

You haven’t copied the numbers at the beginning of each line, have you? No? Good! Save the file by hitting Esc, typing :wq! and pressing Enter. Done with this part.
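
For reference, here is the same config without the leading numbers, wrapped in a small shell sketch that writes it to a scratch file in the current directory so you can review it first. The variable names (ZUSER, ZBIN, ZDATA, OUT) are placeholders invented for this example, and the paths assume the default layout used in this tutorial.

```shell
#!/bin/sh
# Assumed values: adjust to your own username and install paths.
ZUSER="username"
ZBIN="/home/${ZUSER}/zcoin-0.13.4/bin"
ZDATA="/home/${ZUSER}/.zcoin"
OUT="./zcoind.conf"   # scratch copy; review it before moving it into place

# Unquoted EOF: the shell expands the variables, the double quotes stay literal.
cat > "$OUT" <<EOF
check process zcoind with pidfile ${ZDATA}/zcoind.pid
  start program = "${ZBIN}/zcoind -conf=${ZDATA}/zcoin.conf -datadir=${ZDATA}/"
    as uid ${ZUSER} and gid ${ZUSER}
  stop program = "${ZBIN}/zcoin-cli stop"
    as uid ${ZUSER} and gid ${ZUSER}
  if failed port 8168 then restart
  if 5 restarts within 5 cycles then unmonitor
EOF
```

Once the file looks right, copy it over with sudo cp zcoind.conf /etc/monit/conf.d/zcoind.conf.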

Now let’s see if you screwed it up. monit comes with a built-in syntax checker, and running sudo monit -t should only return Control file syntax OK and an obscure Include failed error, which we will ignore for now.

Let’s head to the config real quick, I wanna show you something. sudo vi /etc/monit/monitrc

The first line that actually does something is set daemon 120. That’s the cycle length I mentioned earlier. By default monit will run its checks every 120 seconds. You can tweak this to a time of your liking but 120 worked fine for me and there’s no benefit of having 60 seconds of downtime over 120 seconds here.
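
To put the cycle length in perspective, here is a tiny back-of-the-envelope calculation of how long the “5 restarts within 5 cycles” window from our zcoind config lasts. The variable names are just for illustration.

```shell
#!/bin/sh
CYCLE_SECONDS=120   # from "set daemon 120"
CYCLES=5            # from "if 5 restarts within 5 cycles then unmonitor"

WINDOW=$((CYCLE_SECONDS * CYCLES))
echo "unmonitor window: ${WINDOW} seconds ($((WINDOW / 60)) minutes)"
```

So with these values, zcoind has to be restarted five times within roughly ten minutes before monit gives up on it. Halve the cycle length and the window shrinks to about five minutes.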

If you want to use commands like monit status later, you will have to uncomment the following section:

# set httpd port 2812 and
# use address localhost # only accept connection from localhost
# allow localhost # allow localhost to connect to the server and
# allow admin:monit # require user 'admin' with password 'monit'

To uncomment, in geekspeak, means to remove the characters that make something a comment. So remove the hash (#) at the beginning of each line (but not the ones further along in the line, OK?). This activates a little HTTP server that monit needs in order to run these commands. With this setting it can only be reached from localhost. You can open it up to the outside, but I won’t do that here. The username and password in the last line are not needed for local operation.
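
If you’d rather not do this by hand in vi, something like the following sed sketch could do the uncommenting. It assumes the four lines look exactly like the ones shown above (apart from extra spaces after the hash), and the function name uncomment_httpd is made up. It only prints the result, so nothing is changed until you decide to copy it back.

```shell
#!/bin/sh
# Hypothetical helper: print the given file with the four httpd lines
# uncommented. Only the leading "#" is removed; trailing comments stay.
uncomment_httpd() {
    sed -e 's/^# *\(set httpd port 2812 and\)/\1/' \
        -e 's/^# *\(use address localhost\)/    \1/' \
        -e 's/^# *\(allow localhost\)/    \1/' \
        -e 's/^# *\(allow admin:monit\)/    \1/' \
        "$1"
}
```

In a root shell you could run uncomment_httpd /etc/monit/monitrc > monitrc.new, compare the two files with diff, and only then replace the original.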

Now, while we’re at it, let’s get rid of that obscure include error, shall we?
Scroll all the way to the bottom of the file and comment out the very last line (reading include /etc/monit/conf-enabled/*) by adding a hash at its beginning.
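
The same edit can be sketched with sed, assuming the line reads exactly as quoted above (leading spaces are fine). The function name comment_out_include is invented for this example, and like before it only prints the edited file.

```shell
#!/bin/sh
# Hypothetical helper: print the given file with the conf-enabled include
# line commented out. Other include lines are left alone.
comment_out_include() {
    sed 's|^\( *include /etc/monit/conf-enabled/\*\)|# \1|' "$1"
}
```

The include line for /etc/monit/conf.d/ has to stay active, since that is where our zcoind config lives.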

Run the check again (sudo monit -t) and look at the beautiful, clean output, unless you screwed something up, that is.

More than 800 words into the text and monit isn’t even running. Let’s make some headway here and start it: sudo systemctl start monit.service. On his first day of work, we’ll ask him to read his instructions just to make sure he got the memo (sudo monit reload) before we ask him how he’s doing (sudo monit status). If the status command indicates that zcoind is not monitored, type sudo monit start all.

If everything looks good there, the last thing we should do is to make sure monit starts with the system at the next reboot. Type sudo update-rc.d monit defaults and you’re good.

If you feel super courageous now, killall zcoind and see if it magically gets restarted.

To follow the action live, watch your log with tail -f, which shows new lines in a file as they appear. With sudo tail -f /var/log/monit.log you can watch monit do its job in real time. Hopefully that will look like this:

[CET Jan 5 18:55:40] error : 'zcoind' process is not running
[CET Jan 5 18:55:40] info : 'zcoind' trying to restart
[CET Jan 5 18:55:40] info : 'zcoind' start: /home/username/zcoin-0.13.4/bin/zcoind
[CET Jan 5 18:57:41] info : 'zcoind' process is running with pid 5636

The two-minute gap between the start of zcoind and the info that it is running is due to my cycle length of 120 seconds (set by set daemon as described above).

So there you go! You have installed, configured, and tested (!) your monit installation. High five, you sysadmin!

Addendum

There is one issue with this approach. With the next bigger update, the path of the zcoin bin directory (/zcoin-0.13.4/bin/) will probably change to /zcoin-0.13.5/bin. Make sure you update your monit config accordingly, or maybe use the occasion to switch to a more generic folder name like zcoin altogether.
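
One way to get such a generic folder name without renaming anything is a symlink that always points at the release you currently run. This is only a sketch: the function name link_release is made up, and 0.13.5 is just a guess at the next version number.

```shell
#!/bin/sh
# Hypothetical helper: make <base>/zcoin a symlink to <base>/<release>.
#   -s  create a symbolic link
#   -f  replace the link if it already exists
#   -n  treat an existing zcoin symlink as a file instead of following it
link_release() {
    base="$1"
    release="$2"
    ln -sfn "$base/$release" "$base/zcoin"
}
```

After link_release /home/username zcoin-0.13.4, the monit config can point at /home/username/zcoin/bin/zcoind and /home/username/zcoin/bin/zcoin-cli. When an update arrives, re-run it with the new folder name and the monit config stays untouched. This assumes zcoin is not already a real directory in that location.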

14 thoughts on “Bullet-Proofing your Znode with monit”

  1. Thank you for the writeup!

    I have an issue however.

    It will not start zcoind, since it is permission denied.

    However, I’ve set the UID/GID the same as the ID of the user.

    This is the monit log:

    [UTC Jan 8 20:00:09] info : 'zcoind' start: /home/USER/zcoin/src/zcoind
    [UTC Jan 8 20:00:40] error : 'zcoind' failed to start (exit status 6) -- /home/USER/zcoin/src/zcoind:

    ************************
    EXCEPTION: N5boost10filesystem16filesystem_errorE
    boost::filesystem::create_directory: Permission denied: "/.zcoin"
    zcoin in AppInit()

  2. monit -t shows clean syntax. I can run the start program command (zcoind) from the zcoind.conf from the command-line and it works. But monit totally fails to run zcoind. Here’s what it shows in logs 5 times before it fails and unmonitors:

    'zcoind' start: /home/myhome/zcoin-0.13.4/bin/zcoind
    'zcoind' failed to start (exit status 1) -- /home/myhome/zcoin-0.13.4/bin/zcoind: Error: There is no RPC client functionality in zcoind anymore. Use the zcoind-cli utility instead.

    1. Hey, it looks like the monit syntax is fine but zcoind doesn’t like the start command. Did you maybe put double dashes behind zcoind accidentally? Can I see your start command please?

      1. Thanks for the reply (and for the article), Sebastian. =)

        My VPS uses noVNC which doesn’t seem to support copy/paste (or the pipe symbol), so I’ll try to transcribe my start command (changing the home user / directory):

        start program = "/home/myhome/zcoin-0.13.4/bin/zcoind -conf=/home/myhome/.zcoin/zcoin.conf -datadir=/home/myhome/.zcoin/"
        as myuser and mygroup

        I have tried adding or removing -daemon as the first parameter (no change). I have also tried having the “as” clause on the same line or on a separate line (no change).

      2. I think I see where I went wrong: the gid and uid in the monit zcoind.conf are not just placeholders but text that needs to be there in addition to the username.

        I’ve added these and restarted everything and it looks like it’s working (status = Monitored) now. Thanks!

      3. Ha! Yes, they need to be there. I’ve been trying to use code blocks to indicate what needs to be changed and what not, but WordPress wasn’t having it. Still looking for a solution there.
        Happy to hear it works!

  3. I think you have a typo:

    stop program = "/home/username/zcoin-0.13.4/bin/zcoin-cli -stop"
    as uid username and gid username

    should be

    stop program = "/home/username/zcoin-0.13.4/bin/zcoin-cli stop"
    as uid username and gid username

    stop is a command to zcoin-cli, not a flag.

  4. Hi – firstly, great content!

    I’ve fun thru the steps and get an ‘execution failed’ when checking the status…

    Process ‘zcoind’
    status Execution failed
    monitoring status Monitored
    data collected Mon, 22 Jan 2018 03:09:28

    Any advice?

    1. Thanks! And look at that Freudian slip – “fun thru”, huh? That’s amazing 😉

      Can I see your config and log file please? You can also find me on Discord (catchingknives) and Telegram (@sebchat) if that’s more convenient for you
