What do we want? Bullet-proof Znodes!
When do we want it? Now!
“What do you even mean by bullet-proof?” Glad you asked!
In this tutorial we’re gonna secure your
zcoind in such a way that it will automatically bounce back from failures to keep your place in that ever-growing queue. Sound good? Alright, roll up your sleeves and follow me to the engine room! (Mind your head)
zcoind’s little helper here will be monit, “a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system”.
Sounds exciting? Well, you don’t have to love it.
First we need to install monit.
sudo apt install monit will do that for you. Easy-peasy.
You could run monit now, but it just wouldn’t know what to do. Before we actually start giving instructions, we need to know a few things. For one, we should know the path of our zcoin bin directory. The default is
~/zcoin-0.13.4/bin/. Next we need to take note of the username for the user we’re working with. This should be the non-root user you created during setup (if you followed the official setup guide), which is also the one running your zcoind.
monit’s config file is located at
/etc/monit/monitrc, but it’s big and bloated. Luckily, it contains some lines at the end which let you keep easier-to-read configs in another directory, so that’s what we’re gonna do.
Create a zcoind config file in that directory like so
sudo vi /etc/monit/conf.d/zcoind.conf. The name doesn’t matter at all. Call it “Henry” if you like. As usual, press the i key to insert text into the file. And text you will enter.
Namely this (each numbered line goes on one line in the file, though you can have the
as uid ... statements on separate lines resulting in 7 lines total)
check process zcoind with pidfile /home/username/.zcoin/zcoind.pid
start program = "/home/username/zcoin-0.13.4/bin/zcoind -conf=/home/username/.zcoin/zcoin.conf -datadir=/home/username/.zcoin/"
as uid username and gid username
stop program = "/home/username/zcoin-0.13.4/bin/zcoin-cli stop"
as uid username and gid username
if failed port 8168 then restart
if 5 restarts within 5 cycles then unmonitor
Let’s break that down:
- Here we tell monit to check a process which we call zcoind. You could also call it “goldilocks” here – doesn’t matter. What matters though is the PID file. A PID file is a file that is created by a process and contains its process ID (or PID). When all goes well, the existence of a PID file shows us that the process is running. Here monit uses the file to identify the process. Edit the path to reflect your username/path to the PID file.
- This is the start command. Replace username/path. We’re being very explicit here (like pointing to the exact conf file locations etc.) but nothing else worked for me. The uid (user id) and gid (group id) bits tell monit which user to run this command as. Edit this to reflect your username and group name (by default every user has a group with its name so they can be the same here).
- The stop command. Edit the path and user/group name to fit your system.
- monit will try to connect to the Znode port (8168) to determine if zcoind went down. So if the socket on port 8168 failed (monit will verify the port accepts connections and that it’s possible to write/read to/from the socket), then monit will restart zcoind.
- If zcoind acts all crazy (like restart five times in five cycles), monit will declare it unmonitorable (that’s a word, right?). This line isn’t really needed but it prevents monit from constantly trying what is futile and filling up the logs. You can play with the values in this line. “But what’s a cycle?!” you may ask. We’ll get to that.
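If the PID file and port checks still feel a bit abstract, here is a rough shell sketch of the idea. To be clear: this is not how monit is implemented, just an illustration of the two checks. The function names pid_alive and port_open are made up for this example, and /dev/tcp is a bash feature, so this assumes bash.

```shell
# Illustration only: hypothetical helpers, not part of monit or zcoind.

# Succeeds if the PID file is readable and the PID in it belongs to a
# process we are allowed to signal (i.e. one running as the same user).
pid_alive() {
  local pidfile=$1 pid
  [ -r "$pidfile" ] || return 1
  pid=$(tr -d '[:space:]' < "$pidfile")
  # Reject empty or non-numeric content.
  case $pid in ''|*[!0-9]*) return 1 ;; esac
  # Signal 0 sends nothing; it only tests whether the process exists.
  kill -0 "$pid" 2>/dev/null
}

# Succeeds if something accepts TCP connections on host:port.
# Uses bash's /dev/tcp pseudo-device inside a subshell, so the
# connection is closed again as soon as the check is done.
port_open() {
  local host=$1 port=$2
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Example (paths as in the config above, replace username):
# pid_alive /home/username/.zcoin/zcoind.pid && port_open 127.0.0.1 8168 \
#   || echo "zcoind looks down"
```

monit does a bit more than this (it also checks that it can read from and write to the socket), but the basic logic is the same: no live process or no answer on the port means trouble.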
You haven’t copied the numbers at the beginning of each line, have you? No? Good! Save the file by hitting Esc and typing
:wq! – Enter. Done with this part.
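If you’d rather not hand-edit all those paths, something like the following sketch could print the same config for you. The function name print_zcoind_conf is made up, and it assumes your home directory is /home/<username> and your zcoin folder follows the zcoin-<version> pattern used above. Check the output before you trust it.

```shell
# Hypothetical helper: prints the zcoind monit config from above
# for a given username and zcoin version.
# Assumes the home directory is /home/<user> (adjust if yours differs).
print_zcoind_conf() {
  local user=$1 version=$2
  local home="/home/$user"
  local bin="$home/zcoin-$version/bin"
  cat <<EOF
check process zcoind with pidfile $home/.zcoin/zcoind.pid
start program = "$bin/zcoind -conf=$home/.zcoin/zcoin.conf -datadir=$home/.zcoin/"
as uid $user and gid $user
stop program = "$bin/zcoin-cli stop"
as uid $user and gid $user
if failed port 8168 then restart
if 5 restarts within 5 cycles then unmonitor
EOF
}

# Example: look at the output first, then write it to the config file.
# print_zcoind_conf username 0.13.4
# print_zcoind_conf username 0.13.4 | sudo tee /etc/monit/conf.d/zcoind.conf
```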
Now let’s see if you screwed it up. monit comes with an inbuilt syntax checker and running
monit -t should only return
Control file syntax OK and an obscure
Include failed error which we will ignore for now.
Let’s head to the config real quick, I wanna show you something.
sudo vi /etc/monit/monitrc
The first line that actually does something is
set daemon 120. That’s the cycle length I mentioned earlier. By default monit will run its checks every 120 seconds. You can tweak this to a time of your liking but 120 worked fine for me and there’s no benefit of having 60 seconds of downtime over 120 seconds here.
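To put numbers on it, here is a tiny back-of-the-envelope calculation using the values from this tutorial (a cycle of 120 seconds and the “5 restarts within 5 cycles” rule from the zcoind config). The variable names are just for illustration.

```shell
cycle=120    # seconds, from "set daemon 120"
cycles=5     # from "if 5 restarts within 5 cycles then unmonitor"

# Time span the unmonitor rule looks at.
window=$((cycle * cycles))

echo "Checks run every ${cycle}s; the unmonitor rule covers a ${window}s window"
# prints: Checks run every 120s; the unmonitor rule covers a 600s window
```

So with these settings, zcoind has to be restarted on every check for ten minutes straight before monit gives up on it.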
If you want to use commands like
monit status later, you will have to uncomment the following section
# set httpd port 2812 and
# use address localhost # only accept connection from localhost
# allow localhost # allow localhost to connect to the server and
# allow admin:monit # require user 'admin' with password 'monit'
To uncomment in geekspeak means to remove characters that make something a comment. So remove the hashes (#) at the beginning of each line (but not the other ones, OK?). This activates a little http server that is needed to run these commands. In this setting it cannot be accessed from the outside. You could open it up, but I won’t do that here. The username and password in the last line are not needed for local operation.
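Not sure you removed the right hashes? Here is a small sketch that checks whether a file contains an active (uncommented) set httpd line. The helper name httpd_enabled is made up, and the demo runs on a throwaway sample that shows what the section should look like after editing, not on your real config.

```shell
# Hypothetical helper: succeeds if the given file has an uncommented
# "set httpd" line, i.e. one not starting with a hash.
httpd_enabled() {
  grep -Eq '^[[:space:]]*set[[:space:]]+httpd[[:space:]]' "$1"
}

# Demo on a temporary sample of the uncommented section.
sample=$(mktemp)
cat > "$sample" <<'EOF'
set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'
EOF

httpd_enabled "$sample" && echo "httpd section is active"
rm -f "$sample"
```

The real /etc/monit/monitrc is normally readable by root only, so to run the same check against it you would need a root shell.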
Now, while we’re at it, let’s get rid of that obscure include error, shall we?
Scroll all the way to the bottom of the file and comment (i.e. add a hash) to the very last line (reading
Run the check again (
monit -t) and look at the beautiful, clean output – unless you screwed something up that is.
More than 800 words into the text and monit isn’t even running. Let’s make some headway here and start it:
sudo systemctl start monit.service. On his first day of work, we’ll ask him to read his instructions just to make sure he got the memo (
sudo monit reload) before we ask him how he’s doing (
sudo monit status). If the status command indicates that zcoind is not monitored, type
sudo monit start all.
If everything looks good there, the last thing we should do is to make sure monit starts with the system at the next reboot. Type
sudo update-rc.d monit defaults and you’re good.
If you feel super courageous now,
killall zcoind and see if it magically gets restarted.
To follow the action live, watch your log with
tail -f which shows new lines in a file as they appear. So with
sudo tail -f /var/log/monit.log you can watch monit do its job in real time. And that will hopefully look like this:
[CET Jan 5 18:55:40] error : 'zcoind' process is not running
[CET Jan 5 18:55:40] info : 'zcoind' trying to restart
[CET Jan 5 18:55:40] info : 'zcoind' start: /home/username/zcoin-0.13.4/bin/zcoind
[CET Jan 5 18:57:41] info : 'zcoind' process is running with pid 5636
The two-minute gap between the start of zcoind and the info that it is running is due to my cycle length of 120 seconds (set by
set daemon as described above).
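If you don’t want to stare at the log all day, you could also count how often monit had to step in. Below is a minimal sketch, assuming the log lines look like the ones shown above. The helper name count_restarts is made up, and the demo uses a temporary copy of those sample lines rather than your real log.

```shell
# Hypothetical helper: counts "trying to restart" lines for a service
# in a monit log file.
count_restarts() {
  local logfile=$1 service=$2
  grep -cF "'$service' trying to restart" "$logfile"
}

# Demo with the sample log lines from above.
log=$(mktemp)
cat > "$log" <<'EOF'
[CET Jan 5 18:55:40] error : 'zcoind' process is not running
[CET Jan 5 18:55:40] info : 'zcoind' trying to restart
[CET Jan 5 18:55:40] info : 'zcoind' start: /home/username/zcoin-0.13.4/bin/zcoind
[CET Jan 5 18:57:41] info : 'zcoind' process is running with pid 5636
EOF

count_restarts "$log" zcoind   # prints 1
rm -f "$log"
```

Reading the real /var/log/monit.log may need root, depending on how your system sets the file permissions.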
So there you go! You have installed, configured, and tested (!) your monit installation. High five, you sysadmin!
There is one issue with this approach. With the next bigger update, the path of the zcoin bin directory (
/zcoin-0.13.4/bin/) will probably change to
/zcoin-0.13.5/bin. Make sure you update your monit config accordingly, or maybe use the occasion to switch to a more generic folder name like