travis' brain dump

Configure & Deploy DRBD on Ubuntu 20.04+

Posted Nov. 21, 2021, under Tech Stuff

So, in an effort to get some solid replication of my data to a secondary site, I tried a few options but ultimately decided to go with DRBD for its simplicity. Below are the steps taken to get this up and running between two systems.

Information:
DRBD supports three distinct forms of replication.

A. Asynchronous replication (Protocol A). Most often used for long-distance replication. A write is considered complete as soon as it has reached the local disk and the replication packet has been handed to the local TCP send buffer, so keep in mind there is always the risk of *some* data loss for not-yet-replicated writes in the event of a forced failover.

B. Memory synchronous replication (Protocol B). This is a semi-synchronous protocol: a write is considered complete once it has reached the local disk and the peer's memory. In the event of a forced failover no writes are lost; however, if both nodes fail simultaneously and the primary's data store is irreversibly destroyed, the most recent writes completed on the primary may be lost.

C. Synchronous replication (Protocol C). Local writes on the primary node are not considered complete until the remote disk has confirmed them. Loss of a single node results in no data loss under any circumstance; only losing both nodes and their storage simultaneously would result in data loss.

Assumptions:

  • Server 1 will be called replica01 for its hostname
  • Server 2 will be called replica02 for its hostname
  • ip.ip.ip.ip is to be replaced with the IPs of your systems where necessary
  • The spare, unmounted disk is /dev/sdc
  • We’re using Protocol A for Asynchronous replication.
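Before touching anything, it doesn't hurt to sanity-check these assumptions. A minimal sketch, run on both nodes (DISK is the spare disk from the list above; adjust to your hardware):

```shell
# Pre-flight sketch: confirm the spare disk exists and is not mounted.
DISK=/dev/sdc
if [ -b "$DISK" ]; then echo "$DISK present"; else echo "$DISK not found"; fi
grep -q "^$DISK" /proc/mounts && echo "WARNING: $DISK is mounted" || echo "$DISK is not mounted"
```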

 

Preparation (actions done on both systems):

1. Create the file /etc/multipath/conf.d/drbd.conf with the following contents, so that multipathd leaves DRBD devices alone:

blacklist {
    devnode "^drbd[0-9]+"
}
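As a quick sanity check, the devnode pattern can be exercised against a few device names (multipath matches names like drbd0, without the /dev/ prefix):

```shell
# Check which device names the blacklist regex would match.
for dev in drbd0 drbd12 sdc; do
  echo "$dev" | grep -Eq '^drbd[0-9]+' && echo "$dev: blacklisted" || echo "$dev: passed through"
done
```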

2. Install drbd:

sudo apt-get install drbd-utils
sudo depmod -a

3. Modify the contents of your /etc/hosts file and add the following:

ip.ip.ip.ip replica01
ip.ip.ip.ip replica02
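Once the entries are in place, it's worth confirming both names resolve on each node (getent consults /etc/hosts as well as DNS). A quick loop:

```shell
# Verify each replica hostname resolves on this node.
for h in replica01 replica02; do
  getent hosts "$h" >/dev/null && echo "$h resolves" || echo "$h does NOT resolve"
done
```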

4. Create a new replica config in /etc/drbd.d called r0.res with the contents outlined below. The important things to note here are that the disk device specified is the disk outlined in the assumptions, and that you must change your shared-secret.

resource r0 {
    protocol A;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "secret";
    }
    device /dev/drbd0;
    disk /dev/sdc;
    meta-disk internal;
    on replica01 {
        address ip.ip.ip.ip:7788;
    }
    on replica02 {
        address ip.ip.ip.ip:7788;
    }
}

5. Initialize the metadata storage and start DRBD:

sudo drbdadm create-md r0
sudo service drbd start

6. On the primary host ONLY, promote the node, create a filesystem, and mount it:

sudo drbdadm -- --overwrite-data-of-peer primary all
sudo mkfs.ext3 /dev/drbd0
sudo mkdir /drbd-vol
sudo mount /dev/drbd0 /drbd-vol

Testing:
To verify the system is working as expected, we're going to run a test.

1. Let’s create a file on our new volume and fill it with some kind of data. Feel free to copy other things there if you’d like too. Just nothing too large right now.

ls -lahR /etc >> /drbd-vol/output.txt

2. Next, we'll unmount the volume from the primary node (replica01) and demote it to secondary.

sudo umount /drbd-vol
sudo drbdadm secondary r0

3. Now, we’ll promote the secondary node (replica02) to primary and mount the volume there.

sudo drbdadm primary r0
sudo mount /dev/drbd0 /drbd-vol

4. Confirm your previously created file(s) are there and that the contents are present. You should see the output.txt file and any other contents you copied.

ls /drbd-vol/
cat /drbd-vol/output.txt
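Rather than eyeballing the contents, a checksum comparison is more convincing. A sketch of the idea (a temp file stands in for /drbd-vol/output.txt here; on the real cluster, record the sum on replica01 before the failover and compare it on replica02 afterward):

```shell
# Sketch: compare checksums before and after failover.
F=$(mktemp)                  # stand-in for /drbd-vol/output.txt
ls -lah /etc > "$F"          # same kind of content as the test file
SUM_BEFORE=$(sha256sum "$F" | cut -d' ' -f1)
# ... unmount, demote, promote, remount on the other node ...
SUM_AFTER=$(sha256sum "$F" | cut -d' ' -f1)
[ "$SUM_BEFORE" = "$SUM_AFTER" ] && echo "contents match"
rm -f "$F"
```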

5. Reverse the previous process to restore the original configuration. On replica02:

sudo umount /drbd-vol
sudo drbdadm secondary r0

Then on replica01:

sudo drbdadm primary r0
sudo mount /dev/drbd0 /drbd-vol

Other Info

You can take a look at the status of replication at any point by running the following command:

watch -n1 cat /proc/drbd

The second line of each resource's status (the ns:/nr:/dw: line in the sample below) has the following counters and informational data.

version: 8.4.11 (api:1/proto:86-101)
srcversion: FC3433D849E3B88C1E7B55C
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
    ns:224 nr:18515180 dw:155350536 dr:2801 al:12 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Meaning of each field:

  • ns (network send): total data sent to the peer (KB)
  • nr (network receive): total data received from the peer (KB)
  • dw (disk write): data written to the local disk (KB)
  • dr (disk read): data read from the local disk (KB)
  • al (activity log): number of updates to the activity log area of the metadata
  • bm (bit map): number of updates to the bitmap area of the metadata
  • lo (local count): number of open requests to the local I/O subsystem issued by DRBD
  • pe (pending): number of requests sent to the peer that have not yet been answered
  • ua (unacknowledged): number of requests received from the peer that have not yet been answered
  • ap (application pending): number of block I/O requests forwarded to DRBD but not yet answered by DRBD
  • ep (epochs): number of epoch objects, usually 1; can increase under I/O load when using either the barrier or the none write-ordering method
  • wo (write order): current write-ordering method: b (barrier), f (flush), d (drain), or n (none)
  • oos (out of sync): amount of storage currently out of sync (KB)
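For scripting (monitoring, alerting), the interesting fields can be pulled out with grep. A sketch, using the sample output above as stand-in input (point STATUS at /proc/drbd on a live node):

```shell
# Extract connection state and out-of-sync counter from DRBD status output.
STATUS=$(mktemp)             # stand-in for /proc/drbd
cat > "$STATUS" <<'EOF'
version: 8.4.11 (api:1/proto:86-101)
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
    ns:224 nr:18515180 dw:155350536 dr:2801 al:12 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
EOF
grep -o 'cs:[A-Za-z]*' "$STATUS"   # connection state, e.g. cs:Connected
grep -o 'oos:[0-9]*' "$STATUS"     # KB out of sync; oos:0 means fully replicated
rm -f "$STATUS"
```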
