Logical Volumes
Create an LVM logical volume on each server for the NFS mount
On both servers:
lvcreate -n nfspoint -V 550G pve/data
mkfs.ext4 /dev/pve/nfspoint
echo '/dev/pve/nfspoint /var/lib/nfspoint ext4 defaults 0 2' >> /etc/fstab
mkdir /var/lib/nfspoint
mount /dev/pve/nfspoint /var/lib/nfspoint/
lvs
vim /etc/exports
/var/lib/nfspoint 10.0.0.0/255.0.0.0(rw,no_root_squash,no_all_squash,sync)
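After adding the line, the NFS server has to re-read /etc/exports before clients can see the share. A quick sanity check of the line's shape, plus the reload commands (the regex is my own rough check, not an official exports validator):

```shell
# Rough shape check: path, one space, client(options) with no space
# before the parenthesis (a common exports mistake).
line='/var/lib/nfspoint 10.0.0.0/255.0.0.0(rw,no_root_squash,no_all_squash,sync)'
if echo "$line" | grep -Eq '^/[^ ]+ [^ ]+\([a-z_,]+\)$'; then
  echo "exports line looks well-formed"
else
  echo "exports line is malformed"
fi
# On the server, apply and verify with:
#   exportfs -ra && exportfs -v
```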
DRBD
DRBD is a little tricky. Because the LVM volume was thin provisioned, it looked like some metadata had already been written to the device /dev/pve/nfspoint, which meant I had to zero out nfspoint before creating the DRBD metadata.
DRBD Configuration:
global { usage-count no; }
common { syncer { rate 100M; } }
resource r0 {
  protocol C;
  device /dev/drbd0 minor 0;
  startup {
    wfc-timeout 120;
    degr-wfc-timeout 60;
    become-primary-on both;
  }
  net {
    cram-hmac-alg sha1;
    allow-two-primaries;
    shared-secret "secret";
  }
  on HLPMX1 {
    disk /dev/pve/nfspoint;
    address 10.0.0.2:7788;
    meta-disk internal;
  }
  on HLPMX2 {
    disk /dev/pve/nfspoint;
    address 10.0.0.3:7788;
    meta-disk internal;
  }
}
This wasn’t a problem here because there was no data yet, but it may become one when I expand the volume with LVM or need to make any other resource changes to the device.
dd if=/dev/zero of=/dev/pve/nfspoint bs=1M count=200
mkfs.ext4 -b 4096 /dev/drbd0
curl --output drbd9.15.tar.gz https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/drbd-utils/9.15.0-1/drbd-utils_9.15.0.orig.tar.gz
tar -xf drbd9.15.tar.gz
cd drbd-utils-9.15.0
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
make all
make install
On node 2 (gcc wasn’t installed there):
apt install build-essential
apt install gcc
apt install flex
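Before re-running ./configure on the second node, it's worth confirming the toolchain is actually on PATH; a small sketch (the MISSING message is just illustrative):

```shell
# Check the build prerequisites before ./configure; build-essential
# pulls in gcc and make, flex has to be installed separately.
for tool in gcc make flex; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING - install it first"
  fi
done
```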
# Copy config to other server
sudo drbdadm create-md r0
sudo systemctl start drbd.service
sudo drbdadm -- --overwrite-data-of-peer primary all
mkfs.ext4 /dev/drbd0
mkdir /srv/nfspoint
sudo mount /dev/drbd0 /srv/nfspoint
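Once both nodes are up, `drbdadm status r0` should show role:Primary on both sides, since allow-two-primaries is set. A sketch of checking that from the status text — the here-string is illustrative sample output, not anything captured from these hosts:

```shell
# Count Primary roles in `drbdadm status r0` output; in dual-primary
# mode the count should be 2 once the initial sync finishes.
# On a live node use: status=$(drbdadm status r0)
status='r0 role:Primary
  disk:UpToDate
  HLPMX2 role:Primary
    peer-disk:UpToDate'
primaries=$(printf '%s\n' "$status" | grep -c 'role:Primary')
echo "primary count: $primaries"
```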
Split Brain
On Split Brain Victim
drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0
On Split Brain Survivor
drbdadm primary r0
drbdadm connect r0
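A split-brain victim usually reports connection:StandAlone in `drbdadm status`. A quick grep for that state, run here against illustrative sample output (on a live node, substitute the real command output):

```shell
# Flag a StandAlone peer connection, the usual split-brain symptom.
# On a live node use: status=$(drbdadm status r0)
status='r0 role:Primary
  disk:UpToDate
  HLPMX2 connection:StandAlone'
if printf '%s\n' "$status" | grep -q 'connection:StandAlone'; then
  echo "possible split brain: peer connection is StandAlone"
fi
```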
Inverting Resources
On Current Primary:
umount /srv/nfspoint
drbdadm secondary r0
On Secondary:
drbdadm primary r0
mount /dev/drbd0 /srv/nfspoint
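The two halves of the swap can be sketched as a dry run that just echoes each step in order (drop the echoes to run the commands for real; the paths and resource name are the ones used above):

```shell
# Dry-run sketch of the primary/secondary swap: demote on the current
# primary first, then promote on the former secondary.
demote() {
  echo "umount /srv/nfspoint"
  echo "drbdadm secondary r0"
}
promote() {
  echo "drbdadm primary r0"
  echo "mount /dev/drbd0 /srv/nfspoint"
}
demote   # run these on the current primary
promote  # then these on the former secondary
```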
DRBD Broken?
Reboot both servers
systemctl start drbd.service
drbdadm status
umount /dev/pve/nfspoint
mkfs.ext4 -b 4096 /dev/pve/nfspoint
dd if=/dev/zero of=/dev/drbd0 status=progress
mkfs -t ext4 /dev/drbd0
Troubleshooting DRBD Issues
Odd Mountpoint with Loop Device
What should appear in the output of lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.7T 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 1.7T 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
├─pve-root 253:1 0 96G 0 lvm /
├─pve-data_tmeta 253:2 0 15.6G 0 lvm
│ └─pve-data-tpool 253:4 0 1.5T 0 lvm
│ ├─pve-data 253:5 0 1.5T 0 lvm
│ ├─pve-vm--105--disk--0 253:6 0 20G 0 lvm
│ └─pve-nfspoint 253:7 0 550G 0 lvm
│ └─drbd0 147:0 0 550G 0 disk /srv/nfspoint
└─pve-data_tdata 253:3 0 1.5T 0 lvm
└─pve-data-tpool 253:4 0 1.5T 0 lvm
├─pve-data 253:5 0 1.5T 0 lvm
├─pve-vm--105--disk--0 253:6 0 20G 0 lvm
└─pve-nfspoint 253:7 0 550G 0 lvm
└─drbd0 147:0 0 550G 0 disk /srv/nfspoint
What was incorrectly appearing in the output of lsblk:
loop0 7:0 0 200M 0 loop /srv/nfspoint
sda 8:0 0 1.7T 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 1.7T 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
├─pve-root 253:1 0 96G 0 lvm /
├─pve-data_tmeta 253:2 0 15.6G 0 lvm
│ └─pve-data-tpool 253:4 0 1.5T 0 lvm
│ ├─pve-data 253:5 0 1.5T 0 lvm
│ ├─pve-vm--100--disk--0 253:6 0 30G 0 lvm
│ ├─pve-vm--102--disk--0 253:7 0 100G 0 lvm
│ ├─pve-vm--103--disk--0 253:8 0 20G 0 lvm
│ ├─pve-vm--104--disk--0 253:9 0 20G 0 lvm
│ ├─pve-vm--101--disk--0 253:10 0 20G 0 lvm
│ └─pve-nfspoint 253:11 0 550G 0 lvm
│ └─drbd0 147:0 0 550G 0 disk
└─pve-data_tdata 253:3 0 1.5T 0 lvm
└─pve-data-tpool 253:4 0 1.5T 0 lvm
├─pve-data 253:5 0 1.5T 0 lvm
├─pve-vm--100--disk--0 253:6 0 30G 0 lvm
├─pve-vm--102--disk--0 253:7 0 100G 0 lvm
├─pve-vm--103--disk--0 253:8 0 20G 0 lvm
├─pve-vm--104--disk--0 253:9 0 20G 0 lvm
├─pve-vm--101--disk--0 253:10 0 20G 0 lvm
└─pve-nfspoint 253:11 0 550G 0 lvm
└─drbd0 147:0 0 550G 0 disk
If the server is creating a loop device and mounting that as the mount point, try rebooting the server, restarting the drbd service, and clearing out the device on the server having the issue and resyncing it with the primary.
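A quick way to spot the bad state is to check whether the mountpoint is backed by a loop device rather than drbd0. This sketch greps sample lsblk output; on a live host, substitute the real `lsblk -rno NAME,TYPE,MOUNTPOINT` output:

```shell
# Flag the broken state: /srv/nfspoint mounted from a loop device
# instead of drbd0. The here-string is illustrative sample output;
# live: lsblk_out=$(lsblk -rno NAME,TYPE,MOUNTPOINT)
lsblk_out='loop0 loop /srv/nfspoint
drbd0 disk'
if printf '%s\n' "$lsblk_out" | grep -q '^loop.* /srv/nfspoint$'; then
  echo "WARNING: /srv/nfspoint is backed by a loop device, not drbd0"
fi
```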
I still don't know if it was split brain or what, but these were the commands I ran:
umount /nfs/sharepoint # This was the location of the mountpoint
drbdadm disconnect r0
drbdadm connect --discard-my-data r0
I honestly think rebooting the server in question is what fixed this issue. I suspect the /dev/drbd0 device wasn't working properly, or hadn't been created properly.
Links 2
- https://www.howtoforge.com/high_availability_nfs_drbd_heartbeat_p2
- https://pve.proxmox.com/wiki/Logical_Volume_Manager_(LVM)
- https://linux.die.net/man/8/lvremove
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/lv_remove
- https://serverfault.com/questions/266697/cant-remove-open-logical-volume
Links
- Ubuntu - Configure HA drbd
- SO - DRDB Not syncing between my nodes
- SO - Split Brain
- Adding DRBD shared volumes to Proxmox to support Live Migration
- drbd-utils 9.15.0-1 Source
- BuildingTutorial
- drbd-utils
- drbdadm
- Suse Configuring HA