Jack Moore

Originally published at jmoore53.com

Clustered Storage with DRBD

Logical Volumes

Create a logical volume on each server for the NFS mount.

On both servers:

lvcreate -n nfspoint -V 550G pve/data
mkfs.ext4 /dev/pve/nfspoint
echo '/dev/pve/nfspoint /var/lib/nfspoint ext4 defaults 0 2' >> /etc/fstab
mkdir /var/lib/nfspoint
mount /dev/pve/nfspoint /var/lib/nfspoint/
lvs


vim /etc/exports

/var/lib/nfspoint 10.0.0.0/255.0.0.0(rw,no_root_squash,no_all_squash,sync)

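After editing /etc/exports, the export table needs to be reloaded. A quick check, assuming nfs-kernel-server is already installed and running:

# Reload /etc/exports and list the active exports to confirm the share is live
exportfs -ra
exportfs -v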

DRBD

DRBD is a little tricky. Because the LVM volume was thin provisioned, it looked like some metadata had already been written to the device /dev/pve/nfspoint, which meant I had to zero out the nfspoint volume.

DRBD Configuration:

global { usage-count no; }
common { syncer { rate 100M; } }
resource r0 {
        protocol C;
        device /dev/drbd0 minor 0;
        startup {
            wfc-timeout 120;
            degr-wfc-timeout 60;
            become-primary-on both;
        }
        net {
            cram-hmac-alg sha1;
            allow-two-primaries;
            shared-secret "secret";
        }
        on HLPMX1 {
            disk /dev/pve/nfspoint;
            address 10.0.0.2:7788;
            meta-disk internal;
        }
        on HLPMX2 {
            disk /dev/pve/nfspoint;
            address 10.0.0.3:7788;
            meta-disk internal;
        }
}

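The same resource file goes on both nodes. I'm assuming the standard /etc/drbd.d/r0.res path here, since the stock drbd.conf includes /etc/drbd.d/*.res:

# Same file on both nodes
vim /etc/drbd.d/r0.res

# Parse check: dump the configuration as drbdadm sees it before creating metadata
drbdadm dump r0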

Zeroing the device wasn't a problem because there was no data on it yet. It may become a problem when I expand the volume with LVM or need to make any other resource changes to the device.

dd if=/dev/zero of=/dev/pve/nfspoint bs=1M count=200
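An alternative to dd that I didn't use here: wipefs can show and clear the leftover filesystem signature without zeroing the first 200M of the device.

# Read-only: list any filesystem/RAID signatures still on the backing device
wipefs -n /dev/pve/nfspoint
# Destructive: erase the signatures (serves roughly the same goal as the dd above)
# wipefs -a /dev/pve/nfspoint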


mkfs.ext4 -b 4096 /dev/drbd0


# Build drbd-utils 9.15 from source on both nodes
curl --output drbd9.15.tar.gz https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/drbd-utils/9.15.0-1/drbd-utils_9.15.0.orig.tar.gz
tar -xf drbd9.15.tar.gz
cd drbd9.15 # the extracted directory name may differ; check with ls
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
make all
make install

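To check that the source build actually replaced the packaged utilities:

# Userland version
drbdadm --version
# Kernel module version (/proc/drbd only exists once the drbd module is loaded)
cat /proc/drbd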

On Node 2 (because gcc wasn't installed):

apt install build-essential
apt install gcc
apt install flex


# Copy config to other server 
sudo drbdadm create-md r0
sudo systemctl start drbd.service
sudo drbdadm -- --overwrite-data-of-peer primary all
mkfs.ext4 /dev/drbd0
mkdir /srv/nfspoint
sudo mount /dev/drbd0 /srv/nfspoint

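The initial sync of a 550G device takes a while; the status output shows the resync progress:

# Watch the resync progress until both disks report UpToDate
watch drbdadm status r0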

Split Brain

On Split Brain Victim

drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0


On Split Brain Survivor

drbdadm primary r0
drbdadm connect r0

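Split brain shows up in the kernel log before anything else, so it's worth confirming that's actually what happened before discarding data on either node:

# Look for the DRBD split-brain detection messages in the kernel log
dmesg | grep -i 'split-brain'
journalctl -k | grep -i drbd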

Inverting Resource Roles

On Current Primary:

umount /srv/nfspoint
drbdadm secondary r0


On Secondary:

drbdadm primary r0
mount /dev/drbd0 /srv/nfspoint

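To confirm the roles actually flipped, on both nodes:

# Show the resource role on this node (run on both)
drbdadm role r0
drbdadm status r0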

DRBD Broken?

Reboot both servers

systemctl start drbd.service
drbdadm status

umount /dev/pve/nfspoint
mkfs.ext4 -b 4096 /dev/pve/nfspoint

dd if=/dev/zero of=/dev/drbd0 status=progress


mkfs -t ext4 /dev/drbd0

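Once both nodes are back up and the resource has reconnected, both sides should report the disk as UpToDate before mounting anything again:

# Run on both nodes; wait for disk:UpToDate on both before mounting /dev/drbd0
drbdadm status r0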

Troubleshooting DRBD Issues

Odd Mountpoint with Loop Device

What should appear in lsblk output:

NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   1.7T  0 disk
├─sda1                         8:1    0  1007K  0 part
├─sda2                         8:2    0   512M  0 part
└─sda3                         8:3    0   1.7T  0 part
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0  15.6G  0 lvm
  │ └─pve-data-tpool         253:4    0   1.5T  0 lvm
  │   ├─pve-data             253:5    0   1.5T  0 lvm
  │   ├─pve-vm--105--disk--0 253:6    0    20G  0 lvm
  │   └─pve-nfspoint         253:7    0   550G  0 lvm
  │     └─drbd0              147:0    0   550G  0 disk /srv/nfspoint
  └─pve-data_tdata           253:3    0   1.5T  0 lvm
    └─pve-data-tpool         253:4    0   1.5T  0 lvm
      ├─pve-data             253:5    0   1.5T  0 lvm
      ├─pve-vm--105--disk--0 253:6    0    20G  0 lvm
      └─pve-nfspoint         253:7    0   550G  0 lvm
        └─drbd0              147:0    0   550G  0 disk /srv/nfspoint


What was incorrectly appearing in lsblk output:

loop0                          7:0    0   200M  0 loop /srv/nfspoint
sda                            8:0    0   1.7T  0 disk
├─sda1                         8:1    0  1007K  0 part
├─sda2                         8:2    0   512M  0 part
└─sda3                         8:3    0   1.7T  0 part
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0  15.6G  0 lvm
  │ └─pve-data-tpool         253:4    0   1.5T  0 lvm
  │   ├─pve-data             253:5    0   1.5T  0 lvm
  │   ├─pve-vm--100--disk--0 253:6    0    30G  0 lvm
  │   ├─pve-vm--102--disk--0 253:7    0   100G  0 lvm
  │   ├─pve-vm--103--disk--0 253:8    0    20G  0 lvm
  │   ├─pve-vm--104--disk--0 253:9    0    20G  0 lvm
  │   ├─pve-vm--101--disk--0 253:10   0    20G  0 lvm
  │   └─pve-nfspoint         253:11   0   550G  0 lvm
  │     └─drbd0              147:0    0   550G  0 disk
  └─pve-data_tdata           253:3    0   1.5T  0 lvm
    └─pve-data-tpool         253:4    0   1.5T  0 lvm
      ├─pve-data             253:5    0   1.5T  0 lvm
      ├─pve-vm--100--disk--0 253:6    0    30G  0 lvm
      ├─pve-vm--102--disk--0 253:7    0   100G  0 lvm
      ├─pve-vm--103--disk--0 253:8    0    20G  0 lvm
      ├─pve-vm--104--disk--0 253:9    0    20G  0 lvm
      ├─pve-vm--101--disk--0 253:10   0    20G  0 lvm
      └─pve-nfspoint         253:11   0   550G  0 lvm
        └─drbd0              147:0    0   550G  0 disk


If the server is creating a loop device and mounting that as the mount point, try rebooting the server, restarting the drbd service, clearing out the device on the affected server, and resyncing it with the primary.
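Before rebooting, losetup will show what the stray loop device is actually backed by, and the device can be detached by hand. This is just what I would check, not what I ran:

# List active loop devices and their backing files
losetup -a
# Unmount the stray mountpoint and detach the loop device from the lsblk output above
umount /srv/nfspoint
losetup -d /dev/loop0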

I still don't know if it was split brain or something else, but these were the commands I ran:

umount /nfs/sharepoint # This was the location of the mountpoint
drbdadm disconnect r0
drbdadm connect --discard-my-data r0


I honestly think rebooting the server in question is what fixed this issue. My guess is that the /dev/drbd0 device wasn't created or working properly.

Links


https://www.howtoforge.com/high_availability_nfs_drbd_heartbeat_p2
https://pve.proxmox.com/wiki/Logical_Volume_Manager_(LVM)
https://linux.die.net/man/8/lvremove
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/lv_remove
https://serverfault.com/questions/266697/cant-remove-open-logical-volume

