MD arrays are great for data integrity, but if you ever actually have to rely on that, you may be in for a very long array resynchronization. Unless, that is, you have a write-intent bitmap configured. If you do, then the system can quickly find the few places that may be out of sync and fix them. This is particularly the case for RAID 5 or 6.
mdadm lets you add the write-intent bitmap as either internal, or external as a file on an ext filesystem. Performance tests show that external bitmaps can be up to 5x faster (random write IOPS), especially if written to SSD or NVMe storage.
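As a rough sketch of what switching bitmap types involves, the helper below only prints the mdadm commands instead of running them. The device /dev/md0 and the file /var/lib/md/md0.bitmap are placeholders, not values from this setup. The --grow and --bitmap= options are the documented mdadm ones, but check your mdadm version's manual page before running the output as root.

```shell
#!/bin/sh
# Print (not run) the mdadm commands that switch an array's write-intent bitmap.
# Device and file names are placeholders; review the output before using it.
bitmap_cmds() {
    md_dev=$1      # e.g. /dev/md0 (hypothetical)
    target=$2      # "internal" or an absolute path to a bitmap file
    # Only needed when the array already has a bitmap of another kind.
    echo "mdadm --grow $md_dev --bitmap=none"
    echo "mdadm --grow $md_dev --bitmap=$target"
}

bitmap_cmds /dev/md0 /var/lib/md/md0.bitmap
```

Printing rather than executing keeps the example harmless on a machine with live arrays.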
To assemble an MD array with an external bitmap, the filesystem containing the bitmap must already be mounted read-write. Presumably, this is the root filesystem, but needn't be. Back in the old days of carefully-ordered init scripts, this worked fine. Now, not so much.
Enter systemd, which makes the world more dynamic and asynchronous. Systemd forced us to have /usr mounted in order for user-space startup to proceed. So, now, we build a kernel with a companion initramfs (initrd) with all the bootstrap stuff from / and /usr and whatever else is needed to get the system going.
Things may vary, but on Debian bullseye, boot proceeds like this:
The kernel loads the initramfs and mounts it at / before turning over control to the /init script.
The initrd instance of systemd-udevd wants to activate all the block devices (including MD arrays, LVM logical volumes, USB sticks, etc.) before the init script tries to mount local filesystems.
In the specific case of MD, there is a udev rule that runs mdadm --incremental on every component device.
This is the chicken-and-egg problem: MD needs to access the bitmap on a mounted filesystem, or else it will fail and leave the array 'inactive'. But no filesystem will be mounted until MD assembly is attempted.
Eventually, the initramfs sequence will mount the root filesystem, but it will be at /root, which makes the path to the bitmap different from what's specified in /etc/mdadm/mdadm.conf.
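To check whether an array was left 'inactive' after boot, a small sketch like the following can scan /proc/mdstat. The function name is made up for illustration, and the match assumes the usual "mdN : inactive ..." line layout of that file.

```shell
#!/bin/sh
# List MD arrays that the kernel reports as inactive.
# Reads /proc/mdstat by default; pass another file to inspect a saved copy.
inactive_arrays() {
    mdstat=${1:-/proc/mdstat}
    [ -r "$mdstat" ] || return 0    # MD driver not loaded, nothing to report
    # Array lines look like: "md0 : inactive sdb1[1](S) sda1[0](S)"
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "$mdstat"
}

inactive_arrays
```

An empty result means no array is stuck in the inactive state.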
In many cases, the MD array is not actually needed for early startup.
A simple solution can be found at LabFruits. The idea is to get the initrd to completely ignore the MD array. It will be handled by systemd once / and /usr are mounted in their final places and the ramdisk environment is gone.
Systemd provides many tools for managing dependencies and is tightly integrated with udev.
Note that putting AUTO -all into /etc/mdadm/mdadm.conf did not have the desired effect.
Here is a more conservative solution than the LabFruits way. Install the following script as /etc/initramfs-tools/hooks/mdadm-nuke.
    #!/bin/sh
    # delay md assembly until root is mounted
    # so that external bitmap files are accessible

    PREREQ="mdadm"

    prereqs() {
        echo "$PREREQ"
    }

    case "${1:-}" in
        prereqs)
            prereqs
            exit 0
            ;;
    esac

    . /usr/share/initramfs-tools/hook-functions

    # undo part of what /usr/share/initramfs-tools/hooks/mdadm does
    for UDEV_RULE in 63-md-raid-arrays.rules 64-md-raid-assembly.rules; do
        rm -f "$DESTDIR/lib/udev/rules.d/$UDEV_RULE"
    done
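One detail worth spelling out: initramfs-tools skips hook files that are not executable, so the script needs its executable bit set. A minimal sketch of installing it follows, assuming the script was first saved locally as ./mdadm-nuke; the helper name and its optional directory argument are illustrative only.

```shell
#!/bin/sh
# Copy a hook script into place with the executable bit set.
# initramfs-tools ignores hook files that are not executable.
install_hook() {
    src=$1                                       # local copy of the script
    hooks_dir=${2:-/etc/initramfs-tools/hooks}   # override for a trial run
    sh -n "$src" || return 1                     # refuse a script with syntax errors
    install -m 0755 "$src" "$hooks_dir/mdadm-nuke"
}

# Usage (as root), assuming the script was saved as ./mdadm-nuke:
# install_hook ./mdadm-nuke
```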
Then run update-initramfs -u -k kernel_version in order to regenerate the ramdisk. You can verify the contents via lsinitramfs.
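For the verification step, something along these lines may help. It reads an lsinitramfs listing on standard input and succeeds only if neither of the two rule files removed by the hook is present. The function name is hypothetical, and the image path in the usage comment assumes Debian's /boot/initrd.img-<version> naming.

```shell
#!/bin/sh
# Succeed only if neither MD udev rule file appears in an initramfs listing
# read from standard input (for example the output of lsinitramfs).
md_rules_absent() {
    ! grep -Eq '(63-md-raid-arrays|64-md-raid-assembly)\.rules$'
}

# Usage, assuming the image for the running kernel lives in /boot:
# lsinitramfs "/boot/initrd.img-$(uname -r)" | md_rules_absent && echo "MD rules removed"
```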
It's best to keep at least one good kernel and/or initrd.img around for rescue booting.
With this new setup, udev in the initramfs is oblivious to MD arrays and mdadm. Then, as soon as the real systemd on the real root takes over, the udev rules for MD arrays spring back to life. The MD arrays are assembled, LVs are discovered, and filesystems are mounted and checked. It's almost like magic.
What remains is to make sure that services such as mailservers, webservers, and databases don't start before the filesystems they need are mounted. An easy way to do this is to run, for example:
systemctl edit apache2.service
and then enter:
    [Unit]
    RequiresMountsFor=/var/www
The end result is to create a directory and file like this: /etc/systemd/system/apache2.service.d/override.conf, but you're free to create the directory yourself and have .conf files with whatever names you like.
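Creating the drop-in by hand could look roughly like this. The file name mounts.conf and the function itself are made up for illustration, and the optional root prefix exists only so the function can be tried in a scratch directory instead of the real /etc.

```shell
#!/bin/sh
# Write a drop-in that makes a unit wait for the mount point backing a path.
write_mount_dropin() {
    unit=$1          # e.g. apache2.service
    path=$2          # e.g. /var/www
    root=${3:-}      # empty for the real /etc, or a scratch directory
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    # mounts.conf is an arbitrary name; any *.conf file in the directory is read
    printf '[Unit]\nRequiresMountsFor=%s\n' "$path" > "$dir/mounts.conf"
}

# Usage (as root), followed by a reload so systemd picks up the change:
# write_mount_dropin apache2.service /var/www && systemctl daemon-reload
```

Unlike systemctl edit, a file written by hand is not picked up until systemd reloads its configuration, hence the daemon-reload in the usage line.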
The good news is that all of this can be achieved via changes in /etc. No package-provided files need to be changed.