MD arrays are great for data integrity, but if you ever actually need to rely on it, for example after a crash or a power failure, you may be in for a very long array re-synchronization. Unless, that is, you have a write-intent bitmap configured. If you do, the system can quickly find the few regions that may be out of sync and fix only those. This matters most for RAID 5 or 6.
mdadm lets you add the write-intent bitmap either internally (stored alongside the array's own metadata) or externally, as a file on an ext filesystem. Performance tests show that external bitmaps can be up to 5x faster in random-write IOPS, especially when the bitmap file is written to SSD or NVMe storage.
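For reference, here is a minimal sketch of switching an existing array to an external bitmap. The array name /dev/md0 and the bitmap path /bitmaps/md0.bitmap are hypothetical, and the bitmap file must live on a filesystem that is not stored on the array itself. This modifies the array, so check mdadm(8) on your own system before running it.

```shell
#!/bin/sh
# Sketch only: /dev/md0 and /bitmaps/md0.bitmap are hypothetical names.
# Run as root. An array carries at most one bitmap, so if it already has
# one (check with mdadm --detail), remove it first:
mdadm --grow /dev/md0 --bitmap=none

# Create an external write-intent bitmap file on a filesystem that is
# NOT on /dev/md0 (for example, an SSD-backed root filesystem).
mdadm --grow /dev/md0 --bitmap=/bitmaps/md0.bitmap

# Record the location in /etc/mdadm/mdadm.conf so assembly can find it,
# e.g. (the UUID is a placeholder for your array's real UUID):
#   ARRAY /dev/md0 metadata=1.2 UUID=<your-array-uuid> bitmap=/bitmaps/md0.bitmap
```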
To assemble an MD array with an external bitmap, the filesystem containing the bitmap must already be mounted read-write. Presumably this is the root filesystem, but it needn't be. Back in the old days of carefully ordered init scripts, this worked fine. Now, not so much.
Enter systemd, which makes the world more dynamic and asynchronous. Systemd forced us to have /usr mounted before user-space startup can proceed.
So now we pair each kernel with a companion initramfs (initrd) containing all the bootstrap pieces from / and /usr, plus whatever else is needed to get the system going.
Details may vary, but on Debian bullseye, boot proceeds like this. The kernel loads the initramfs, mounts it at /, and turns control over to the /init script. The initramfs instance of systemd-udevd tries to activate all the block devices (including MD arrays, LVM logical volumes, USB sticks, etc.) before the init script tries to mount local filesystems. In the specific case of MD, a udev rule runs mdadm --incremental on every component device.
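To see which devices that rule will act on, you can ask how udev classifies them: the MD assembly rule keys on the ID_FS_TYPE property being linux_raid_member. The device name /dev/sdb1 below is a hypothetical component device; both commands are read-only.

```shell
#!/bin/sh
# Read-only checks. /dev/sdb1 is a hypothetical MD component device.

# List block devices with their detected filesystem type; MD components
# show up as "linux_raid_member".
lsblk -o NAME,FSTYPE,MOUNTPOINT

# Show the udev property the MD assembly rule matches on.
udevadm info --query=property --name=/dev/sdb1 | grep '^ID_FS_TYPE='
```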
This is the chicken-and-egg problem: MD needs to access the bitmap on a mounted filesystem, or else assembly fails and leaves the array 'inactive'. But no filesystem will be mounted until MD assembly has been attempted.
Eventually, the initramfs sequence will mount the root filesystem, but at /root, so the path to the bitmap differs from the one specified in /etc/mdadm/mdadm.conf.
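When this bites, /proc/mdstat lists the array as inactive. The sketch below is a small helper, hypothetically named list_inactive_md, that reads /proc/mdstat-format text and prints the names of inactive arrays. The recovery commands in the comments are one plausible way to reassemble once the bitmap's filesystem is mounted at its normal location, not a guaranteed fix.

```shell
#!/bin/sh
# list_inactive_md (hypothetical helper): read /proc/mdstat-format text
# on stdin and print the name of every array reported as inactive,
# e.g. from a line such as "md0 : inactive sdb1[0](S) sdc1[1](S)".
list_inactive_md() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Only inspect the live system if the md driver is present.
if [ -r /proc/mdstat ]; then
    for md in $(list_inactive_md < /proc/mdstat); do
        echo "/dev/$md is inactive"
        # Possible recovery, as root, once the bitmap file is reachable
        # at the path recorded in /etc/mdadm/mdadm.conf:
        #   mdadm --stop "/dev/$md"
        #   mdadm --assemble --scan
    done
fi
```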
In many cases, the MD array is not actually needed for early startup. A simple solution can be found at LabFruits. The idea is to make the initrd ignore the MD array completely. The array is then handled by systemd once / and /usr are mounted in their final places and the ramdisk environment is gone.
Systemd provides many tools for managing dependencies and is tightly integrated with udev. Note that putting AUTO -all into /etc/mdadm/mdadm.conf did not have the desired effect.
Here is a more conservative solution than the LabFruits way. Install the following script as /etc/initramfs-tools/hooks/mdadm-nuke and make it executable; initramfs-tools skips hook scripts that are not executable.
#!/bin/sh
# delay md assembly until root is mounted
# so that external bitmap files are accessible
PREREQ="mdadm"
prereqs() {
    echo "$PREREQ"
}
case "${1:-}" in
    prereqs)
        prereqs
        exit 0
        ;;
esac
. /usr/share/initramfs-tools/hook-functions
# undo part of what /usr/share/initramfs-tools/hooks/mdadm does:
# remove the udev rules that would assemble MD arrays inside the initramfs
for UDEV_RULE in 63-md-raid-arrays.rules 64-md-raid-assembly.rules; do
    rm -f "$DESTDIR/lib/udev/rules.d/$UDEV_RULE"
done
exit 0
Then run update-initramfs -u -k kernel_version to regenerate the ramdisk, and verify its contents with lsinitramfs. It's best to keep at least one known-good kernel and/or initrd.img around for rescue booting.
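A minimal sketch of that regenerate-and-check step for the running kernel follows. It assumes Debian's initramfs-tools (which provides update-initramfs and lsinitramfs) and the default /boot/initrd.img-<version> naming; the helper name md_rules_absent is hypothetical. The main part is guarded so it does nothing unless run as root on a system that has update-initramfs.

```shell
#!/bin/sh
# md_rules_absent (hypothetical helper): read an lsinitramfs listing on
# stdin, print any MD udev rules it contains, and succeed (exit 0) only
# if none were found.
md_rules_absent() {
    ! grep -E 'rules\.d/6[34]-md-raid-(arrays|assembly)\.rules$'
}

# Regenerate and check the initramfs for the running kernel; assumes the
# Debian default image name /boot/initrd.img-<kernel version>.
if [ "$(id -u)" -eq 0 ] && command -v update-initramfs >/dev/null 2>&1; then
    kver=$(uname -r)
    update-initramfs -u -k "$kver"
    if lsinitramfs "/boot/initrd.img-$kver" | md_rules_absent; then
        echo "OK: no MD udev rules in initrd.img-$kver"
    else
        echo "MD udev rules still present in initrd.img-$kver" >&2
    fi
fi
```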
With this new setup, udev in the initramfs is oblivious to MD arrays and mdadm. Then, as soon as the real systemd on the real root takes over, the udev rules for MD arrays spring back to life: the MD arrays are assembled, LVs are discovered, and filesystems are mounted and checked. It's almost like magic.
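After rebooting, a quick way to confirm that the arrays came up with their external bitmaps is sketched below; /dev/md0 is a hypothetical array name. For an external bitmap, the Intent Bitmap line of mdadm --detail should show the bitmap file's path rather than "Internal".

```shell
#!/bin/sh
# Read-only checks after boot; /dev/md0 is a hypothetical array name.
# Every array should be listed as "active", not "inactive".
cat /proc/mdstat

# Show which bitmap the running array is using.
mdadm --detail /dev/md0 | grep 'Intent Bitmap'
```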
What remains is to make sure that services such as mail servers, web servers, and databases don't start before the filesystems they need are mounted. An easy way to do this is to run, for example:
systemctl edit apache2.service
and then enter:
[Unit]
RequiresMountsFor=/var/www
The result is a drop-in file, /etc/systemd/system/apache2.service.d/override.conf, but you're free to create the directory yourself and give the .conf files whatever names you like. If you create them by hand, run systemctl daemon-reload afterwards so systemd picks them up; systemctl edit does that for you.
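Creating such drop-ins by hand for several services can be scripted. The sketch below writes one [Unit] drop-in per service. The function name add_mount_dep, the drop-in name 10-local-mounts.conf, and the optional SYSROOT prefix (handy for trying it against a scratch directory) are hypothetical choices, and the service names and paths in the usage comments are examples only.

```shell
#!/bin/sh
# add_mount_dep SERVICE PATH... (hypothetical helper)
# Write a drop-in that makes SERVICE require the mount(s) backing PATH.
# SYSROOT (default: empty, i.e. the real /) prefixes the target path.
add_mount_dep() {
    service=$1
    shift
    dir="${SYSROOT:-}/etc/systemd/system/${service}.d"
    mkdir -p "$dir" || return 1
    # RequiresMountsFor= takes a space-separated list of absolute paths;
    # "$*" joins the remaining arguments with single spaces.
    printf '[Unit]\nRequiresMountsFor=%s\n' "$*" > "$dir/10-local-mounts.conf"
}

# Example usage, as root (service names and paths are illustrative):
#   add_mount_dep apache2.service /var/www
#   add_mount_dep mariadb.service /var/lib/mysql
#   add_mount_dep postfix.service /var/spool/postfix /var/mail
#   systemctl daemon-reload
#   systemctl show -p RequiresMountsFor apache2.service
```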
The good news is that all of this can be achieved via changes in /etc. No package-provided files need to be changed.