This was supposed to be something simple, but it ended up taking more time and energy than it had any right to.
These are the basic steps to get the storage volume created and connected:
- Create the volume on your favorite (or available) SAN device and export it as an iSCSI target
- Configure the iSCSI initiator (client) using any number of instructions out there:
  - iscsiadm -m discovery -t st -p <SAN IP address>
  - iscsiadm -m node -T <target iqn> -p <SAN IP address> -l
  - use /dev/disk/by-path to find the device for the volume
- Create the filesystem on the device
  - mkfs.xfs <device from last step>
- Create the mount point
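The steps above can be sketched as a small shell script. This is only an illustration: the function names and the DRY_RUN switch are made up for this example, and the by-path naming pattern assumes the default iSCSI port (3260) and LUN 0, so check `ls -l /dev/disk/by-path` on your own system before trusting it.

```shell
# Sketch of the setup steps. Function names and DRY_RUN are hypothetical.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the expected /dev/disk/by-path name for a target.
# $1 = SAN IP, $2 = target IQN, $3 = LUN (default 0), $4 = port (default 3260)
# The naming pattern is an assumption; verify it with ls -l /dev/disk/by-path.
iscsi_by_path() {
    echo "/dev/disk/by-path/ip-$1:${4:-3260}-iscsi-$2-lun-${3:-0}"
}

# $1 = SAN IP, $2 = target IQN, $3 = mount point
setup_iscsi_volume() {
    dev=$(iscsi_by_path "$1" "$2")
    run iscsiadm -m discovery -t st -p "$1" || return 1
    run iscsiadm -m node -T "$2" -p "$1" -l || return 1
    # The device node can take a moment to appear after login; if mkfs.xfs
    # fails here, wait briefly and retry by hand.
    run mkfs.xfs "$dev" || return 1
    run mkdir -p "$3"
}
```

Running it with `DRY_RUN=1` first (for example `DRY_RUN=1; setup_iscsi_volume <SAN IP address> <target iqn> /mountpoint`) shows the exact commands without touching anything.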
In an ideal world, at this point we would just create a line in /etc/fstab that looks like this:
/dev/<device> /mountpoint xfs defaults 0 0
There are four fundamental problems with this …
- If another iSCSI volume were to be mounted on this system, the device names (/dev/sdb, /dev/sdc, and so on) could change
- Since the filesystem type is ‘xfs’, systemd has no clue that it is not a local disk, so it tries to mount it at the very beginning of the boot process — even before the network starts up … in fact, systemd won’t even try to start the network until the local filesystems are up. This race condition will render your system unbootable
- During shutdown, systemd will take down the network before it tries to unmount all local filesystems … again, this will cause a problem since the storage will disappear before the filesystem can be closed properly, leading to a high likelihood of data corruption/loss
- iSCSI volume mounts don’t happen immediately, and an overzealous application could start trying to use a directory on the mounted filesystem before it is ready, resulting in unpredictable errors
Problem #1 is easily solved using a UUID-based mount: look in /dev/disk/by-uuid, find the symlink that points at the volume's current device name, and use that UUID in place of the device path in the /etc/fstab entry:
UUID=<volume UUID> /mountpoint xfs defaults 0 0
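The lookup can be scripted. The two helpers here are hypothetical names for this sketch; the first walks the symlinks in /dev/disk/by-uuid and prints the one that resolves to the given device, and the second formats the fstab line. It assumes GNU `readlink -f` is available, as it is on CentOS.

```shell
# Print the UUID whose /dev/disk/by-uuid symlink resolves to the given device.
# $1 = device path, $2 = directory to scan (defaults to the real one; the
# parameter exists only so the function can be tried on a test directory).
uuid_for_device() {
    target=$(readlink -f "$1") || return 1
    for link in "${2:-/dev/disk/by-uuid}"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
            return 0
        fi
    done
    return 1
}

# Format an /etc/fstab line for a UUID-based mount.
# $1 = UUID, $2 = mount point, $3 = filesystem type, $4 = options (default: defaults)
fstab_line() {
    printf 'UUID=%s %s %s %s 0 0\n' "$1" "$2" "$3" "${4:-defaults}"
}
```

For example, `fstab_line "$(uuid_for_device <device>)" /mountpoint xfs` prints a line in the form shown above; review it before appending it to /etc/fstab. On most systems `blkid -s UUID -o value <device>` reports the same UUID.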
Problem #2 (and part of #3) is easily handled by specifically telling systemd that it is a network filesystem using a parameter in /etc/fstab:
UUID=<volume UUID> /mountpoint xfs _netdev 0 0
From a systemd perspective, this orders the mount after ‘network-online.target’ and ‘remote-fs-pre.target’, and makes ‘remote-fs.target’ dependent on this mount (this will be significant later).
The rest of problem #3 is a bit trickier; the ‘_netdev’ tells systemd that the drive is a remote filesystem, but it still doesn’t know that it is iSCSI, so during shutdown it will (in parallel with all the other activities) shut down the iSCSI subsystem … before the filesystems have been unmounted. This will result in an indefinite hang when trying to shut down, as well as corrupted filesystems. Telling systemd that the filesystem is on an iSCSI volume requires another parameter in the /etc/fstab entry:
UUID=<volume UUID> /mountpoint xfs _netdev,x-systemd.requires=iscsi.service 0 0
This now guarantees that things happen in the right order during startup (network -> iSCSI -> mount) and shutdown (unmount -> iSCSI down -> network down).
Problem #4 is a bit more delicate, and I’ll illustrate it with an example (the one that had me pulling out what is left of my hair): I wanted to use an iSCSI volume as the docker workspace mounted under /var/lib/docker. I set up the iSCSI volume as described above and put the following into /etc/fstab:
UUID=<volume UUID> /var/lib/docker xfs _netdev,x-systemd.requires=iscsi.service 0 0
The basic problem is that docker is starting up before the iSCSI mount is complete. Normal systemd dependencies for docker only require that the local filesystems be mounted and the network active before it will start docker. Given that it can take a non-trivial amount of time to get iSCSI volumes mounted, it is very likely that docker will start accessing its work directory (/var/lib/docker) before the iSCSI volume is mounted. Docker is very friendly in that if it finds its work directory empty, it will happily initialize it and start using it — normally something that is very useful; but in this case, it simply initializes its workspace in the base filesystem just in time for the mount to succeed and overlay what docker just did with the iSCSI volume. The result is that docker has no clue what happened and cannot function — even if the mounted volume contains a valid work directory structure that it created earlier. This is bad news, and docker just sits there babbling about missing layers or some other odd error.
My first solution was to modify the unit file for docker in the systemd unit directory (/usr/lib/systemd/system/docker.service). At the top of the file is the ‘[Unit]’ section; by default in the CentOS 7 distribution there are several lines there, including ‘After=’ and ‘Requires=’. Add ‘remote-fs.target’ to each of these lines, separating it from any existing entries by a space. This makes docker start only after all remote filesystems are active (including our iSCSI volume, because of the ‘_netdev’ option), which forces docker to wait for the workspace to be mounted and solves our problem. This works, but isn’t the best solution because the ‘docker.service’ file could be (and likely will be) replaced during the next upgrade of the docker package.
The final solution came to me when I was re-reading the ‘systemd.mount’ manpage to try to figure out how to get the order right for both boot and shutdown. The manpage referred to mounting filesystems using /etc/fstab or unit files, and that was the key. When systemd parses /etc/fstab, it creates a unit file for every mount point in /run/systemd/generator. The one I was looking for was called var-lib-docker.mount; I added ‘docker.service’ to the ‘Before=’ line and saved the file to /etc/systemd/system. The resulting file looks like this:
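What follows is a minimal sketch of such a unit rather than a verbatim copy, written as a shell function that prints it for a given UUID. The function name, the Description text, the explicit Requires=/After= lines for iscsi.service (standing in for the x-systemd.requires option from the fstab entry), and the WantedBy= target are all assumptions; compare them with the generated file in /run/systemd/generator on your own system.

```shell
# Print a mount unit for /var/lib/docker on stdout.
# $1 = filesystem UUID of the iSCSI volume.
# Assumptions (not confirmed by the generated file): the Description text,
# the Requires=/After= lines for iscsi.service, and the [Install] target.
render_docker_mount_unit() {
    cat <<EOF
[Unit]
Description=Docker workspace on an iSCSI volume
Requires=iscsi.service
After=iscsi.service
Before=remote-fs.target docker.service

[Mount]
What=/dev/disk/by-uuid/$1
Where=/var/lib/docker
Type=xfs
Options=_netdev

[Install]
WantedBy=remote-fs.target
EOF
}
```

To try it, something like `render_docker_mount_unit <volume UUID> | sudo tee /etc/systemd/system/var-lib-docker.mount` writes the file. The file name has to match the mount path; `systemd-escape -p --suffix=mount /var/lib/docker` prints the name systemd expects.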
The other change I made to the file is to add the ‘[Install]’ section at the end … this tells systemd where to link this item into the dependency chain.
At this point, you need to do two things to enable this change:
- Remove the line that we’ve been building from the /etc/fstab file — it is no longer needed, and will cause problems if it remains
- Install the unit file: systemctl enable var-lib-docker.mount
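The first of those two steps can be done by hand in an editor, or with a small helper like this one. The function name is hypothetical; it comments the entry out instead of deleting it and keeps a .bak copy, so the change is easy to undo.

```shell
# Comment out the fstab entry whose second field is the given mount point.
# $1 = mount point, $2 = fstab file (default /etc/fstab).
# A copy of the original is kept next to it with a .bak suffix.
disable_fstab_entry() {
    file=${2:-/etc/fstab}
    cp -p "$file" "$file.bak" || return 1
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$file.bak" > "$file"
}
```

After `disable_fstab_entry /var/lib/docker` (run as root), `systemctl daemon-reload` makes systemd re-read /etc/fstab and drop the generated unit, and `systemctl enable var-lib-docker.mount` installs the new one.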
Until I discovered the ability to mount filesystems from unit files (where the range of parameters is much wider), the solution was not optimal, because you had to modify a distribution file that would be overwritten when that package was updated. Combining all the dependency and ordering information into one file that will survive updates provides a robust solution.