Diskless / remote boot with Open-iSCSI:Troubleshooting

From lxadm | Linux administration tips, tutorials, HOWTOs and articles


Additional info / troubleshooting

never connect more than one PC to the same iSCSI target name

Traditional filesystems such as ext3 are designed to be mounted by only one host at a time. If you mount such a filesystem on more than one host simultaneously, you will almost certainly corrupt it.

Note that mounting ext3 read-only does not mean that nothing will be written! No file data is written, but the journal is replayed if it needs recovery. If another host has already mounted that filesystem, this can corrupt data.

If you really have to mount a filesystem on more than one host:

  • export it read-only on the iSCSI target, then mount it as ext2 on all initiators (the ext2 driver ignores the journal), or
  • use a properly configured cluster filesystem such as GFS or OCFS2.


surviving longer network disconnections

Occasional network disconnections (replacing cables or switches, restarting or reconfiguring the iSCSI target, etc.) can have disastrous effects: the kernel detects an "aborted journal" and remounts the filesystem read-only. You may lose your work, and the system will no longer work properly.

By default, Open-iSCSI waits 120 seconds for the session to be re-established before failing SCSI commands. You can safely increase this time. For this, iscsid must be running on your machine. Change this line in iscsid.conf to increase the timeout (here to 2400 seconds, or 40 minutes):

node.session.timeo.replacement_timeout = 2400

If a disconnection occurs, applications will "freeze" as soon as they need to read from or write to the iSCSI disk. Once the connection is re-established (within the timeout), they recover properly: there is no "aborted journal", and the filesystem is not remounted read-only.
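If you would rather script this change (for example, while preparing the client's root filesystem), here is a minimal sketch. It assumes iscsid.conf uses the "key = value" lines shown above and that GNU sed is available; the function name set_replacement_timeout is hypothetical, and the location of iscsid.conf (often /etc/iscsi/iscsid.conf) depends on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: set node.session.timeo.replacement_timeout in an
# iscsid.conf-style file made of "key = value" lines (GNU sed assumed).
# Usage: set_replacement_timeout /etc/iscsi/iscsid.conf 2400
set_replacement_timeout() {
    conf=$1
    seconds=$2
    key=node.session.timeo.replacement_timeout
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf"; then
        # An active (uncommented) line exists: rewrite its value in place
        sed -i "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $seconds|" "$conf"
    else
        # No active line: append one at the end of the file
        printf '%s = %s\n' "$key" "$seconds" >> "$conf"
    fi
}
```

Rewriting the existing line, instead of always appending, keeps a single authoritative setting in the file.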

Note that if you run iscsid, you must not kill it on system shutdown; otherwise, your connection to the target will break.

If the system is already running and you don't want to restart it to increase the timeout, you can use this command (replace the target name with your own):

iscsiadm -m node -T iqn.2007-05.net.my:store.backup -o update -n node.session.timeo.replacement_timeout -v 2400
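To check the value stored in the node record afterwards, you can filter the output of iscsiadm -m node -T <target name>, which prints the record as "key = value" lines. A small sketch; get_replacement_timeout is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read an iscsiadm node record ("key = value" lines)
# from standard input and print the replacement_timeout value.
# Usage: iscsiadm -m node -T iqn.2007-05.net.my:store.backup | get_replacement_timeout
get_replacement_timeout() {
    awk -F' = ' '$1 == "node.session.timeo.replacement_timeout" { print $2 }'
}
```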

do not use DHCP

If your client's IP address changes, your iSCSI session is likely to be lost. Because the root filesystem is on a virtual SCSI disk accessed over the network, this connection should never be interrupted. Losing the iSCSI session of the root filesystem is comparable to pulling the hard disk out of a running PC.

However, with the timeout change described above ("surviving longer network disconnections"), you may be lucky and your system may survive.
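A quick way to catch an interface that is still configured for DHCP is sketched below. It assumes Red Hat-style interface files (for example /etc/sysconfig/network-scripts/ifcfg-eth0, used by the /etc/init.d/network script mentioned on this page); other distributions keep this setting elsewhere, and check_static_ip is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check for a Red Hat-style ifcfg file: returns 1 and prints
# a warning if BOOTPROTO is dhcp or bootp (optionally quoted), else 0.
# Usage: check_static_ip /etc/sysconfig/network-scripts/ifcfg-eth0
check_static_ip() {
    if grep -Eiq "^[[:space:]]*BOOTPROTO=[\"']?(dhcp|bootp)" "$1"; then
        echo "WARNING: $1 uses DHCP/BOOTP; use a static address for the iSCSI interface" >&2
        return 1
    fi
    return 0
}
```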


cache files when needed

If you absolutely have to disconnect the network briefly (e.g., in a script that sets up bridging, like the one used by Xen), try to read all the files you will need *before* disconnecting the network. Below is an example command that reads the files used by Xen's network-bridge script:

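#!/bin/bash
# Linux keeps the files it reads in the page cache; running each tool
# once also loads the shared libraries it uses.
COMMANDS="vconfig awk brctl echo egrep fgrep grep ifconfig ifup ip sed seq sh sleep"

for COMMAND in $COMMANDS ; do
    # "env" runs the external binary (e.g. /bin/echo), not a shell builtin
    env "$COMMAND" --help
done &>/dev/null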

/dev problem

If your server reboots automatically during startup, the most likely reason is that it cannot find the device nodes it needs in the /dev directory. Make sure the root filesystem on your iSCSI target contains the required device nodes in /dev.
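The sketch below checks the client's root filesystem (as exported by the target and mounted locally, e.g. at a hypothetical /mnt/client-root) for a few device nodes. The list (console and null) is an assumption about the minimum an init script usually needs; extend it for your setup. check_dev_nodes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: report character devices missing from $1/dev, where
# $1 is the client's root filesystem mounted locally. Returns 1 if any
# node is missing. The node list below is an assumption; extend as needed.
# Missing nodes can be created as root, for example:
#   mknod -m 600 /mnt/client-root/dev/console c 5 1
#   mknod -m 666 /mnt/client-root/dev/null c 1 3
check_dev_nodes() {
    root=$1
    missing=0
    for node in console null; do
        if [ ! -c "$root/dev/$node" ]; then
            echo "missing character device: $root/dev/$node" >&2
            missing=1
        fi
    done
    return $missing
}
```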

other topics

  • modify your scripts (startup, shutdown and others) so that they never disconnect the network; the ones that come to mind are:
    • /etc/init.d/network: it shouldn't bring down the interface (e.g., eth0) you use to connect to the iSCSI target,
    • /etc/init.d/open-iscsi: it shouldn't disconnect from the node/target you are connected to,
    • /etc/init.d/halt: normally it tries to kill all processes; it shouldn't (or at least it shouldn't kill iscsid, if you're running it).
  • if you want to boot Xen, see Running_Xen_with_LILO for how to create an integrated kernel + initrd image (Xen needs one if you want to boot it with LILO or PXE/TFTP).
  • if you use connection tracking in your firewall, keep in mind that the iSCSI connection is established before iptables connection tracking becomes active. Because conntrack never saw the connection being set up, all iSCSI packets will be in state INVALID. Either use the raw table with the NOTRACK target (and match on state UNTRACKED), or make an exception for iSCSI packets. Including the netfilter and connection tracking modules in the initrd image and loading them before iscsistart should also work, but it goes against the idea of keeping the initrd image as simple as possible.
  • if your iSCSI initiator is on a different network than the target, you have to modify the script a bit (add routes, etc.).
  • note that the init process needs to have PID 1. This means you can't bring this setup up manually (e.g., by starting bash first, then configuring the network, chrooting / pivoting, etc. by hand). Keep this in mind when troubleshooting, and put all your commands in the init script.
  • the iSCSI initiator (Open-iSCSI) will not work with a 64-bit kernel and 32-bit userspace; use either a 64-bit kernel with 64-bit userspace, or a 32-bit kernel with 32-bit userspace.
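For the NOTRACK approach mentioned in the connection tracking item, the rules could look roughly like the sketch below. It assumes the default iSCSI port 3260 and a single target address passed as an argument; the function names and the DRY_RUN switch are hypothetical, added so you can review the commands before running them as root.

```shell
#!/bin/sh
# Sketch: exempt iSCSI traffic from connection tracking via the raw table.
# Assumes the default iSCSI port 3260; the target IP is passed as $1.
# Set DRY_RUN=1 to print the commands instead of executing them.
ISCSI_PORT=3260

run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

iscsi_notrack_rules() {
    target_ip=$1
    # Packets to and from the target skip conntrack (state UNTRACKED)
    run iptables -t raw -A OUTPUT -p tcp -d "$target_ip" --dport "$ISCSI_PORT" -j NOTRACK
    run iptables -t raw -A PREROUTING -p tcp -s "$target_ip" --sport "$ISCSI_PORT" -j NOTRACK
    # Accept the untracked packets in the filter table
    run iptables -A OUTPUT -m state --state UNTRACKED -j ACCEPT
    run iptables -A INPUT -m state --state UNTRACKED -j ACCEPT
}
```

For example, DRY_RUN=1 iscsi_notrack_rules 192.168.1.1 only prints the four commands. The rules are appended with -A, so if your firewall already drops INVALID packets you may need to insert them earlier with -I instead. On newer iptables versions, -j CT --notrack is the preferred spelling of -j NOTRACK.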