Setting IO scheduler for use with ZFS

If you’re using rotational hard drives, Linux’s default IO scheduler can interact very badly with ZFS’s own IO scheduler, greatly reducing performance. The problem is exacerbated if you have any SMR devices, because of their pathological worst-case performance characteristics.

I’ve found that switching to “none” (this was called “noop” historically) can improve performance by a full order of magnitude or more, which can take SMR resilvers from “this will take over a month” to “this will take three days”. With purely CMR drives, it’s not such an impressive improvement: I didn’t test very extensively so the error bars could be large, but I found around a factor of two.
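Before changing anything, it can help to see which scheduler each rotational drive is currently using. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. The following is only a sketch: the function names active_scheduler and list_rotational are made up for illustration, and the optional directory argument exists purely so the function can be pointed at a scratch tree instead of the real /sys/block.

```shell
# Print the active scheduler (the bracketed entry) from a sysfs scheduler file.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Print "<device> <active scheduler>" for every rotational sd* drive.
# The first argument is the sysfs block directory; it defaults to /sys/block.
list_rotational() {
    sysfs_root="${1:-/sys/block}"
    for dev in "$sysfs_root"/sd*; do
        # Skip anything without a readable rotational flag (including an unmatched glob).
        [ -r "$dev/queue/rotational" ] || continue
        # Only report drives the kernel considers rotational.
        [ "$(cat "$dev/queue/rotational")" = "1" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(active_scheduler "$dev/queue/scheduler")"
    done
}
```

Running list_rotational with no argument scans the real /sys/block. A drive that already reports none needs no further change.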

This scheduling problem is well-known and ZFS used to have an option to set “noop” automatically, but this was removed in 0.8.3, presumably because the developers felt it wasn’t ZFS’s place to change system settings like this.

The current recommendation is to use a udev rule if you need this. On my systems, all the rotational drives are used for bulk storage with ZFS, so a single udev rule that applies to every rotational drive is enough. Create a file with the following content, call it something like 66-io-scheduler.rules, and drop it in /etc/udev/rules.d/:

ACTION=="add|change", KERNEL=="sd[a-z]", ATTRS{queue/rotational}=="1", RUN+="/bin/sh -c 'echo none > /sys/block/%k/queue/scheduler'"

Update: A previous version of this post used simply echo as the command. This doesn’t work because udev doesn’t search PATH, so it needs a fully qualified path to the executable.
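udev can also write a sysfs attribute itself through an ATTR{...}="..." assignment, which avoids running an external command at all. The sketch below writes that variant of the rule and then asks udev to reload its rules and replay change events, so the setting applies without a reboot. It is an alternative, not the rule given in this post. The function names write_scheduler_rule and apply_scheduler_rule are hypothetical, the directory argument is only there so the file can be written to a scratch location first, and both steps need root when run against the real system.

```shell
# Write a rule that sets the scheduler via a sysfs attribute assignment.
# The first argument is the rules directory; it defaults to /etc/udev/rules.d.
write_scheduler_rule() {
    rules_dir="${1:-/etc/udev/rules.d}"
    cat > "$rules_dir/66-io-scheduler.rules" <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="none"
EOF
}

# Reload the rule files and replay "change" events for block devices.
apply_scheduler_rule() {
    udevadm control --reload
    udevadm trigger --subsystem-match=block --action=change
}
```

After running both functions, reading /sys/block/<device>/queue/scheduler for a rotational drive should show none in square brackets.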
