But why couldn't ZFS also take advantage of those opcodes?

ryao · on June 15, 2016

It is platform dependent. ZFS does not do that on Linux yet, partly because of GPL symbol restrictions and partly because other development work has taken priority, although there has been some work on using the instructions directly. It definitely takes advantage of them on Illumos. I am not sure about the other platforms.

raattgift · on June 16, 2016

xnu's loadable kext isolation means that you have a one-time hit on using anything beyond x86-64+sse2, which can be paid at kernel prelink time, kext load time, or while a kext is running, via a trap that switches call preamble/postamble to handle the extra state (and which facilitates selecting fast paths on a cpu-by-cpu basis, for example). Only the presence of x87 insns imposes a noticeable cost.

o3x builds and runs just fine with -O2 -march=native and the latest clang just by changing CC and CFLAGS; the kexts that get built aren't backwards compatible though (you'll get a panic if you build with -march=native on a machine that does AVX and run on a machine that doesn't).
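One common way to avoid that kind of panic is to build for the baseline instruction set and pick a faster path at run time. The following is only a minimal user-space sketch of that idea, not how o3x or any ZFS port actually does it: `sum_bytes_generic`, `sum_bytes_avx2`, and `select_sum_impl` are hypothetical names, the "AVX2" variant is a stand-in that just calls the generic code, and the check relies on the GCC/clang builtin `__builtin_cpu_supports`, which a kernel would replace with its own CPU feature query.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable fallback: a plain byte sum. */
static uint64_t sum_bytes_generic(const uint8_t *p, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Hypothetical stand-in for an AVX2-tuned variant. A real one would be
 * compiled separately with AVX2 enabled; here it reuses the generic code
 * so the sketch stays runnable on any machine. */
static uint64_t sum_bytes_avx2(const uint8_t *p, size_t n)
{
    return sum_bytes_generic(p, n);
}

typedef uint64_t (*sum_fn)(const uint8_t *, size_t);

/* Choose an implementation once, based on what the running CPU reports. */
static sum_fn select_sum_impl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                 /* GCC/clang builtin, x86 only */
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2;
#endif
    return sum_bytes_generic;             /* safe on every CPU */
}
```

A caller would run `select_sum_impl()` once at startup and keep the returned pointer, so the same binary works on machines with and without AVX2.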

The code that recent clang+llvm generates makes heavy use of the XMM and YMM registers and does some substantial vectorization. The generated compression, checksumming, and Galois field code is strikingly better, although not quite as good as the hand-tuned code in, e.g., https://github.com/zfsonlinux/zfs/pull/4439. It may be interesting to compare performance, but given that compression=lz4 and checksum=edonr have negligible CPU impact on a late 2012 4-core Mac mini (Core i7) even when doing enormous I/O (> 200k IOPS to a pair of Samsung 850 PROs), hand tuning likely won't make as much of a difference as moving up from compression=on, checksum=[sha256|fletcher4].
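For a sense of the kind of loop being discussed, here is a rough scalar sketch of a fletcher4-style checksum: four 64-bit running sums over native-endian 32-bit words. The names `fletcher4_sums` and `fletcher4_scalar` are made up for this example, and it is not copied from any ZFS tree; byte swapping and the hand-tuned SIMD variants are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four running 64-bit sums, in the style of the fletcher4 checksum. */
typedef struct {
    uint64_t a, b, c, d;
} fletcher4_sums;

/* Scalar reference loop over native-endian 32-bit words.
 * len_bytes must be a multiple of 4. */
static fletcher4_sums fletcher4_scalar(const uint32_t *words, size_t len_bytes)
{
    assert(len_bytes % sizeof(uint32_t) == 0);

    fletcher4_sums s = {0, 0, 0, 0};
    size_t n = len_bytes / sizeof(uint32_t);

    for (size_t i = 0; i < n; i++) {
        s.a += words[i];   /* sum of the data words         */
        s.b += s.a;        /* sum of the running a values   */
        s.c += s.b;        /* sum of the running b values   */
        s.d += s.c;        /* sum of the running c values   */
    }
    return s;
}
```

Each sum depends on the one before it, so vectorized versions usually keep several independent sets of accumulators and combine them at the end rather than widening this loop directly.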

I'm pretty sure that once the hand-tuned stuff is in ZOL, it'll get looked at by lundman for possible integration.

creshal · on June 14, 2016

I'd be surprised if it doesn't. AVX is really good at speeding up compression/checksumming algorithms, and AES-NI is standard in most AES implementations nowadays.

mschuster91 · on June 14, 2016

Because not every opcode is made public. The "usual suspects" for SIMD and encryption are public, yes, but nothing stops Intel from adding opcodes so highly specialized that they essentially represent the exact program code of the filesystem.