It's also worth noting that Linux has changed the way it uses ERMS and FSRM in its x86 copy routines multiple times since kernel 6.1, the version used in the article. As a data point, my machine, which has both FSRM and ERMS (surprisingly, the latter is not implied by the former), hits 17 GB/s using plain old pipes and a 32 KiB buffer on Linux 6.8.
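
If you want to see which of the two your own CPU reports, here is a rough sketch. It assumes a Linux box where the kernel exposes these features as `erms` and `fsrm` on the `flags` line of `/proc/cpuinfo`; the helper name `copy_flags` is made up for illustration.

```python
def copy_flags(cpuinfo_text):
    """Report whether 'erms' and 'fsrm' appear on the first 'flags' line.

    Expects text in the /proc/cpuinfo "key : value" layout (an assumption;
    other platforms expose CPU features differently).
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the plain "flags" key, not e.g. "vmx flags".
        if sep and key.strip() == "flags":
            flags = set(value.split())
            return {name: name in flags for name in ("erms", "fsrm")}
    # No flags line found: report both as absent.
    return {"erms": False, "fsrm": False}
```

Feed it the contents of `/proc/cpuinfo`, e.g. `copy_flags(open("/proc/cpuinfo").read())`. Since the two flags are reported independently, any combination can come back.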