
I think it's more to do with avoiding the overheads typically associated with system calls (presumably an interrupt or trap into the kernel and the associated disabling/enabling/changing of paging behaviour).

Here's an example of a syscall-heavy command on my system:

  $ time dd if=/dev/zero bs=1 count=10M of=/dev/null
  10485760+0 records in
  10485760+0 records out
  10485760 bytes (10 MB, 10 MiB) copied, 7.09089 s, 1.5 MB/s
 
  real    0m7.092s
  user    0m2.123s
  sys     0m4.968s
Copying each of those 10,485,760 one-byte records takes one read() and one write(), so that's roughly 21 million system calls in about 7 seconds, or around 3 million per second. That seems quite slow on a 3.2 GHz CPU (roughly a thousand clock cycles per call), when all each call should really be doing is dereferencing a couple of pointers until it finds a function that writes a zero byte to a buffer (the "/dev/zero" descriptor handler) or one that ignores bytes from a buffer (the "/dev/null" descriptor handler).
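
Roughly speaking, with bs=1 the whole thing boils down to a loop like this (a sketch of the effect, not dd's actual source):

  #include <unistd.h>

  /* One read() and one write() per byte: two kernel round-trips for every
     byte copied, i.e. roughly 21 million syscalls for 10 MiB. */
  static int copy_byte_at_a_time(int in_fd, int out_fd, long count)
  {
      char c;
      for (long i = 0; i < count; i++) {
          if (read(in_fd, &c, 1) != 1)    /* trap into the kernel */
              return -1;
          if (write(out_fd, &c, 1) != 1)  /* and again */
              return -1;
      }
      return 0;
  }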

If you have a safe bytecode format for representing operations that are performed in a loop, the kernel can just perform those operations without having to switch back and forth to userspace.



How much of that time is really spent in the system call interface?

You've got 4.968s of system time there (i.e. broadly the time spent in kernel code) and 2.123s of user time. Given that the user-space program is effectively a tight loop around read() and write() calls, we can assume that almost all of those 2 seconds are spent going through the syscall plumbing.

Now, some of that kernel-side time will be spent in the syscall plumbing too, but there's also a lot of I/O, buffer, and filesystem-layer code executing there, all of which would still be in use with a BPF program. So it's unclear how much of the overall time can actually be shaved off.


There shouldn't really be any significant filesystem code involved, since once `dd` has opened the files, it should have handlers for those devices more-or-less directly in its descriptor table. Once you have a descriptor to a pipe or device, there shouldn't be any filesystem-level checking in the middle of your reads/writes; all you're doing is filling/emptying buffers.

And given that I can write a program that makes 132 million calls per second to the glibc `putchar` function (which also buffers), I'm pretty sure there's a lot of time that can be shaved off as we start to replace the system call mechanism with plain function calls.
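
Something along these lines, for the curious (a rough sketch rather than my exact program, and the number will obviously vary with machine and libc; build with -O2 and redirect stdout to /dev/null):

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      const long n = 1000L * 1000 * 1000;   /* one billion buffered calls */
      struct timespec t0, t1;

      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (long i = 0; i < n; i++)
          putchar('0');                     /* stays in user space until the
                                               stdio buffer fills */
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
      fprintf(stderr, "%.0f putchar calls/s\n", n / secs);
      return 0;
  }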


Have you forgotten about Meltdown, Spectre, and all the other cache attacks?


These are things that kernel developers are surely mindful of when designing and implementing eBPF functionality.

Regardless, I'm fairly sure I ran this same test years ago and saw the system call rate in the same order of magnitude (that is, a couple of million per second). I really doubt Spectre mitigations are why what should be a few pointer dereferences and function calls ends up taking around a thousand clock cycles.


It's one of the two phases. We're back to the other one; wait a couple of months.


Re-try with mitigations=off
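
(For anyone trying that: mitigations=off is a kernel boot parameter, so it has to go on the kernel command line and only takes effect after a reboot. On a GRUB-based Debian/Ubuntu-style system that's typically something like the following; the file and update command vary by distro.)

  # in /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"

  $ sudo update-grub && sudo reboot
  $ cat /proc/cmdline    # confirm the flag made it onto the command line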



