In certain machines with very weak CPUs and/or many very powerful connections. F...

monocasa · on March 25, 2020

No. On workstations and servers, particularly in a post-Spectre world, putting your network drivers into user space will absolutely destroy your perf because of the added context switches.

You'd maybe have a point if it were an L4, but Mach ports are now used as an example of how not to do microkernel IPC because of how much overhead they incur.

Dylan16807 · on March 25, 2020

A few thousand context switches per second is minor enough even with Spectre mitigations, and if you need more than that you failed the "mild levels of competence" test.

monocasa · on March 25, 2020

Because those DPDK guys are just a bunch of clowns I guess, trying to avoid even the normal one user/kernel transition.

Dylan16807 · on March 25, 2020

They have a completely different goal, much harder than merely saturating a single network port.

monocasa · on March 25, 2020

...no, with DPDK you fundamentally have a 1:1 relationship between cores and ports. And a lot of the use cases are very much normal server-style workloads; it's not just people running network switches with it.

Dylan16807 · on March 25, 2020

According to https://blog.selectel.com/introduction-dpdk-architecture-pri... they are largely trying to avoid bottlenecks that exist inside the Linux kernel itself, bottlenecks that happen even with zero context switches. That's a totally different problem. Also to avoid having a system call per packet, which falls under "mild levels of competence" for an API designed this decade. Userspace networking also exists to eke out absolute minimum latency, which you don't need just to saturate a port.

When your only goal is to avoid throughput bottlenecks, you don't need anything fancy. Avoid having a context switch per packet and you're most of the way there. A context switch every millisecond, or something in that order of magnitude, is completely harmless to throughput. If it causes your core to process 10% fewer packets than if it had zero context switches, then use 1.5 cores. Context switches take nothing anywhere near a millisecond each.
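The arithmetic above can be sketched roughly as follows. This is a back-of-the-envelope model, not a measurement: the 5 microsecond cost per switch is an assumed figure chosen for illustration, and the names `overhead_fraction` and `cores_needed` are hypothetical helpers.

```python
# Assumption: one context switch costs about 5 microseconds, mitigations included.
ASSUMED_SWITCH_COST = 5e-6


def overhead_fraction(switches_per_second: float,
                      cost_seconds: float = ASSUMED_SWITCH_COST) -> float:
    """Fraction of one core's time consumed by context switches."""
    return min(1.0, switches_per_second * cost_seconds)


def cores_needed(overhead: float) -> float:
    """Cores required to match the throughput of one overhead-free core."""
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    return 1.0 / (1.0 - overhead)


# One switch per millisecond is 1,000 switches per second.
print(f"switch every ms: {overhead_fraction(1_000):.1%} of a core")
# Compensating for a 10% throughput loss.
print(f"10% loss needs {cores_needed(0.10):.2f} cores")
```

Under these assumptions a 10% loss is covered by roughly 1.11 cores, so budgeting 1.5 cores leaves a comfortable margin.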

monocasa · on March 25, 2020

Your citation literally says

> Another factor that negatively affects performance is context switching. When an application in the user space needs to send or receive a packet, it executes a system call. The context is switched to kernel mode and then back to user mode. This consumes a significant amount of system resources.

And they're talking about the socket API, so when they say "a packet" they really mean "any number of packets".

The rest is mainly about metadata that needs to be maintained specifically because kernel and user are in different address spaces and can't directly share in-memory data structures, and is additionally exacerbated by splitting the device driver away from the network stack like macOS is doing.

The only part that isn't ultimately about the user/kernel split and its costs is the general protocol stuff in the network stack, and that was always the most specious of DPDK's claims anyway.

Just so you know, you're talking to someone who used to write NAS drivers.

Dylan16807 · on March 25, 2020

> And they're talking about the socket API, so when they say "a packet" they really mean "any number of packets".

It's completely different if you have one switch per packet vs. one switch per thousand packets.
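To put rough numbers on that difference, here is a minimal sketch of how batching amortizes a switch. Both constants are assumptions picked for illustration (5 microseconds per switch, 1 microsecond of useful work per packet), and `switch_share` is a hypothetical helper.

```python
ASSUMED_SWITCH_COST = 5e-6      # assumption: 5 microseconds per context switch
ASSUMED_WORK_PER_PACKET = 1e-6  # assumption: 1 microsecond of real work per packet


def switch_share(packets_per_switch: int) -> float:
    """Fraction of CPU time spent switching when one switch serves a whole batch."""
    useful = packets_per_switch * ASSUMED_WORK_PER_PACKET
    return ASSUMED_SWITCH_COST / (ASSUMED_SWITCH_COST + useful)


for batch in (1, 10, 1000):
    print(f"{batch:>5} packets per switch -> {switch_share(batch):.1%} spent switching")
```

With these assumed costs, one switch per packet dominates the core's time, while one switch per thousand packets falls below one percent.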

You're taking things to a ridiculous extreme to imply that any amount of context switching is a deal-breaker. There is a specific number of context switches before you reach 1%, 10%, 50% overhead. There are many reasons to avoid context switches besides overhead, but they are all either based on the underlying implementation or simply not critical to throughput. You're oversimplifying, despite your credentials. The implementation can be changed/fixed without completely purging context switches. There are many tradeoffs, and doing pure user-space is a viable way to approach things, but it's not the only approach.
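The "specific number" for each overhead level can be estimated by dividing the target overhead by the per-switch cost. The sketch below assumes a cost of 5 microseconds per switch, which is an illustrative figure rather than a measured one; `max_switches_per_second` is a hypothetical helper.

```python
ASSUMED_SWITCH_COST = 5e-6  # assumption: 5 microseconds per context switch


def max_switches_per_second(target_overhead: float,
                            cost_seconds: float = ASSUMED_SWITCH_COST) -> float:
    """Switch rate at which one core spends target_overhead of its time switching."""
    return target_overhead / cost_seconds


for target in (0.01, 0.10, 0.50):
    rate = max_switches_per_second(target)
    print(f"{target:.0%} overhead -> about {rate:,.0f} switches/s")
```

Plugging in a measured cost for a given machine and mitigation setup would shift these thresholds proportionally.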

Memory sharing and metadata slowness are easy bottlenecks to hit, but the way you avoid them, by changing data structures and how you talk to different layers of code and the device, works whether you put the code in the kernel, in pure user space, or split it between the two.

saagarjha · on March 25, 2020

> A few thousand context switches per second is minor enough even with spectre mitigations

Wouldn’t these be Meltdown mitigations?