That is graphomania. Syscalls are easy and don't require so much bloat. Besides, its license is lefty GNUnix.
Linux is unusual among OS kernels in that direct system calls from arbitrary userspace code are supported and ABI-stable. This model has always been a terrible idea. It robs the system of an ability to intercept system calls in userspace before doing an expensive privilege-mode transition.
If, instead, as on OpenBSD, the kernel enforced the rule that all system calls had to go through libc (or perhaps a big ntdll.dll-like VDSO), then the whole problem the linked article tries in vain to solve would disappear. If you wanted to hook a system call, you'd just change the libc/VDSO dispatch. No need to rewrite any instructions.
If I were Linus, I'd make a new rule: starting today, all new system calls must go through VDSO. No exceptions. SYSCALL from anywhere else? SIGKILL.
This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
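For concreteness, here is a minimal sketch of what "direct and ABI-stable" means in practice: a process can ask for a system call by its raw number and get the same answer as the named libc wrapper. Assumptions to flag: Python cannot emit a SYSCALL instruction itself, so this goes through libc's generic `syscall()` trampoline via `ctypes`; the `GETPID_NR` table and the `raw_getpid` helper are made up for illustration and only cover a few 64-bit architectures.

```python
import ctypes
import os
import platform
import sys

# getpid syscall numbers for a few 64-bit Linux ABIs (illustrative table,
# not exhaustive; the numbers differ per architecture).
GETPID_NR = {"x86_64": 39, "aarch64": 172, "riscv64": 172}


def raw_getpid():
    """Invoke getpid by raw syscall number; return None where unsupported."""
    if not sys.platform.startswith("linux"):
        return None
    if sys.maxsize <= 2**32:
        # A 32-bit process uses a different syscall table; bail out.
        return None
    nr = GETPID_NR.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    libc.syscall.argtypes = [ctypes.c_long]
    # The kernel is asked for "syscall number nr", with no getpid() wrapper
    # in between that a shim could hook by name.
    return libc.syscall(nr)


if __name__ == "__main__":
    print("raw syscall:", raw_getpid(), "os.getpid():", os.getpid())
```

A statically linked binary or a hand-written assembly stub can issue the same number without touching libc at all, which is exactly the case a libc-level hook never sees.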
> This model has always been a terrible idea. It robs the system of an ability to intercept system calls in userspace before doing an expensive privilege-mode transition.
This model has always been a trade-off. It has downsides, but it also has upsides, including an immense boost in flexibility; decoupling from any particular userspace is useful.
> This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
Can you LD_PRELOAD in front of the vDSO? I was under the (possibly mistaken) impression that the kernel injects it directly.
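One way to check the "kernel injects it" part yourself, as a rough sketch: the kernel advertises the vDSO base address in the auxiliary vector under `AT_SYSINFO_EHDR`, and the same address shows up as the `[vdso]` mapping in `/proc/self/maps`, with no backing file on disk. The helper names below are made up for illustration, and the code assumes a Linux libc that exports `getauxval` (glibc and musl do).

```python
import ctypes
import sys

AT_SYSINFO_EHDR = 33  # auxv key from <elf.h>: address of the vDSO ELF header


def vdso_from_auxv():
    """Return the vDSO base the kernel passed in the aux vector, or None."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    try:
        getauxval = libc.getauxval
    except AttributeError:
        return None  # libc without getauxval
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    return getauxval(AT_SYSINFO_EHDR) or None


def vdso_from_maps(path="/proc/self/maps"):
    """Return the start address of the [vdso] mapping, or None."""
    try:
        with open(path) as maps:
            for line in maps:
                if line.rstrip().endswith("[vdso]"):
                    return int(line.split("-", 1)[0], 16)
    except OSError:
        return None
    return None


if __name__ == "__main__":
    auxv, mapped = vdso_from_auxv(), vdso_from_maps()
    print("auxv :", hex(auxv) if auxv else None)
    print("maps :", hex(mapped) if mapped else None)
```

Since the mapping is set up by the kernel at exec time rather than loaded from a path, there is no library file for a preload to sit in front of. What LD_PRELOAD can still interpose is the libc wrapper (for example `clock_gettime`) that calls into the vDSO.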
> all system calls had to go through libc (or perhaps a big ntdll.dll-like
Which makes containers crap on Windows and *BSD, as they have to run the correct libc or equivalent. Thus you need to build a different container per OS version, which sucks compared to Linux.
Windows doesn't even have its own libc.
They said "or equivalent", so ntdll
That's why OpenBSD is inconvenient for development: it binds you to libc bloatware.
> If I were Linus, I'd make a new rule
Or, you know, just propose your idea to him