>io_uring is not an event system at all. io_uring is actually a generic asynchronous syscall facility.
It's not an event system at all! It's an event system!
The type of events being a subset of all possible events in the system does not make it not an event system. Nor does its being essentially a queue.
It's not an event system in the sense that it's not just a way to get notified when a file descriptor is ready for reading or writing. Like the OP says, it is a way to run syscalls asynchronously.
> It's not an event system in the sense that it's not just a way to get notified when a file descriptor is ready for reading or writing.
First time in my life I've heard people call this an event system. For me it's always been any architecture centered around asynchronous message queues.
Event-Driven Architecture refers to a different thing, but people used to refer to async code as event-based back before async/await existed, when doing async meant writing the Event Loop yourself.
Yes, but "fd readiness checker" is a super narrow nonstandard definition of "event system". Though I get what the author tries to say.
An "event system" is any mechanism where events (readiness, completion, signals, GUI messages) drive control flow. Restricting "event system" to descriptor readiness exclusively is the author's personal framing, not exactly common parlance.
I thought the main selling point was that you could tell the system "when you finally get this response, run this function on it"
There are no callbacks in io_uring. You submit an operation (SQE), the kernel runs your request while you're doing other stuff, eventually you get back a completion entry (CQE) with the results of the operation.
Sounds like a callback to me
Saying it’s a callback is equivalent to claiming select is a callback. Receiving an event does not a callback make - providing the function address to invoke would make it a callback.
There is no callback. The response just shows up on the other ring buffer.
The client decides when to look at the ring buffer
Callback execution is: wait until begin event occurs, then do this operation.
Asynchronous execution is: do this operation, then wait until finish event occurs.
They are opposites.
It's not. The NT kernel and some others have genuine callbacks in some of their syscalls, where you pass a userspace function pointer which the kernel invokes on completion; io_uring isn't that and Linux doesn't have anything like that.
no it is not.
The kernel can't "run a function". It can only wake a thread sleeping on a syscall. This is called blocking IO.
The whole point of async is to find a compromise between context switch cost and event buffering and the latency resulting from the latter. It is not about "running a function".
There is limited chaining capability in io_uring that can be an actual gamechanger if your chain of ops can be fully in-kernel.
An intern of mine wrote a whole TCP checkpoint (pause ingress and egress on the socket, save all state, unpause everything) plus send-full-state-on-another-socket operation as a chain - a dozen or so ops (netlink writes, ioctls, set/getsockopt, read/write calls...) - all in one command-queue write, IIRC.
Performance was as good as an ad-hoc kernel module, without any eBPF. We just had one kernel patch to handle some unhandled syscall (getsockopt? setsockopt? ioctl?) (which we sadly didn't upstream... 2 years ago) and we were done. Really a great system for batching syscalls.
It made me wish for a form of DAG for error-handling or for parallel operations in chains...
io_uring can also use an eventfd to signal you need to check the completion queue. We use this with libdispatch to run a clang block on completion (the block is stored in the user_data). Admittedly this is a very specific use case.
> The kernel can't "run a function".
What is a signal handler?
Yep. Signals were literally the original async model on Unix. They were a userspace abstraction over hardware interrupts, much as they are today, but the abstraction didn't turn out to be as fruitful as it might have been, perhaps because it was too thin. (Signal queueing, i.e. real-time signals, meant to make signals more useful for application events, never went mainstream.) Back in the 1970s and 1980s the big arguments regarding async were about interrupt-driven (aka signals) vs readiness-driven (aka polling), and relatedly edge-triggered vs level-triggered events. BSD added the select syscall along with the sockets API, and that's when the readiness-driven, level-triggered model began to dominate. Though, before kqueue and then epoll came along, there were some attempts at scaling async I/O using the interrupt-driven model--a signal delivered along with the associated descriptor. I think there's a vestige of this still in Linux, SIGIO.
It's not always either/or, though. From the perspective of userspace APIs it's usually one or the other, but further down the stack one model might be implemented in terms of the other, sometimes with multiple transitions between the two, especially around software/hardware boundaries. Basically, it's turtles all the way down.
Similarly, the debates regarding cancellation, stack management, etc, still persist; the fundamental dilemmas haven't changed.
It's still a context switch per event.
Like other people here wrote, nowadays one can push some processing into kernel context, but that sort of defeats the purpose of the kernel/userland border. One could just write a kmod and be done with it then (and lose isolation).
After decades of gleefully using signal handlers to handle all sorts of contingencies, systems programmers were solemnly informed that signal handler functions were very dangerous indeed, because a bunch of other stuff was on the stack and undefined while they were being run, and therefore, handler functions couldn't call anything that was unsafe or non-reentrant.
Systems programmers were told that the best signal handler function was one that set a flag, a volatile int, and then exited immediately without doing or touching anything else.
Sort of defeats the purpose of the elaborate signal-handler-callback-pointer-to-function system we had in place, but them's the breaks.