This is the back-link problem in Rust. You can do back links with weak pointers, but it's rather verbose and involves extra run-time checks.
I've been trying to figure out a compile-time approach to that problem.
Here's an early version of a tech note on that.[1]
It looks possible, but there are many issues. But it's important to look at now. The C++ people are trying to figure out safe backlinks.
My gut says the following, so take with a grain of salt:
If your goal is to avoid refcounting:
You can get compile-time guaranteed access through a 'parent getter' (but only ever as a non-mut reference) by requiring that:
- parent element MUST be present at time of construction
- When doing a 'parent get' call, you MUST still prove you're in the parent's lifetime.
This can be done at compile time at 0 runtime cost.
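A minimal sketch of those two constraints (names are my own): the back link is an ordinary shared borrow, so the parent must exist at construction, and every `parent()` call is checked against the parent's lifetime by the borrow checker, at zero run-time cost:

```rust
struct Parent {
    name: String,
}

// The back link is a plain shared borrow, fixed at construction.
struct Child<'p> {
    parent: &'p Parent,
}

impl<'p> Child<'p> {
    fn new(parent: &'p Parent) -> Self {
        Child { parent }
    }

    // The caller can only invoke this while 'p is still alive,
    // which the compiler verifies statically.
    fn parent(&self) -> &Parent {
        self.parent
    }
}

fn main() {
    let p = Parent { name: "root".into() };
    let c = Child::new(&p);
    println!("{}", c.parent().name);
}
```

The usability cost shows up immediately: while any `Child` exists, the parent cannot be borrowed mutably or moved.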
The problem is that these constraints drastically reduce the usability of those back-links.
There is no compiler solution in Rust today. Rust lifetimes form a DAG of (overlapping) ranges, in which some leaf nodes are established by inductive proofs.
Relaxing those two constraints requires a bidirectional graph, and I'm not sure that can ever be computed in reasonable time.
- Single ownership with back references that never outlive the owning reference
This requires a construct that manages the forward and back references together, preserving the invariant that if A points to B, then B points back to A. It's also necessary to ensure that no non-weak back reference from B to A exists when A is dropped.
- True multiple ownership with weak back references
This is the hard case. It might be solvable for complicated situations. See the example in my tech note that links to a Rust Playground file.[1] That's not a contrived example; it's code in actual use. It builds a set where all elements of the set can find the set itself. If two sets are merged (union), all elements move to one set struct, and the other now-empty set struct is discarded. All the back references are maintained during this operation.
Most back references seem to fall into one of those two cases. In fact, the second includes the first. The first is so common that it's worth handling separately.
The core problem is finding an approach which is 1) powerful enough to be useful on real code, and 2) not too hard to check at compile time. The second case requires some restrictions on coding style.
The preferred coding style is short borrows with very narrow scope, typically within one statement. That's normally considered bad, because it means extra run-time checking for double borrows. But if this kind of borrow can be made compile-time, it's free. The compiler is going to have to check for overlapping borrow scopes, and the more local the borrow scope, the less compile-time work is involved. Storing a borrowed link in a structure or a static requires too much compile-time analysis, but if the lifetime of the borrow is very local, the analysis isn't hard. Borrow scopes which include a function call imply the need for cross-function call tree analysis.
It's been pointed out to me that traits and generics make this especially hard, because you don't know what the instantiation can do before generics are instantiated. That might require some kind of annotation that says to the compiler "Implementations of function foo for trait Foobar are not allowed to do an upgrade of an Rc<Bar>". Then, at the call, the compiler can tell it's OK for a borrow scope to include that call, and in an implementation of the called function, upgrade of Rc<Bar> is prohibited. In fact, if this is done for all functions that receive an Rc<Bar> parameter, it might be possible to do all this without call tree analysis. It becomes a type constraint. Need to talk to the type theorists, of which I am not one.
All those .borrow() calls are annoying. It might be possible to elide them most of the time, in the same way that you can refer to the item within an Rc without explicitly de-referencing the Rc. That would be nice from an ergonomic perspective. But it runs the risk of generating confusing compiler messages when a problem is detected. It would be like those confusing error messages that arise when type inference can't figure something out or the borrow checker needs explicit lifetime info to resolve an ambiguity.
It's pretty easy to do at runtime without weak pointers; people who write Rust are just allergic to reference counting. Which is absurd, because most programmers use reference-counted values thousands of times every day without even thinking about it. It's really this easy: https://pastebin.com/SAeKG7Sw
It works, but you have to manage your own teardown of the list. If you use Weak, releasing the strong reference to the head of the list tears down the whole list.
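I haven't reproduced the pastebin, but a sketch of the kind of approach it describes might look like this: strong `Rc` links in both directions, which form a cycle, so teardown has to be done by hand or the nodes leak:

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Link = Option<Rc<RefCell<Node>>>;

struct Node {
    value: i32,
    next: Link,
    prev: Link, // strong in both directions: a cycle, so no automatic drop
}

fn main() {
    let a = Rc::new(RefCell::new(Node { value: 1, next: None, prev: None }));
    let b = Rc::new(RefCell::new(Node { value: 2, next: None, prev: None }));
    a.borrow_mut().next = Some(Rc::clone(&b));
    b.borrow_mut().prev = Some(Rc::clone(&a));

    // Walk forward from a, then back from b.
    let fwd = a.borrow().next.as_ref().unwrap().borrow().value;
    let back = b.borrow().prev.as_ref().unwrap().borrow().value;
    println!("{} {}", fwd, back);

    // Manual teardown: break the cycle, or the nodes are never freed.
    a.borrow_mut().next = None;
    b.borrow_mut().prev = None;
}
```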
> The last time I used [a doubly linked list] was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
This and sibling comments about the supposed uselessness or outdatedness of linked lists taught me something about the disconnect between many systems engineers resistant to Rust and many members of the (evangelical) Rust community.
To address the technical topic first: a doubly linked list is what you reach for when you’re looking for an intrusive data structure with fast append and fast delete. You need to iterate over it, but don’t need random access. Queues where you may need to remove elements. Think a list of completions that can be cancelled.
Why an intrusive data structure? Because you don’t want to, or can’t allocate. Because you want only a non-owning reference to the object, and you don’t want to have to manage any other lifetimes or dynamic (potentially unbounded, or arbitrarily bounded) allocations. Intrusive data structures are a great tool to minimize allocations, and to keep allocations (and ownership) logically separated across modules.
And now onto the cultural topic. That a member of the Rust community might not know this (which is of course totally fine) is what taught me something, which was maybe obvious to many: Rust brings safety to a C++ style language and audience. For the same reasons many dislike C++, many dislike Rust. For the same reasons many would opt for C over C++ when they have the choice (either across all their projects or for certain projects), many continue to opt for C over Rust.
Rust will not be taking over systems programming for the same reasons C++ did not. Well, by the numbers, it probably did (and Rust probably will too), so maybe better to say that it will not stamp out C for systems programming for the same reasons C++ did not. And it’s not just because of the stubbornness of C programmers.
Systems programming has 2 cultures and practice groups: the C camp, and the C++ camp. The C++ camp is a big tent. I’d argue it includes Rust and Swift (and maybe D). Zig belongs to the C camp.
Safety is important, and maybe Rust has the right model for it, but it is not a solution for the C camp. It likely did not intend to be, and again, by the numbers, replacing C++ is probably the more important task.
I'm aware of the reasons Linux uses doubly linked lists. I'm still of the opinion this design decision made a lot of sense in the 1990s, and would make less sense in a new project today. You may disagree, and that's fine! Engineering is constrained by hard truths, but within those constraints, is full of judgment calls.
I'm not a member of the evangelical Rust community. I'm not proclaiming 'thou shalt rewrite it all in Rust'. Maybe you have good reasons to use something else. That's fine.
But if you are considering Rust, and are concerned about its ability to handle cyclic data structures, there are ways to do it, that don't involve throwing out all the benefits the language offers.
Linux's use of pointer-heavy data structures is largely justified even today. The low-level kernel bookkeeping requirements rule out many typical suggestions. Modern hardware suggests certain patterns in broad strokes, which are sound for almost any software. But the kernel has the unique role of multiplexing the hardware, instead of just running on the hardware. It is less concerned with, e.g., how to crunch numbers quickly than it is with how to facilitate other software crunching numbers quickly. As someone else noted elsewhere in the top-level thread, unsafe Rust is the appropriate compromise for things that must be done but aren't suitable for safe Rust. Unsafe Rust is one of the best realizations of engineering tradeoffs that I've seen, and neither C nor safe Rust justifies its absence. Rust is only a "systems" language with unsafe, and that doesn't drag Rust down but rather strengthens it. "Here is the ugliness of the world, and I will not shy from it."
Beautifully put, poetic even. The only problem is that, in (*actual).reality, unsafe Rust is difficult to read, difficult to write, difficult to maintain, difficult to learn, has zero features for ergonomics, and has poorly defined rules.
C and Zig have a big advantage for writing maintainable, easy to read unsafe code over unsafe Rust.
I agree that unsafe Rust is not comfortable or simple, but I think it is usable when appropriately applied. It should very scarcely be used as a performance optimization. The Rust devs are quite invested in minimizing unsoundness in practice, particularly with Miri. In coming years, I expect unsafe to be further improved. Over the entire ecosystem, Rust with judicious use of unsafe is, in my opinion, vastly superior to C, unless the C is developed stringently, which is rare.
So, as an example, you'd happily spend an extra allocation and extra pointers of space for each item in a list, even when the item type itself is only a couple of bytes and you potentially need many millions of them? Just so your design is not "from the nineties"?
A non-intrusive ("extrusive") list needs at least one more pointer (to the item, separate from the link node), and possibly an additional back pointer to the link node when you want to unlink that node. It also adds allocation overhead and cache misses.
Intrusive lists are one of the few essential tools to achieve performance and low latency.
Or were you thinking of dynamically reallocating vectors? They are not an alternative, they are almost completely unusable in hardcore systems programming. Reallocating destroys pointer stability and adds latency, both very bad for concurrency.
I’m sorry, I did not intend to accuse you of being part of the evangelical community. Your article only prompted the thought I shared.
On the technical point, I think I do disagree, but open to changing my mind. What would be better? I’m working on an async runtime currently, written in C, and I’m using several intrusive doubly linked lists because of their properties I mentioned.
As to what would be better - this is also a reply to your sibling comments above - I don't have a single across-the-board solution; the equivalent of std::vector everywhere is fine for some kinds of application code, but not necessarily for system code. Instead, I would start by asking questions.
What kinds of entities are you dealing with, what kinds of collections, and, critically, how many entities along each dimension, to an order of magnitude, p50 and p99? What are your typical access patterns? What are your use cases, so that you can figure out what figures of merit to optimize for? How unpredictable will be the adding of more use cases in the future?
In most kinds of application code, it's okay to just go for big-O, but for performance critical system code, you also need to care about constant factors. As an intuition primer, how many bytes can you memcpy in the time it takes for one cache miss? If your intuition for performance was trained in the eighties and nineties, as mine initially was, the answer may be larger than you expect.
While Rust is certainly competing against C++ moreso than C, I think Rust has qualities that make it suitable to rewrite old C programs or write new programs that would be written in C. Drawbacks of Rust such as excessive dependency usage, simply feeling too complex to write and read, overuse of the type system, or the subpar low-level hardware correspondence, are not huge issues in the right subculture. I think Rust is not quite C but offers a distinct path to do what C does. In any case, legacy C programs do need a pick-me-up, probably many from different technologies, and I think Zig is at most a small factor.
Rust won’t even succeed at replacing C++. There are technical and cultural issues at play. C++ has evolved a lot and still does some things better than Rust (dynamic linking, for example). Rust will be another popular systems programming language. There will be better ones. People will take their picks.
I find C++ friendlier for small hobby projects because it lets you get your code compiled even if it's a little messy or slightly unsafe.
As for "serious code", I admit that Rust is maybe better for low-level security-critical stuff. But for native applications it's on par thanks to smart pointers and also just -fsanitize=address.
(also default copy semantics just seem more intuitive i dunno why)
> To address the technical topic first: a doubly linked list is what you reach for when you’re looking for an intrusive data structure with fast append and fast delete. You need to iterate over it, but don’t need random access. Queues where you may need to remove elements. Think a list of completions that can be cancelled.
>
> Why an intrusive data structure? Because you don’t want to, or can’t allocate. Because you want only a non-owning reference to the object, and you don’t want to have to manage any other lifetimes or dynamic (potentially unbounded, or arbitrarily bounded) allocations. Intrusive data structures are a great tool to minimize allocations, and to keep allocations (and ownership) logically separated across modules.
If you really wanted this in Rust, you could probably get away with just initialising a Vec::with_capacity(), then have each element’s next and prev be indexes into the vec.
That would be the “arbitrarily bounded” allocation. Arbitrary because now you have to make a decision about how many items you’re willing to maintain despite that number being logically determined by a sum over an unknown set of modules.
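If one accepts that decision, a minimal sketch of the index-based list (names and layout are my own) might look like this, with `usize::MAX` standing in for a null pointer:

```rust
// Indexes into the Vec stand in for pointers; NIL marks "null".
const NIL: usize = usize::MAX;

struct Node<T> {
    item: T,
    next: usize,
    prev: usize,
}

struct List<T> {
    nodes: Vec<Node<T>>,
    head: usize,
    tail: usize,
}

impl<T> List<T> {
    fn with_capacity(cap: usize) -> Self {
        List { nodes: Vec::with_capacity(cap), head: NIL, tail: NIL }
    }

    // Append an item and return its index (the "pointer" to it).
    fn push_back(&mut self, item: T) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { item, next: NIL, prev: self.tail });
        if self.tail != NIL {
            self.nodes[self.tail].next = idx;
        } else {
            self.head = idx;
        }
        self.tail = idx;
        idx
    }
}

fn main() {
    let mut list = List::with_capacity(16);
    let a = list.push_back("a");
    let b = list.push_back("b");
    assert_eq!(list.nodes[a].next, b);
    assert_eq!(list.nodes[b].prev, a);
    assert_eq!(list.head, a);
}
```

Unlinking and slot reuse (a free list) are omitted for brevity; a real implementation would need both.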
> Handles are deterministic. If a bug made your program crash on the last run, it will crash the same way on this run.
> Given that the arena will still be subject to array bounds checking, handle bugs won't allow an attacker to overwrite arbitrary memory the way pointer bugs do.
Just because you're in-bounds of the arena, doesn't mean you're not stomping on in-bounds neighboring data -- which can introduce "nondeterminism" as the author defines it. These are hand-waving arguments.
> doesn't mean you're not stomping on in-bounds neighboring data
I don't see how you would do that in safe rust though. The point of the arena would just be to solve the lifetime issue. It just moves the ownership problem one level higher. Instead of having circular ownership in a linked list, all linked list nodes are owned by the arena and hence have the same lifetime as the arena.
Anticipating pushback: yes, you can disallow "pointer arithmetic" on handles, and store fingerprints in the "slots" to ensure they still contain the handle's identity to detect use-after-free, but congrats, you've implemented sparse sets, of which there are dozens of C++ implementations with the same safety guarantees, so it's unclear what Rust is bringing in that case (e.g. https://github.com/skypjack/entt/blob/master/src/entt/entity...)
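For concreteness, here is a minimal generational-handle sketch of the kind of fingerprinting described above (names are hypothetical); a per-slot generation counter plays the fingerprint role:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    // For brevity this always appends; a real arena would keep a free list.
    fn insert(&mut self, value: T) -> Handle {
        let index = self.slots.len();
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index, generation: 0 }
    }

    fn remove(&mut self, h: Handle) {
        let slot = &mut self.slots[h.index];
        if slot.generation == h.generation {
            slot.value = None;
            slot.generation += 1; // invalidate all outstanding handles
        }
    }

    // A stale handle's generation no longer matches: get() returns None.
    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation == h.generation { slot.value.as_ref() } else { None }
    }
}

fn main() {
    let mut arena = Arena::new();
    let h = arena.insert(42);
    assert_eq!(arena.get(h), Some(&42));
    arena.remove(h);
    assert_eq!(arena.get(h), None); // use-after-free detected, not UB
}
```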
That C++ already has many implementations of sparse sets seems to be a point in favor of sparse sets rather than a point against Rust, especially given that C++ doesn't need them the same way Rust does.
Well, there's several implementations because it's a bit of a leaky abstraction, and implementation details matter/vary across use-cases, which is consistent w/ my experience of rust having heavy fragmentation in applying the array-index pattern for "weak pointers."
Indeed, the topic is "in Rust", so you'd do arenas in Rust for the memory safety benefits outside of arenas. And since arenas aren't as bad, thanks to determinism etc., it's still a net benefit: safer outside, safer inside.
Given Rust’s memory safety goal, IMO the proper way to solve the circular reference problem in the context of Rust would be in the language. It should have a way to express what cycles may exist, maybe also how the code will break them when needed.
The hardest problem, I think, is that this requires a fundamental change to the language: something might be already borrowed just by existing. It also likely needs to introduce a notion of "stable" places that will not move when you move a prefix of them, e.g. to allow moving a `Box` holding self-referential data. In other words, there's a lot of bikeshedding to do.
Here's an implementation of a doubly-linked list which is perfectly fine on modern hardware: an array of structs where each struct contains a pointer to its next and previous elements in addition to whatever other data is being stored.
Here's a situation where a traditional pointer-based double-linked list is fine: when the payload is very large (e.g., an entire document in some app).
> The last time I used [a doubly linked list] was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
Curious to know in the Linux context whether moves away from doubly linked lists have ever been tested?
Linux uses them because it has a lot of lists of objects that very rarely or never need to be iterated over, but where it needs to be fast to
- Insert a new item
- Delete a specific item (that you have a pointer to)
- Move a specific item from list A to list B
- Get the next item to work on
And where pointers to an item need to be stable. Doubly-linked lists are very fast for all of that; their main downside is they are slow to iterate over due to poor cache locality, but that doesn't matter if you very rarely or never iterate through the entire list. And since you need stable pointers to items in the list that don't change when you insert/remove items, most other data structures would need an extra level of indirection, so you lose the cache-locality benefit anyway.
> There are several ways to solve this problem. One way is to avoid using direct references to the particular class of objects at all. Instead, allocate a big array of objects, and refer to them with integer indexes into that array. There are several names that have been used for such arrays and indexes; let's call them arenas and handles.
I thought the term "Arena" referred to linear allocators, but maybe it's not so narrowly defined.
> At one level, this question is not very fair, because an answer to it as stated would be that one simply does not use doubly linked lists. They have been popular in introductory computer science lectures because they are a neat way to explain pointers in a data structure that's easy to draw on a whiteboard, but they are not a good match to modern hardware. The last time I used one was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
The arena data structure you described is inefficient unless it uses a linked list to track empty slots. All general-purpose heap allocators use linked lists in their implementations. Linked lists show up wherever you want pointer stability, low fragmentation, or a way to decouple the storage of objects from their participation in data structures. I struggle to imagine how it would be possible to implement something like Linux without at least one linked list in it.
> Essentially you are bypassing the notion of pointers provided directly by the hardware
Pointers aren't a hardware concept, they're a language concept. Array indices and pointers both rely on indirect addressing, which is the underlying hardware concept. The "handles" strategy in Rust feels like a kludgy approach not because it bypasses the hardware but because Rust's borrow checker isn't actually involved in ensuring safety in that case, just its bounds checking and mandatory initialization (and if all you need is those two things then...).
I'm a bit agnostic about the specific solution these days. In general, early binding(so, static memory and types, formalized arenas and handles, in-line linear logic with few or no loops or branches) debugs more readily than late(dynamic memory allocs and types, raw pointers, virtual functions, runtime configuration). The appeal of late binding is in deferring the final computation to later so that your options stay open, while the converse is true with early binding - if you can write a program that always returns a precomputed answer, that's easier to grasp and verify.
When we compare one method of early binding with another, it's probably going to be a comparison of the granularity. Arenas are "stupid simple" - they partition out memory into chunks that limit the blast radius of error, but you still make the error. Ownership logic is a "picky generalization" - it lets you be more intricate and gain some assurances if you put up with the procedures necessary, but it starts to inhibit certain uses because its view into usage is too narrow with too many corner cases.
If we take Go's philosophy as an example of "what do you do if you want idioms for a really scalable codebase" - though you can certainly argue that it didn't work - it's that you usually want to lean on the stupid simple stuff and customize what you need for the rest. You don't opt into a picky Swiss Army Knife unless there's a specific problem to solve with it. Larger codebases have proportionately smaller sections that demand intricacy because more and more of the code is going to itself be a method of defining late binding, of configuring and deferring parts of processing.
That said, Rust lets you fall back to "stupid simple", it just doesn't pave the way to make that the default.
I wish Rust would finalize its custom memory allocator API (allocator_api) and put it in the stdlib as soon as possible. That feature has been unstable for a decade now.
I'd really like them not to hurry (or at least not "ASAP"). Among features that Rust should get right, this is definitely one of them.
And from its tracking issue [1], it seems there are alternatives with really important trade-offs (especially the very interesting storage API alternative), and it'd be really sad if we got stuck with something that ends up being unusable in many of the cases people actually want it for.
Fully agree with you. The feature needs to be done right, and an overhaul like the storage proposal is probably necessary. I think the allocator feature is one of those features that Rust historically had leeway on, because it's still relatively new, but now a bright path for the years (decades) ahead is crucial. Unstable-but-implemented-and-maybe-stabilizing is still better than stable-but-sucks-forever.
I kinda think Rust is a dead end. The learning curve is really challenging for most people, and you take on all these new constraints.
It competes against languages that give you memory safety by having garbage collection, or I guess even by using arenas in a few cases. GC comes with a performance hit, but a bit of a performance hit is usually much easier to deal with than having a smaller pool of effective developers
Nah. It's like learning vim, you get over the first shock and then you have a lifetime friend.
And another thing is with current coding agents, like the gpt-5-codex, Rust really shines here. The agent can easily guide you through the first borrow checker issues, and the compiler helps the agent to write good code.
Source: been writing Rust for a decade now. It is easily the best language I've ever used, still.
I'm experiencing that shock now. Can't create a global instance of my self-made heap datastructure. I'm longing for C memory management at the moment - it seems a tiny price to pay.
Now I learn that linked lists aren't possible and one has to use some, frankly, bullshit scheme to pretend to have them. IMO the documentation isn't really preparing people for the new reality of what can be done and how to do things.
Linked lists aren’t just possible in Rust, they are in the standard library.
They are just an example of the borrow checker’s limitations, which mean that you can only form a DAG of mutable references. But that’s why we have `unsafe`.
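For example, the standard library's `LinkedList` wraps its internal `unsafe` behind a completely safe interface:

```rust
use std::collections::LinkedList;

fn main() {
    // A doubly linked list from std: unsafe inside, safe API outside.
    let mut list: LinkedList<i32> = LinkedList::new();
    list.push_back(1);
    list.push_back(2);
    list.push_front(0);

    let v: Vec<i32> = list.iter().copied().collect();
    assert_eq!(v, [0, 1, 2]);
}
```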
The way I think about rust is this: you can experience the pain up front while you’re getting your program to compile, or you can experience pain later during runtime in random unpredictable ways. Rust is the former. It takes time to learn how to reduce the up front pain, but I’m so much more confident in the end result.
I say this having years of coding background in languages like C, C#, Java, JavaScript, Python, etc.
It's fair enough but if one isn't writing multi-threaded code or code with some complicated ownership scheme then that upfront investment may never really deliver a return.
Not true. Whole classes of bugs are just eliminated - most importantly NPEs. I’ve seen tiny scripts in node that crash due to lack of an exception handler, and I’ve witnessed countless runtime errors in Java in the simplest of web services.
You can do the same in Rust with Box::leak, which takes an owned pointer and gives you back a 'static borrow. The only caveat is that unless you reconstitute the type its Drop impl won't be called, but that's likely exactly what you wanted.
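A minimal example of that pattern:

```rust
fn main() {
    // Box::leak trades ownership for a 'static mutable borrow.
    // The allocation is never freed (Drop never runs) unless it's
    // later reconstituted with Box::from_raw in unsafe code.
    let config: &'static mut Vec<String> = Box::leak(Box::new(Vec::new()));
    config.push("verbose".to_string());
    assert_eq!(config.len(), 1);
}
```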
That’s certainly possible but I think it’s increasingly untrue for all but the smallest programs. The problem is that it’s hard to appreciate how many times your program didn’t crash but would have if you’d used a different language. You don’t need to be very complex before you have the risk of something crashing due to an edge case around values which are almost never null, even in languages like Python.
My productivity level in C is medium. In Rust I'm completely blocked because I want to create a global variable and absolutely none of the ways described in the docs or on Stack Overflow work for my example. Should I give up on this "bad idea"? Well, I can't, because I can't accept not understanding why this isn't working when people say it can.
Compared to C the situation is outlandishly, even hellishly impossible to understand. If I can't understand this one thing then I feel there's no point in continuing so I must stay and battle it till I get it or give up completely. I don't think I've ever hit anything like this in any other language I've ever learned.
The counterpoint I'd have for this argument is that code that isn't multithreaded or with complicated ownership scheme ends up being very simple looking Rust, so the difficulty of the language doesn't come into play.
I get the sense that Rust is probably a much better replacement for C++ than C, and that Rust discussions frequently devolve because C programmers and C++ programmers are talking past each other, without understanding the divide.
* The performance difference can be pretty massive depending on scenarios. Sure, most programs don't need it, but still
* Rust has not only memory safety but full safety around it. For instance, in Java if you iterate over a list while mutating it you'll get a runtime error. In Rust, it won't compile, preventing you from a possible bug later on
* The time you spend learning Rust is so much time saved afterwards because the language is specifically designed to reduce the risk of programming errors. Many things like the absence of a `null` value like we have in Java is an immense time saver when your program gets bigger
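To illustrate the iterate-while-mutating point above, a small sketch: the unsound version is rejected by the borrow checker at compile time, where Java would throw a `ConcurrentModificationException` at run time:

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // This version compiles: compute the additions first, mutate after.
    let extra: Vec<i32> = v.iter().map(|x| x * 10).collect();
    v.extend(extra);
    assert_eq!(v, [1, 2, 3, 10, 20, 30]);

    // This version is rejected at compile time, not at run time:
    //
    // for x in v.iter() {
    //     v.push(x * 10); // error[E0502]: cannot borrow `v` as mutable
    // }
}
```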
> The time you spend learning Rust is so much time saved afterwards because the language is specifically designed to reduce the risk of programming errors
I guess I just very rarely find problems where GC isn't an option. Even the ones where people say "this has to be fast so we can't use a GC" usually work well with a GC.
> Doesn't that mean you are throwing out all the memory safety properties you were hoping to achieve by using Rust in the first place? ...That argument sounds logical, but it's not actually correct.
Actually, it is correct. The more generally you use "arenas" the more they are subject to the same kinds of issues as other manual memory management systems. "arenas" can only avoid/reduce the "nondeterminism" and "security" issues of general manual memory management on top of a buffer of bytes by not becoming general manual memory management on top of a buffer of bytes.
It all comes down to whether pointers into the arena do something different than normal pointers when they are dangling.
Normal pointers cause UB. That’s astronomically bad. If your arena pointers crash your program in a well-defined way, that’s already a much better situation.
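A toy illustration of the difference, using a plain `Vec` as the arena: a stale handle yields a well-defined `None` (or, with direct indexing, a bounds-check panic), never UB:

```rust
fn main() {
    let mut arena: Vec<&str> = vec!["a", "b", "c"];
    let handle = 2usize; // "pointer" to "c"

    arena.truncate(2); // "free" the slot; the handle now dangles

    // A dangling handle cannot cause UB here: get() returns None,
    // and arena[handle] would only panic with a bounds-check message.
    assert_eq!(arena.get(handle), None);
}
```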
The reason dangling pointers are UB is that they could point to anything, including another object, code, or memory-mapped hardware interfaces. The analog here would be using the same index into a different arena; that trouble also wouldn't be bounded to just one arena.
You just need to put something executable, in whatever sense, in the arena... values that represent functions to call or dynamic libraries to load, or privileges to do those things, etc.
Yes. This is a big headache in graphics, where there are big tables indexed by texture index down at the GPU level. Managing those as the content changes, with the GPU using the data while the CPU alters it, leads to huge complexity in Vulkan graphics.
The two times I've had to deal with a really tough debugging problem in Rust, it's been related to some crate which did that.
And as a bonus, they also conveniently opt you out of any other hardening features your system malloc might have implemented (like MTE).
I certainly appreciate the performance benefits of custom allocators but it's disheartening to see people naively adopting them just as the world's allocators are starting to get their acts together with regards to security.
The one in the standard library is non-intrusive and allocates heap space per node. In situations where a linked list implementation is deemed necessary, in Rust code no less, alternatives to the std one should probably be used.
The only Rust programs I've written are simple toys, and I don't know enough computer science to fully understand the problem space here. However, the first thing that comes to mind is, what if you just used unsafe for your doubly linked list?
I understand this defeats the point of using Rust. But if the argument is "I'm using C because I actually need doubly linked lists and Rust makes that hard/slow", well C is also unsafe, so using Rust with an unsafe block for your doubly linked list seems like an improvement.
I feel like at least part of the reason not to do this is "because the Rust community will yell at you." Well, maybe they shouldn't do that? If you were using C no one would care, so switching to unsafe Rust shouldn't imply the sky is falling.
Unsafe is quite plainly the right tool for the job. Yes, people can mess up unsafe code, but that really is just how things are (until we get magic formally verifying compilers?!). I appreciate your levelheaded take, rather than advocating for only C or exaggerated concern for safe Rust.
Why would the rust community “yell at you” for using unsafe somewhere where it’s clearly needed?
By the way, unsafe doesn’t defeat the purpose of rust. The entire point of rust is that it lets you build safe abstractions on top of unsafe code, not that it eliminates unsafe code entirely.
I'd like to see something powerful for safety and speed in Crates.io and Cargo:
The ability for crates to specify whether they use certain features of Rust:
- unsafe
- arena allocation or other future "useful" stuff
- proc_macros, eg. serde (a huge compile speed hit)
- deep transitive dependency (total #, or depth)
- some proxy for the "number of compiler steps" (compiler workload complexity)
We should be able to set a crate-wide or workspace-wide policy that can ban other crates that violate these rules.
This could be hand-written metadata early on, but eventually enforced by the compiler. You could then easily guarantee that no downstream dependency uses unsafe code, has a crazy number of dependencies, or has crazy compile times.
You could opt-in or opt-out to these things without having to think too much about them or constantly (manually) be on the lookout.
This would be hugely beneficial to the Rust ecosystem, safety, code quality, and developer productivity / happiness.
Perhaps what would be useful is crates.io determining through analysis which in the set of all lint(s) a crate is compatible with and exposing that as automatic metadata?
Insightful article. As they say, doubly linked lists are approximately useless and rarely needed. But trees where nodes can refer to their parents are extremely common.
And I agree, a custom arena is usually the best option here. Even though it seems like "giving up" and going back to C, it's still way better.
Some of them also have keys that can't be reused, so you can catch "use after free" style bugs (at runtime), which is way better than C pointer wrangling.
It's not perfect though. I kind of wish Rust did have some kind of support for this stuff natively. Pretty low on my list of Rust desires though - way behind better compile time and fewer async footguns.
Leaf pages in InnoDB’s B+tree implementation are the big one I can think of. Turns out being able to quickly walk pages for a range scan is helpful.
I agree that there isn’t a wide variety of applications where they jump out as being the best choice. So, yeah, “approximately useless” sounds about right.
This is the back-link problem in Rust. You can do back links with weak pointers, but it's rather verbose and involves extra run-time checks.
I've been trying to figure out a compile-time approach to that problem. Here's an early version of a tech note on that.[1] It looks possible, but there are many issues. But it's important to look at now. The C++ people are trying to figure out safe backlinks.
[1] https://github.com/John-Nagle/technotes/blob/main/docs/rust/...
My gut says the following, so take with a grain of salt:
If your goal is to avoid refcounting, you can offer compile-time-guaranteed access through a 'parent getter' (but only ever as a non-mut reference) by requiring:
- parent element MUST be present at time of construction
- When doing a 'parent get' call, you MUST still prove you're in the parent's lifetime.
This can be done at compile time at zero runtime cost. The problem is that these constraints drastically reduce the usability of those back-links.
There is no compiler solution in Rust today. Rust lifetimes form a DAG of (overlapping) ranges (where some leaf nodes are established by inductive proofs).
Relaxing those two constraints requires a bi-directional graph, and I'm not sure that can ever be computed in reasonable time.
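A minimal sketch of those two constraints using plain Rust lifetimes (names are illustrative, not from any real crate): the child stores a shared reference to its parent, so the parent must exist at construction, and any use of the child is confined to the parent's lifetime.

```rust
struct Parent {
    name: String,
}

struct Child<'p> {
    // Non-mut back reference, fixed at construction: constraint (1).
    parent: &'p Parent,
}

impl<'p> Child<'p> {
    fn new(parent: &'p Parent) -> Self {
        Child { parent }
    }

    // The "parent getter": always available, always immutable.
    // Any caller is necessarily inside the parent's lifetime: constraint (2).
    fn parent(&self) -> &Parent {
        self.parent
    }
}

fn main() {
    let p = Parent { name: "root".into() };
    let c = Child::new(&p);
    assert_eq!(c.parent().name, "root");
    // While `c` is alive, `p` cannot be mutated or moved, which is
    // exactly the usability problem described above.
}
```

The cost is visible immediately: the shared borrow of the parent freezes it for as long as any child exists.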
There are two main cases:
- Single ownership with back references that never outlive the owning reference
This requires a construct that manages the forward and back references together, preserving the invariant that following A -> B -> A lands you back where you started. It's also necessary to ensure that no non-weak back reference from B to A exists when A is dropped.
- True multiple ownership with weak back references
This is the hard case. It might be solvable for complicated situations. See the example in my tech note that links to a Rust Playground file.[1] That's not a contrived example; it's code in actual use. It builds a set where all elements of the set can find the set itself. If two sets are merged (union), all elements move to one set struct, and the other now-empty set struct is discarded. All the back references are maintained during this operation.
Most back references seem to fall into one of those two cases. In fact, the second includes the first. The first is so common that it's worth handling separately.
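A hedged sketch of the second case, loosely modeled on the description above (this is not the actual Playground code; all names are illustrative): sets own their elements via Rc, elements hold Weak back references, and union moves elements across while repairing the back links.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Set {
    elems: RefCell<Vec<Rc<Elem>>>,
}

struct Elem {
    // Weak back reference: every element can find its set.
    set: RefCell<Weak<Set>>,
    id: u32,
}

fn insert(set: &Rc<Set>, id: u32) {
    let e = Rc::new(Elem { set: RefCell::new(Rc::downgrade(set)), id });
    set.elems.borrow_mut().push(e);
}

// Move everything from `b` into `a`, leaving `b` empty.
// (Illustrative: assumes `a` and `b` are distinct sets.)
fn union(a: &Rc<Set>, b: &Rc<Set>) {
    let mut moved = b.elems.borrow_mut();
    for e in moved.drain(..) {
        *e.set.borrow_mut() = Rc::downgrade(a); // repair back reference
        a.elems.borrow_mut().push(e);
    }
}

fn main() {
    let a = Rc::new(Set { elems: RefCell::new(Vec::new()) });
    let b = Rc::new(Set { elems: RefCell::new(Vec::new()) });
    insert(&a, 1);
    insert(&b, 2);
    union(&a, &b);
    assert_eq!(a.elems.borrow().len(), 2);
    assert!(b.elems.borrow().is_empty());
    // Every element's back reference now points at `a`.
    for e in a.elems.borrow().iter() {
        let owner = e.set.borrow().upgrade().expect("set still alive");
        assert!(Rc::ptr_eq(&owner, &a));
        assert!(e.id == 1 || e.id == 2);
    }
}
```

The run-time costs being discussed are visible here: refcount traffic, `upgrade()` checks, and `RefCell` borrow checks on every access.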
The core problem is finding an approach which is 1) powerful enough to be useful on real code, and 2) not too hard to check at compile time. The second case requires some restrictions on coding style.
The preferred coding style is short borrows with very narrow scope, typically within one statement. That's normally considered bad, because it means extra run-time checking for double borrows. But if this kind of borrow can be verified at compile time, it's free. The compiler has to check for overlapping borrow scopes, and the more local the borrow scope, the less compile-time work is involved. Storing a borrowed link in a structure or a static requires too much compile-time analysis, but if the lifetime of the borrow is very local, the analysis isn't hard.

Borrow scopes which include a function call imply the need for cross-function call tree analysis. It's been pointed out to me that traits and generics make this especially hard, because you don't know what an instantiation can do before generics are instantiated. That might require some kind of annotation that tells the compiler "implementations of function foo for trait Foobar are not allowed to upgrade an Rc<Bar>". Then, at the call site, the compiler can tell it's OK for a borrow scope to include that call, and in any implementation of the called function, upgrading an Rc<Bar> is prohibited. In fact, if this is done for all functions that receive an Rc<Bar> parameter, it might be possible to do all this without call tree analysis. It becomes a type constraint. Need to talk to the type theorists, of which I am not one.
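For concreteness, here's what the short-borrow style looks like with today's `Rc<RefCell<...>>`: each `.borrow()` begins and ends within a single statement, which is exactly the pattern that could in principle be verified at compile time and made free.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));

    // Each borrow is confined to one statement, so no two borrows of
    // the same cell can overlap.
    shared.borrow_mut().push(4);
    let len = shared.borrow().len();
    let sum: i32 = shared.borrow().iter().sum();
    assert_eq!(len, 4);
    assert_eq!(sum, 10);

    // By contrast, holding a borrow across a statement that borrows
    // again would panic at run time:
    // let long = shared.borrow();   // held borrow...
    // shared.borrow_mut().push(5);  // ...would panic here
}
```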
All those .borrow() calls are annoying. It might be possible to elide them most of the time, in the same way that you can refer to the item within an Rc without explicitly de-referencing the Rc. That would be nice from an ergonomic perspective. But it runs the risk of generating confusing compiler messages when a problem is detected. It would be like those confusing error messages that arise when type inference can't figure something out or the borrow checker needs explicit lifetime info to resolve an ambiguity.
[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...
It's pretty easy to do at runtime without weak pointers; people who write Rust are just allergic to reference counting. Which is absurd, because most programmers use refcounted values thousands of times every day without even thinking about it. It's really this easy: https://pastebin.com/SAeKG7Sw
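For readers who don't want to follow the link, a sketch of the general refcounted approach (this is not the pastebin code itself): strong `Rc` links forward, `Weak` links back, so reference counts still reach zero and the list drops cleanly.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: RefCell<Option<Rc<Node>>>, // strong forward link
    prev: RefCell<Weak<Node>>,       // weak back link, breaks the cycle
}

fn main() {
    let first = Rc::new(Node {
        value: 1,
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    });
    let second = Rc::new(Node {
        value: 2,
        next: RefCell::new(None),
        prev: RefCell::new(Rc::downgrade(&first)),
    });
    *first.next.borrow_mut() = Some(Rc::clone(&second));

    // Walk forward, then back.
    let n = first.next.borrow().clone().unwrap();
    assert_eq!(n.value, 2);
    let back = n.prev.borrow().upgrade().unwrap();
    assert_eq!(back.value, 1);
}
```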
C++
> The last time I used [a doubly linked list] was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
This and sibling comments about the supposed uselessness or outdatedness of linked lists taught me something about the disconnect between many systems engineers resistant to Rust and many members of the (evangelical) Rust community.
To address the technical topic first: a doubly linked list is what you reach for when you’re looking for an intrusive data structure with fast append and fast delete. You need to iterate over it, but don’t need random access. Queues where you may need to remove elements. Think a list of completions that can be cancelled.
Why an intrusive data structure? Because you don’t want to, or can’t allocate. Because you want only a non-owning reference to the object, and you don’t want to have to manage any other lifetimes or dynamic (potentially unbounded, or arbitrarily bounded) allocations. Intrusive data structures are a great tool to minimize allocations, and to keep allocations (and ownership) logically separated across modules.
And now onto the cultural topic. That a member of the Rust community might not know this (which is of course totally fine) is what taught me something, which was maybe obvious to many: Rust brings safety to a C++ style language and audience. For the same reasons many dislike C++, many dislike Rust. For the same reasons many would opt for C over C++ when they have the choice (either across all their projects or for certain projects), many continue to opt for C over Rust.
Rust will not be taking over systems programming for the same reasons C++ did not. Well, by the numbers, it probably did (and Rust probably will too), so maybe better to say that it will not stamp out C for systems programming for the same reasons C++ did not. And it’s not just because of the stubbornness of C programmers.
Systems programming has 2 cultures and practice groups: the C camp, and the C++ camp. The C++ camp is a big tent. I’d argue it includes Rust and Swift (and maybe D). Zig belongs to the C camp.
Safety is important, and maybe Rust has the right model for it, but it is not a solution for the C camp. It likely did not intend to be, and again, by the numbers, replacing C++ is probably the more important task.
To clarify a couple of things:
I'm aware of the reasons Linux uses doubly linked lists. I'm still of the opinion this design decision made a lot of sense in the 1990s, and would make less sense in a new project today. You may disagree, and that's fine! Engineering is constrained by hard truths, but within those constraints, is full of judgment calls.
I'm not a member of the evangelical Rust community. I'm not proclaiming 'thou shalt rewrite it all in Rust'. Maybe you have good reasons to use something else. That's fine.
But if you are considering Rust, and are concerned about its ability to handle cyclic data structures, there are ways to do it, that don't involve throwing out all the benefits the language offers.
Linux's use of pointer-heavy datastructures is largely justified even today. The low-level kernel bookkeeping requirements rule out many typical suggestions. Modern hardware suggests certain patterns in broad strokes, which are sound for mostly any software. But the kernel has the unique role of multiplexing the hardware, instead of just running on the hardware. It is less concerned with, e.g., how to crunch numbers quickly than it is with how to facilitate other software crunching numbers quickly. As someone else noted elsewhere in the top-level thread, unsafe Rust is the appropriate compromise for things that must be done but aren't suitable for safe Rust. Unsafe Rust is one of the best realizations of engineering tradeoffs that I've seen, and neither C nor safe Rust justify its absence. Rust is only a "systems" language with unsafe, and that doesn't drag Rust down but rather strengthens it. "Here is the ugliness of the world, and I will not shy from it."
Beautifully put, poetic even. The only problem is that, in (*actual).reality, unsafe Rust is difficult to read, difficult to write, difficult to maintain, difficult to learn, has zero features for ergonomics, and has poorly defined rules.
C and Zig have a big advantage for writing maintainable, easy to read unsafe code over unsafe Rust.
I agree that unsafe Rust is not comfortable or simple, but I think it is usable when appropriately applied. It should very scarcely be used as a performance optimization. The Rust devs are quite invested in minimizing unsoundness in practice, particularly with Miri. In coming years, I expect unsafe to be further improved. Over the entire ecosystem, Rust with judicious use of unsafe is, in my opinion, vastly superior to C, unless the C is developed stringently, which is rare.
So, as an example, you'd be happily spending an extra allocation and extra pointers of space for each item in a list, even when that item type itself is only a couple of bytes, and you potentially need many millions of that type? Just so your design is not "from the nineties"?
A non-intrusive (external) list needs at least one more pointer (to the item, separate from the link node), and possibly an additional back pointer to the link node when you want to unlink it. It also adds allocation overhead and cache misses.
Intrusive lists are one of the few essential tools to achieve performance and low latency.
Or were you thinking of dynamically reallocating vectors? They are not an alternative, they are almost completely unusable in hardcore systems programming. Reallocating destroys pointer stability and adds latency, both very bad for concurrency.
I’m sorry, I did not intend to accuse you of being part of the evangelical community. Your article only prompted the thought I shared.
On the technical point, I think I do disagree, but open to changing my mind. What would be better? I’m working on an async runtime currently, written in C, and I’m using several intrusive doubly linked lists because of their properties I mentioned.
No problem!
As to what would be better - this is also a reply to your sibling comments above - I don't have a single across-the-board solution; the equivalent of std::vector everywhere is fine for some kinds of application code, but not necessarily for system code. Instead, I would start by asking questions.
What kinds of entities are you dealing with, what kinds of collections, and, critically, how many entities along each dimension, to an order of magnitude, p50 and p99? What are your typical access patterns? What are your use cases, so that you can figure out what figures of merit to optimize for? How unpredictable will be the adding of more use cases in the future?
In most kinds of application code, it's okay to just go for big-O, but for performance critical system code, you also need to care about constant factors. As an intuition primer, how many bytes can you memcpy in the time it takes for one cache miss? If your intuition for performance was trained in the eighties and nineties, as mine initially was, the answer may be larger than you expect.
While Rust is certainly competing against C++ moreso than C, I think Rust has qualities that make it suitable to rewrite old C programs or write new programs that would be written in C. Drawbacks of Rust such as excessive dependency usage, simply feeling too complex to write and read, overuse of the type system, or the subpar low-level hardware correspondence, are not huge issues in the right subculture. I think Rust is not quite C but offers a distinct path to do what C does. In any case, legacy C programs do need a pick-me-up, probably many from different technologies, and I think Zig is at most a small factor.
Rust won’t even succeed at replacing C++. There are technical and cultural issues at play. C++ has evolved a lot and still does some things better than Rust (dynamic linking, for example). Rust will be another popular systems programming language. There will be better ones. People will take their picks.
I find C++ friendlier for small hobby projects because it lets you get your code compiled even if it's a little messy or slightly unsafe.
As for "serious code", I admit that Rust is maybe better for low-level security-critical stuff. But for native applications it's on par thanks to smart pointers and also just -fsanitize=address.
(also default copy semantics just seem more intuitive i dunno why)
> To address the technical topic first: a doubly linked list is what you reach for when you’re looking for an intrusive data structure with fast append and fast delete. You need to iterate over it, but don’t need random access. Queues where you may need to remove elements. Think a list of completions that can be cancelled.
>
> Why an intrusive data structure? Because you don’t want to, or can’t allocate. Because you want only a non-owning reference to the object, and you don’t want to have to manage any other lifetimes or dynamic (potentially unbounded, or arbitrarily bounded) allocations. Intrusive data structures are a great tool to minimize allocations, and to keep allocations (and ownership) logically separated across modules.
If you really wanted this in Rust, you could probably get away with just initialising a Vec::with_capacity(), then have each element’s next and prev be indexes into the vec.
That would be the “arbitrarily bounded” allocation. Arbitrary because now you have to make a decision about how many items you’re willing to maintain despite that number being logically determined by a sum over an unknown set of modules.
Vec size is dynamic, so no.
Now you’re allocating in an ISR?
> Handles are deterministic. If a bug made your program crash on the last run, it will crash the same way on this run.
> Given that the arena will still be subject to array bounds checking, handle bugs won't allow an attacker to overwrite arbitrary memory the way pointer bugs do.
Just because you're in-bounds of the arena, doesn't mean you're not stomping on in-bounds neighboring data -- which can introduce "nondeterminism" as the author defines it. These are hand-waving arguments.
> doesn't mean you're not stomping on in-bounds neighboring data
I don't see how you would do that in safe rust though. The point of the arena would just be to solve the lifetime issue. It just moves the ownership problem one level higher. Instead of having circular ownership in a linked list, all linked list nodes are owned by the arena and hence have the same lifetime as the arena.
It's not just linked lists. From the article:
> The next step of course is to write your own equivalents of malloc and free to allocate and deallocate objects within the arena.
A logic bug in that "allocator" - which are plain indices and not covered by borrow checking - could 100% stomp memory.
Anticipating pushback: yes, you can disallow "pointer arithmetic" on handles, and store fingerprints in the "slots" to ensure they still contain the handle's identity to detect use-after-free, but congrats, you've implemented sparse sets, of which there are dozens of C++ implementations with the same safety guarantees, so it's unclear what Rust is bringing in that case (e.g. https://github.com/skypjack/entt/blob/master/src/entt/entity...)
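For reference, a minimal generational ("fingerprinted") handle store of the kind described, in Rust. Illustrative only, not any particular crate's API: each slot carries a generation counter, and a handle is only valid while its generation matches, so stale handles are detected instead of silently reading reused slots.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Store<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Store<T> {
    fn new() -> Self {
        Store { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        // Reuse the first free slot, bumping its generation.
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if slot.value.is_none() {
                slot.generation += 1;
                slot.value = Some(value);
                return Handle { index: i, generation: slot.generation };
            }
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.generation != h.generation {
            return None;
        }
        slot.value.take()
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation != h.generation {
            return None;
        }
        slot.value.as_ref()
    }
}

fn main() {
    let mut store = Store::new();
    let a = store.insert("first");
    let _ = store.remove(a);
    let b = store.insert("second"); // reuses slot 0 with a new generation
    assert_eq!(b.index, a.index);
    assert!(store.get(a).is_none()); // stale handle is detected
    assert_eq!(store.get(b), Some(&"second"));
}
```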
That C++ already has many implementations of sparse sets seems to be a point in favor of sparse sets rather than a point against Rust, especially given that C++ doesn't need them the same way Rust does.
Well, there's several implementations because it's a bit of a leaky abstraction, and implementation details matter/vary across use-cases, which is consistent w/ my experience of rust having heavy fragmentation in applying the array-index pattern for "weak pointers."
If only devs had any other reason to use Rust.....
The topic is "Arenas in Rust" not "Arenas in Rust or anything other tradeoff."
Indeed, the topic is "in Rust", so you'd do arenas in Rust for those memory safety benefits outside of the arenas. And since arenas aren't as bad, thanks to determinism etc., it's still a net benefit: safer outside, safer inside.
Stomping on in-bounds data might result in "nondeterminism", but the damage is bounded; UB is unbounded.
Given Rust’s memory safety goal, IMO the proper way to solve the circular reference problem in the context of Rust would be in the language. It should have a way to express what cycles may exist, maybe also how the code will break them when needed.
Have people been thinking about that?
There haven't been any serious proposals yet, but some ideas have been floating around from time to time. For example this blog post has some ideas that might allow them https://smallcultfollowing.com/babysteps/blog/2024/03/04/bor...
The hardest problem, I think, is that this requires a fundamental change to the language: something might already be borrowed just by existing. It also likely needs to introduce a notion of "stable" places that will not move when you move a prefix of them, e.g. to allow moving a `Box` holding self-referential data. In other words, there's a lot of bikeshedding to do.
Here's an implementation of a doubly-linked list which is perfectly fine on modern hardware: an array of structs where each struct contains a pointer to its next and previous elements in addition to whatever other data is being stored.
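A sketch of that layout in Rust, using indices rather than raw pointers (the borrow-checker-friendly variant of the same idea). Using `usize::MAX` as "null" is an arbitrary choice here; real code might prefer `Option<u32>` or generational handles.

```rust
const NIL: usize = usize::MAX;

struct Node<T> {
    value: T,
    prev: usize,
    next: usize,
}

struct List<T> {
    nodes: Vec<Node<T>>, // all nodes in one contiguous allocation
    head: usize,
    tail: usize,
}

impl<T> List<T> {
    fn new() -> Self {
        List { nodes: Vec::new(), head: NIL, tail: NIL }
    }

    fn push_back(&mut self, value: T) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, prev: self.tail, next: NIL });
        if self.tail != NIL {
            self.nodes[self.tail].next = idx;
        } else {
            self.head = idx;
        }
        self.tail = idx;
        idx
    }

    fn iter(&self) -> impl Iterator<Item = &T> {
        // Walk the links from head. Simple collect-then-iterate,
        // rather than a hand-rolled Iterator impl.
        let mut out = Vec::new();
        let mut cur = self.head;
        while cur != NIL {
            out.push(&self.nodes[cur].value);
            cur = self.nodes[cur].next;
        }
        out.into_iter()
    }
}

fn main() {
    let mut l = List::new();
    l.push_back("a");
    l.push_back("b");
    l.push_back("c");
    let collected: Vec<_> = l.iter().cloned().collect();
    assert_eq!(collected, ["a", "b", "c"]);
}
```

Iteration still chases links, but the nodes are adjacent in memory, which is far kinder to the cache than per-node heap allocations.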
Here's a situation where a traditional pointer-based double-linked list is fine: when the payload is very large (e.g., an entire document in some app).
> The last time I used [a doubly linked list] was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
Curious to know in the Linux context whether moves away from doubly linked lists have ever been tested?
Linux uses them because it has a lot of lists of objects that very rarely or never need to be iterated over, but where it needs to be fast to
- Insert a new item
- Delete a specific item (that you have a pointer to)
- Move a specific item from list A to list B
- Get the next item to work on
And where pointers to an item need to be stable. Doubly-linked lists are very fast for all of that; their main downside is that they are slow to iterate over due to poor cache locality, but that doesn't matter if you very rarely or never iterate through the entire list. And since you need stable pointers to items in the list that don't change when you insert/remove items, most other data structures would need an extra level of indirection, so you lose the cache-friendliness benefit anyway.
And they also need the size of the set to be unbounded, and those operations to happen without allocating.
> There are several ways to solve this problem. One way is to avoid using direct references to the particular class of objects at all. Instead, allocate a big array of objects, and refer to them with integer indexes into that array. There are several names that have been used for such arrays and indexes; let's call them arenas and handles.
I thought the term "Arena" referred to linear allocators, but maybe it's not so narrowly defined.
> At one level, this question is not very fair, because an answer to it as stated would be that one simply does not use doubly linked lists. They have been popular in introductory computer science lectures because they are a neat way to explain pointers in a data structure that's easy to draw on a whiteboard, but they are not a good match to modern hardware. The last time I used one was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
The arena data structure you described is inefficient unless it uses a linked list to track empty slots. All general-purpose heap allocators use linked lists in their implementations. Linked lists show up wherever you want pointer stability, low fragmentation, or a way to decouple the storage of objects from their participation in data structures. I struggle to imagine how it would be possible to implement something like Linux without at least one linked list in it.
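To illustrate the point about tracking empty slots, here is a minimal arena that threads a free list through its own vacant slots, so allocation and freeing are O(1) with no side tables. Illustrative code, not a real allocator.

```rust
// Each slot is either live data or a link in the intrusive free list.
enum Slot<T> {
    Occupied(T),
    Free { next_free: Option<usize> },
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free_head: Option<usize>, // head of the free list
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free_head: None }
    }

    fn alloc(&mut self, value: T) -> usize {
        match self.free_head {
            Some(i) => {
                // Pop the head of the free list and reuse its slot.
                if let Slot::Free { next_free } = &self.slots[i] {
                    self.free_head = *next_free;
                }
                self.slots[i] = Slot::Occupied(value);
                i
            }
            None => {
                self.slots.push(Slot::Occupied(value));
                self.slots.len() - 1
            }
        }
    }

    fn free(&mut self, i: usize) {
        // Push the slot onto the free list.
        self.slots[i] = Slot::Free { next_free: self.free_head };
        self.free_head = Some(i);
    }

    fn get(&self, i: usize) -> Option<&T> {
        match self.slots.get(i) {
            Some(Slot::Occupied(v)) => Some(v),
            _ => None,
        }
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc(1);
    let b = arena.alloc(2);
    arena.free(a);
    let c = arena.alloc(3); // reuses slot `a` from the free list
    assert_eq!(a, c);
    assert_eq!(arena.get(b), Some(&2));
    assert_eq!(arena.get(c), Some(&3));
}
```

Note that this illustrates the parent comment's point too: the free list's indices aren't covered by borrow checking, so a logic bug here can hand out a slot twice.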
> Essentially you are bypassing the notion of pointers provided directly by the hardware
Pointers aren't a hardware concept, they're a language concept. Array indices and pointers both rely on indirect addressing, which is the underlying hardware concept. The "handles" strategy in Rust feels like a kludgy approach not because it bypasses the hardware but because Rust's borrow checker isn't actually involved in ensuring safety in that case, just its bounds checking and mandatory initialization (and if all you need is those two things then...).
I'm a bit agnostic about the specific solution these days. In general, early binding(so, static memory and types, formalized arenas and handles, in-line linear logic with few or no loops or branches) debugs more readily than late(dynamic memory allocs and types, raw pointers, virtual functions, runtime configuration). The appeal of late binding is in deferring the final computation to later so that your options stay open, while the converse is true with early binding - if you can write a program that always returns a precomputed answer, that's easier to grasp and verify.
When we compare one method of early binding with another, it's probably going to be a comparison of the granularity. Arenas are "stupid simple" - they partition out memory into chunks that limit the blast radius of error, but you still make the error. Ownership logic is a "picky generalization" - it lets you be more intricate and gain some assurances if you put up with the procedures necessary, but it starts to inhibit certain uses because its view into usage is too narrow with too many corner cases.
If we take Go's philosophy as an example of "what do you do if you want idioms for a really scalable codebase" - though you can certainly argue that it didn't work - it's that you usually want to lean on the stupid simple stuff and customize what you need for the rest. You don't opt into a picky Swiss Army Knife unless there's a specific problem to solve with it. Larger codebases have proportionately smaller sections that demand intricacy because more and more of the code is going to itself be a method of defining late binding, of configuring and deferring parts of processing.
That said, Rust lets you fall back to "stupid simple", it just doesn't pave the way to make that the default.
I wish Rust would finalize its custom memory allocators api (allocator_api) and put it in the stdlib as soon as possible. That feature has been unstable for a decade now.
I'd really like them not to hurry (or at least not "ASAP"). Among features that Rust should get right, this is definitely one of them.
And from its tracking issue [1], it seems there are alternatives with really important trade-offs (especially the very interesting storage API alternative), and it'd be really sad if we got stuck with something that ends up being unusable in many of the cases people actually want allocators for.
[1] https://github.com/rust-lang/rust/issues/32838
Fully agree with you. The feature needs to be done right, and an overhaul like the storage proposal is probably necessary. I think the allocator feature is one of those features that Rust historically had leeway on, because it's still relatively new, but now a bright path for the years (decades) ahead is crucial. Unstable-but-implemented-and-maybe-stabilizing is still better than stable-but-sucks-forever.
I kinda think rust is a dead-end. The learning curve is really challenging for most people, and you gain all these new constraints.
It competes against languages that give you memory safety by having garbage collection, or I guess even by using arenas in a few cases. GC comes with a performance hit, but a bit of a performance hit is usually much easier to deal with than having a smaller pool of effective developers
Worse is better
Nah. It's like learning vim, you get over the first shock and then you have a lifetime friend.
And another thing is with current coding agents, like the gpt-5-codex, Rust really shines here. The agent can easily guide you through the first borrow checker issues, and the compiler helps the agent to write good code.
Source: been writing Rust for a decade now. It is easily the best language I've ever used, still.
I'm experiencing that shock now. Can't create a global instance of my self-made heap datastructure. I'm longing for C memory management at the moment - it seems a tiny price to pay.
Now I learn that linked lists aren't possible and one has to use some, frankly, bullshit scheme to pretend to have them. IMO the documentation isn't really preparing people for the new reality of what can be done and how to do things.
Linked lists aren’t just possible in Rust, they are in the standard library.
They are just an example of the borrow checker’s limitations, which mean that you can only form a DAG of mutable references. But that’s why we have `unsafe`.
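For example, `std::collections::LinkedList` works out of the box; it's just non-intrusive, with one heap allocation per node:

```rust
use std::collections::LinkedList;

fn main() {
    let mut list: LinkedList<i32> = LinkedList::new();
    list.push_back(1);
    list.push_back(2);
    list.push_front(0);
    assert_eq!(list.front(), Some(&0));
    assert_eq!(list.back(), Some(&2));
    let v: Vec<i32> = list.into_iter().collect();
    assert_eq!(v, [0, 1, 2]);
}
```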
The way I think about rust is this: you can experience the pain up front while you’re getting your program to compile, or you can experience pain later during runtime in random unpredictable ways. Rust is the former. It takes time to learn how to reduce the up front pain, but I’m so much more confident in the end result.
I say this having years of coding background in languages like C, C#, Java, JavaScript, Python, etc.
It's fair enough but if one isn't writing multi-threaded code or code with some complicated ownership scheme then that upfront investment may never really deliver a return.
Not true. Whole classes of bugs are just eliminated - most importantly NPEs. I’ve seen tiny scripts in node that crash due to lack of an exception handler, and I’ve witnessed countless runtime errors in Java in the simplest of web services.
I've written utility programs that never bother to deallocate memory because the program needs all of it until the point where it exits anyhow.
You can do the same in Rust with Box::leak, which takes an owned pointer and gives you back a 'static borrow. The only caveat is that unless you reconstitute the type its Drop impl won't be called, but that's likely exactly what you wanted.
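A minimal example of the `Box::leak` pattern:

```rust
fn main() {
    // Allocate once, get back a &'static mut; the allocation is never
    // freed, which is fine for data the process needs until exit.
    let config: &'static mut Vec<String> = Box::leak(Box::new(Vec::new()));
    config.push("verbose".to_string());
    assert_eq!(config.len(), 1);
    assert_eq!(config[0], "verbose");
    // Caveat from the parent comment: the Vec's Drop will never run.
}
```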
That’s certainly possible but I think it’s increasingly untrue for all but the smallest programs. The problem is that it’s hard to appreciate how many times your program didn’t crash but would have if you’d used a different language. You don’t need to be very complex before you have the risk of something crashing due to an edge case around values which are almost never null, even in languages like Python.
My productivity level in C is medium. In Rust I'm completely blocked because I want to create a global variable, and absolutely none of the ways described in the docs or on Stack Overflow work for my example. Should I give up on this "bad idea"? Well, I can't, because I can't accept not understanding why it's not working when people say it can.
Compared to C the situation is outlandishly, even hellishly impossible to understand. If I can't understand this one thing then I feel there's no point in continuing so I must stay and battle it till I get it or give up completely. I don't think I've ever hit anything like this in any other language I've ever learned.
The counterpoint I'd have for this argument is that code that isn't multithreaded or with complicated ownership scheme ends up being very simple looking Rust, so the difficulty of the language doesn't come into play.
https://docs.rs/once_cell/latest/once_cell/#lazy-initialized... ?
It’s even in the standard library now: https://doc.rust-lang.org/std/cell/struct.OnceCell.html
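For the common lazily-initialized-global case, stable std covers it directly; `OnceLock` is the `Sync` variant usable in a `static` (a minimal example):

```rust
use std::sync::OnceLock;

// A global initialized on first use, in safe stable Rust.
static GLOBAL: OnceLock<Vec<i32>> = OnceLock::new();

fn global() -> &'static Vec<i32> {
    // The closure runs at most once, on first access.
    GLOBAL.get_or_init(|| vec![1, 2, 3])
}

fn main() {
    assert_eq!(global().len(), 3);
    assert_eq!(global()[0], 1);
}
```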
Have you read https://rust-unofficial.github.io/too-many-lists/?
Rust competes with both GC languages (offering way superior performance) and with C++ (offering safety and modern language features).
It’s a pretty strong value proposition, but it doesn’t mean there aren’t good reasons to pick a GC language if you can afford the performance hit.
The list of reasons to pick C++ over Rust is very short, mostly momentum, recruitment, and legacy. For greenfield projects, the list is even shorter.
The general experience is that developing software in Rust is slightly more expensive compared with, say, C#, and way, way cheaper than C++.
I get the sense that Rust is probably a much better replacement for C++ than C, and that Rust discussions frequently devolve because C programmers and C++ programmers are talking past each other, without understanding the divide.
I strongly disagree.
* The performance difference can be pretty massive depending on the scenario. Sure, most programs don't need it, but still
* Rust has not only memory safety but full safety around it. For instance, in Java if you iterate over a list while mutating it you'll get a runtime error. In Rust, it won't compile, preventing a possible bug later on
* The time you spend learning Rust is so much time saved afterwards, because the language is specifically designed to reduce the risk of programming errors. Many things, like the absence of a `null` value like we have in Java, are an immense time saver when your program gets bigger
Why do you take it for granted that I can't just iterate and mutate and it works just fine?
> The time you spend learning Rust is so much time saved afterwards because the language is specifically designed to reduce the risk of programming errors
[Citation needed]
> It competes against languages that give you memory safety by having garbage collection
I wouldn’t use it where I could get away with garbage collection (nor C/C++) but that still leaves enough space for Rust to exist in.
I guess I just very rarely find problems where GC isn't an option. Even the ones where people say "this has to be fast so we can't use a GC" usually work well with a GC.
Languages with a GC are using Rust to deal with their horrific performance. Look at Python and JavaScript.
Rust is much easier to learn and use than C++, its main competitor.
> Doesn't that mean you are throwing out all the memory safety properties you were hoping to achieve by using Rust in the first place? ...That argument sounds logical, but it's not actually correct.
Actually, it is correct. The more generally you use "arenas" the more they are subject to the same kinds of issues as other manual memory management systems. "arenas" can only avoid/reduce the "nondeterminism" and "security" issues of general manual memory management on top of a buffer of bytes by not becoming general manual memory management on top of a buffer of bytes.
It all comes down to whether pointers into the arena do something different than normal pointers when they are dangling.
Normal pointers cause UB. That’s astronomically bad. If your arena pointers crash your program in a well-defined way, that’s already a much better situation.
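To make the contrast concrete, here's a deliberately trivial sketch (a plain `Vec` standing in for an arena; names are illustrative): a stale handle produces a well-defined `None` or a bounds panic, never a read of freed memory.

```rust
fn main() {
    // A trivial arena: handles are just indices into a Vec.
    let mut arena: Vec<&str> = vec!["a", "b", "c"];
    let handle = 2; // "points at" "c"

    arena.truncate(2); // "c" is gone; `handle` is now dangling

    // Unlike a dangling raw pointer, the failure is well defined:
    // `get` returns None instead of reading freed memory.
    assert_eq!(arena.get(handle), None);

    // Indexing with `arena[handle]` would panic with a bounds
    // error: also defined behavior, never UB.
}
```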
The reason dangling pointers are UB is that they could point to anything, including another object, code, or memory-mapped hardware interfaces. The analog here would be using the same index into a different arena; then the trouble also wouldn't be bounded to just one arena.
How could using arenas lead to remote code execution?
You just need to put something executable, in whatever sense, in the arena... values that represent functions to call or dynamic libraries to load, or privileges to do those things, etc.
Arenas let you use out-of-bounds accesses (OOBAs) to do data attacks, and those can give you the moral equivalent of remote code execution.
For example, if your arena contains both a buffer that can be overrun and a buffer passed to a syscall, then having memory safety around the arena doesn't help you.
Yes. This is a big headache in graphics, where there are big tables indexed by texture index down at the GPU level. Managing those as the content changes, with the GPU using the data while the CPU alters it, leads to huge complexity in Vulkan graphics.
The two times I've had to deal with a really tough debugging problem in Rust, it's been related to some crate which did that.
And as a bonus, they also conveniently opt you out of any other hardening features your system malloc might have implemented (like MTE).
I certainly appreciate the performance benefits of custom allocators but it's disheartening to see people naively adopting them just as the world's allocators are starting to get their acts together with regards to security.
I am really confused here.
1. Implementing a linked list is a one-and-done deal and should be included in a (standard) library.
2. Would a classic pointer-based implementation of a linked list in C really be a cause of security vulnerabilities?
Yeah, it's already in the standard library, and it's very straightforward.
The one in the standard library is non-intrusive and allocates heap space per node. In the situations where a linked list is actually deemed necessary in Rust code, no less, alternatives (e.g. intrusive lists) should probably be used instead.
The only Rust programs I've written are simple toys, and I don't know enough computer science to fully understand the problem space here. However, the first thing that comes to mind is, what if you just used unsafe for your doubly linked list?
I understand this defeats the point of using Rust. But if the argument is "I'm using C because I actually need doubly linked lists and Rust makes that hard/slow", well C is also unsafe, so using Rust with an unsafe block for your doubly linked list seems like an improvement.
I feel like at least part of the reason not to do this is "because the Rust community will yell at you." Well, maybe they shouldn't do that? If you were using C no one would care, so switching to unsafe Rust shouldn't imply the sky is falling.
Unsafe is quite plainly the right tool for the job. Yes, people can mess up unsafe code, but that really is just how things are (until we get magic formally verifying compilers?!). I appreciate your levelheaded take, rather than advocating for only C or exaggerated concern for safe Rust.
Why would the Rust community “yell at you” for using unsafe somewhere it’s clearly needed?
By the way, unsafe doesn’t defeat the purpose of Rust. The entire point of Rust is that it lets you build safe abstractions on top of unsafe code, not that it eliminates unsafe code entirely.
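A tiny sketch of that "safe abstraction over unsafe code" idea (the function name is made up for illustration): the body uses raw pointers, but the signature's borrow rules mean no safe caller can misuse it.

```rust
// Swap the first and last elements of a slice via raw pointers.
// Callers only ever see a safe signature.
fn swap_ends<T>(slice: &mut [T]) {
    let len = slice.len();
    if len < 2 {
        return;
    }
    let ptr = slice.as_mut_ptr();
    // SAFETY: indices 0 and len - 1 are in bounds and distinct,
    // and we hold the only mutable borrow of the slice.
    unsafe {
        std::ptr::swap(ptr, ptr.add(len - 1));
    }
}

fn main() {
    let mut v = [1, 2, 3, 4];
    swap_ends(&mut v);
    println!("{:?}", v); // [4, 2, 3, 1]
}
```

That's the whole model of the standard library, too: `Vec`, `String`, etc. are unsafe inside and safe at the boundary.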
I'd like to see something powerful for safety and speed in Crates.io and Cargo:
The ability for crates to specify whether they use certain features of Rust:
- unsafe
- arena allocation or other future "useful" stuff
- proc_macros, e.g. serde (a huge compile-speed hit)
- deep transitive dependency (total #, or depth)
- some proxy for the "number of compiler steps" (compiler workload complexity)
We should be able to set a crate-wide or workspace-wide policy that can ban other crates that violate these rules.
This could be hand-maintained early on, but eventually enforced by the compiler. You could then easily guarantee that no downstream dependency uses unsafe code, has a crazy number of dependencies, or has crazy compile times.
You could opt-in or opt-out to these things without having to think too much about them or constantly (manually) be on the lookout.
This would be hugely beneficial to the Rust ecosystem, safety, code quality, and developer productivity / happiness.
Many of these currently exist as lints that can live in a Cargo.toml:
https://doc.rust-lang.org/rustc/lints/index.html
Perhaps what would be useful is crates.io determining through analysis which of the set of all lints a crate is compatible with, and exposing that as automatic metadata?
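Part of this exists in embryo today. For instance, a crate can already ban unsafe in its own code, though not in its dependencies, which is exactly the gap being discussed:

```rust
// Crate-root attribute: any `unsafe` block *in this crate* becomes
// a hard compile error. It does not cover transitive dependencies;
// a crates.io-level policy would have to close that gap.
#![forbid(unsafe_code)]

// A trivial function to show the crate still builds normally.
fn no_unsafe_here() -> &'static str {
    "compiled with no unsafe code"
}

fn main() {
    // unsafe { /* ... */ } // uncommenting this line fails to compile
    println!("{}", no_unsafe_here());
}
```

If I remember right, tools like cargo-geiger already approximate the dependency-tree scan for unsafe, but it's advisory rather than a hard policy.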
I think I will love FORTRAN again
Insightful article. As they say, doubly linked lists are approximately useless and rarely used. But trees where nodes can refer to their parents are extremely common.
And I agree, a custom arena is usually the best option here. Even though it seems like "giving up" and going back to C, it's still way better.
There's this great comparison of Arenas in Rust: https://donsz.nl/blog/arenas/
Some of them have keys that can't be reused too, so you can avoid "use after free" style bugs (at runtime) which is way better than C pointer wrangling.
It's not perfect though. I kind of wish Rust did have some kind of support for this stuff natively. Pretty low on my list of Rust desires though - way behind better compile time and fewer async footguns.
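To illustrate the "keys that can't be reused" idea, here's a hedged, hand-rolled sketch of a generational arena (the real crates compared in that post are more complete): each slot carries a generation, and a key is only valid while its generation matches, so use-after-free becomes a defined `None`.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Key {
    index: usize,
    generation: u32,
}

struct Arena<T> {
    // Each slot stores its current generation and (maybe) a value.
    slots: Vec<(u32, Option<T>)>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Key {
        // Reuse a free slot if one exists, bumping its generation
        // so old keys into that slot go stale.
        for (i, slot) in self.slots.iter_mut().enumerate() {
            if slot.1.is_none() {
                slot.0 += 1;
                slot.1 = Some(value);
                return Key { index: i, generation: slot.0 };
            }
        }
        self.slots.push((0, Some(value)));
        Key { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index)?;
        if slot.0 == key.generation { slot.1.take() } else { None }
    }

    fn get(&self, key: Key) -> Option<&T> {
        let slot = self.slots.get(key.index)?;
        if slot.0 == key.generation { slot.1.as_ref() } else { None }
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses the slot with a new generation

    assert_eq!(arena.get(a), None);            // stale key: defined failure
    assert_eq!(arena.get(b), Some(&"second")); // fresh key still works
    assert_eq!(a.index, b.index);              // same slot, different generation
}
```

The insert path here is a linear scan for simplicity; real implementations keep a free list, but the invariant is the same.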
> doubly-linked lists are approximately useless
Leaf pages in InnoDB’s B+tree implementation are the big one I can think of. It turns out that being able to quickly walk pages for a range scan is helpful.
I agree that there isn’t a wide variety of applications where they jump out as being the best choice. So, yeah, “approximately useless” sounds about right.