Something not unlike this happened to me when moving some batch processing code from C++ to Python 1.4 (this was 1997). The batch started finishing about 10x faster. We refused to believe it at first and started looking to make sure the work was actually being done. It was.
The port had been done in a weekend just to see if we could use Python in production. The C++ code had taken a few months to write. The port was pretty direct, function for function. It was even line for line where language and library differences didn't offer an easier way.
A couple of us worked together for a day to find the reason for the speedup. Just looking at the code didn't give us any clues, so we started profiling both versions. We found out that the port had accidentally fixed a previously unknown bug in some code that built and compared cache keys. After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.
We immediately started moving the rest of our back end to Python. Most things were slower, but not by much because most of our back end was i/o bound. We soon found out that we could make algorithmic improvements so much more quickly, so a lot of the slowest things got a lot faster than they had ever been. And, most importantly, we (the software developers) got quite a bit faster.
This was particularly true for one of the projects I've worked with in the past, where Python was chosen as the main language for a monitoring service.
In short, it proved itself to be a disaster: just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.
In the end, the project went ahead for a while more, and we had to do all sorts of mitigations to get the performance impact to be less of an issue.
We did consider replacing it all by a few open source tools written in C and some glue code, the initial prototype used few MBs instead of dozens (or even hundreds) of MBs of memory, while barely registering any CPU load, but in the end it was deemed a waste of time when the whole project was terminated.
> After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.
Pure speculation, but I would guess this has something to do with a copy constructor getting invoked in a place you wouldn't guess, that ends up in a critical path.
Given the context, I'm thinking bad cache keys resulting in spurious cache misses, where the keys are built in some low-level way. Cache misses almost certainly have a bigger asymptotic impact than extra copies, unless that copy constructor is really heavy.
I'm just remembering a performance issue I heard of eons ago where a sorting function comparison callback inadvertently allocated memory. It made sorting very slow. Someone said in a meeting that sorting was slow, and we all had a laugh about "shouldn't have used the bubble sort!" But it was the key comparison doing something stupid.
Agreed — the headline buries the lede. Algorithmic complexity improvements compound across all future inputs regardless of implementation language, while the WASM boundary win is more of a one-time gain. Worth noting that the statement-level caching insight generalises well: many parser-adjacent hot paths suffer the same O(N²) trap when doing repeated prefix/suffix matching without memoisation.
The real win here isn't TS over Rust, it's the O(N²) -> O(N) streaming fix via statement-level caching. That's a 3.3x improvement on its own, independent of language choice. The WASM boundary elimination is 2-4x, but the algorithmic fix is what actually matters for user-perceived latency during streaming. Title undersells the more interesting engineering imo.
O(N²) -> O(N) was 3.3x faster, but before that, eliminating the boundary (replacing wasm with JS) led to speedups of 2.2x, 4.6x, 3.0x (see one table back).
It looks like neither is the "real win". both the language and the algorithm made a big difference, as you can see in the first column in the last table - going to wasm was a big speedup, and improving the algorithm on top of that was another big speedup.
Yeah the algorithmic fix is doing most of the work here. But call that parser hundreds of times on tiny streaming chunks and the WASM boundary cost per call adds up fast. Same thing would happen with C++ compiled to WASM.
That's a pretty big claim. I don't doubt that a lot of uv's benefits are algo. But everything? Considering that running non IO-bound native code should be an order of magnitude faster than python.
Its a pretty well-supported claim. uv skips doing a number of things that generate file I/O. File I/O is far more costly than the difference in raw computation. pip can't drop those for compatibility reasons.
I don't think the article you linked supports the claim that none of UV performance improvements are related to using rust over python at all. In fact it directly states the exact opposite. They have an entire section dedicated to why using Rust has direct performance advantages for UV.
> uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.
This is either an overly pedantic take or a disingenuous one. The very first line that the parent quoted is
> uv is fast because of what it doesn’t do, not because of what language it’s written in.
The fact that the language had a small effect ("a bit") does not invalidate the statement that algorithmic improvements are the reason for the relative speed. In fact, there's no reason to believe that rust without the algorithmic version would be notably faster at all. Sure, "all" is an exaggeration, but the point made still stands in the form that most readers would understand it: algorithmic improvements are the important difference between the systems.
I think we might be talking past each other a bit.
The specific claim I was responding to was that all of uv’s performance improvements come from algorithms rather than the language. My point was just that this is a stronger claim than what the article supports, the article itself says Rust contributes “a bit” to the speed, so it’s not purely algorithmic.
I do agree with the broader point that algorithmic and architectural choices are the main reason uv is fast, and I tried to acknowledge that, apparently unsuccessfully, in my very my first comment (“I don't doubt that a lot of uv's benefits are algo. But everything?”).
I don't think the article has substantive numbers. You'd have to re-implement UV in python to do that. I don't think anyone did that. It would be interesting at least to see how much UV spends in syscalls vs PIP and make a relative estimate based on that.
"We rewrote this code from language L to language M, and the result is better!" No wonder: it was a chance to rectify everything that was tangled or crooked, avoid every known bad decision, and apply newly-invented better approaches.
So this holds even for L = M. The speedup is not in the language, but in the rewriting and rethinking.
You're generally right - rewrites let you improve the code - but they do have an actual reason the new language was better: avoiding copies on the boundary.
They say they measured that cost, and it was most of the runtime in the old version (though they don't give exact numbers). That cost does not exist at all in the new version, simply because of the language.
I think that they were honest about that to a degree, they pointed out that one source of the speed up was caused by the python fixing a big they hadn't noticed in the C++
By the way, I did a deeper dive on the problem of serializing objects across the Rust/JS boundary, noticed the approach used by serde wasn’t great for performance, and explored improving it here: https://neugierig.org/software/blog/2024/04/rust-wasm-to-js....
I’m more of a dabbler dev/script guy than a dev but Every. single. thing I ever write in javascript ends up being incredibly fast. It forces me to think in callbacks and events and promises.
Python and C (or async!) seem easy and sorta lazy in comparison.
Yeah if you're serializing and deserializing data across the JS-WASM boundary (or actually between web workers in general whether they're WASM or not) the data marshaling costs can add up. There is a way of sharing memory across the boundary though without any marshaling: TypedArrays and SharedArrayBuffers. TypedArrays let you transfer ownership of the underlying memory from one worker (or the main thread) to another without any copying. SharedArrayBuffers allow multiple workers to read and write to the same contiguous chunk of memory. The downside is that you lose all the niceties of any JavaScript types and you're basically stuck working with raw bytes.
You still do get some latency from the event loop, because postMessage gets queued as a MacroTask, which is probably on the order of 10μs. But this is the price you have to pay if you want to run some code in a non-blocking way.
I heard a lot of similar stories in the past when I started using Python 20+ years ago. A number of people claimed their solutions got faster when develop in Python, mainly because Python make it easier to quickly pivot to experiment with various alternative methods, hence finally yield at more efficient outcome at the end.
> The openui-lang parser converts a custom DSL emitted by an LLM into a React component tree.
> converts internal AST into the public OutputNode format consumed by the React renderer
Why not just have the LLM emit the JSON for OutputNode ? Why is a custom "language" and parser needed at all? And yes, there is a cost for marshaling data, so you should avoid doing it where possible, and do it in large chunks when its not possible to avoid. This is not an unknown phenomenon.
Not directly related to the post but what does OpenUI do? I'm finding it interesting but hard to understand. Is it an intermediate layer that makes LLMs generate better UI?
That final summary benchmark means nothing. It mentions 'baseline' value for the 'Full-stream total' for the rust implementation, and then says the `serde-wasm-bindgen` is '+9-29% slower', but it never gives us the baseline value, because clearly the only benchmark it did against the Rust codebase was the per-call one.
Then it mentions:
"End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."
But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.
I really think the guy just prompted claude to "get this shit fast and then publish a blog post".
This article is obviously AI generated and besides being jarring to read, it makes me really doubt its validity. You can get substantially faster parsing versus `JSON.parse()` by parsing structured binary data, and it's also faster to pass a byte array compared to a JSON string from wasm to the browser. My guess is not only this article was AI generated, but also their benchmarks, and perhaps the implementation as well.
The WASM story is interesting from a security angle too. WASM modules inheriting the host's memory model means any parsing bugs that trigger buffer overreads in the Rust code could surface in ways that are harder to audit at the JS boundary. Moving to native TS at least keeps the attack surface in one runtime, even if the theoretical memory safety guarantees go down.
They use a bespoke language to define LLM-generated UI components. I think that this is supposed to prevent exfiltration if the LLM is prompt-injected. In any case, the parser compiles chunks streaming from the LLM to build a live UI. The WASM parser restarted from the beginning upon each chunk received. Fixing this algorithm to work more incrementally (while porting from Rust to TypeScript) improved performance a lot.
I tried a similar experiment recently w/ FFT transform for wav files in the browser and javascript was faster than wasm. It was mostly vibe coded Rust to wasm but FFT is a well-known algorithm so I don't think there were any low hanging performance improvements left to pick.
> Attempted Fix: Skip the JSON Round-Trip
> We integrated serde-wasm-bindgen
So you're reinventing JSON but binary? V8 JSON nowadays is highly optimized [1] and can process gigabytes per second [2], I doubt it is a bottleneck here.
No, serde-wasm-bindgen implements the serde Serializer interface by calling into JS to directly construct the JS objects on the JS heap without an intermediate serialization/deserialization. You pay the cost of one or more FFI calls for every object though.
Hmm, there's an in-progress rewrite of the TypeScript compiler in Go; is that what you mean?
I don't think that's actually out yet, and more importantly, it doesn't change anything at runtime -- your code still runs in a JS engine (V8, JSC etc).
Something not unlike this happened to me when moving some batch processing code from C++ to Python 1.4 (this was 1997). The batch started finishing about 10x faster. We refused to believe it at first and started looking to make sure the work was actually being done. It was.
The port had been done in a weekend just to see if we could use Python in production. The C++ code had taken a few months to write. The port was pretty direct, function for function. It was even line for line where language and library differences didn't offer an easier way.
A couple of us worked together for a day to find the reason for the speedup. Just looking at the code didn't give us any clues, so we started profiling both versions. We found out that the port had accidentally fixed a previously unknown bug in some code that built and compared cache keys. After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.
We immediately started moving the rest of our back end to Python. Most things were slower, but not by much because most of our back end was i/o bound. We soon found out that we could make algorithmic improvements so much more quickly, so a lot of the slowest things got a lot faster than they had ever been. And, most importantly, we (the software developers) got quite a bit faster.
My experience is the exact opposite.
This was particularly true for one of the projects I've worked with in the past, where Python was chosen as the main language for a monitoring service.
In short, it proved itself to be a disaster: just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.
In the end, the project went ahead for a while more, and we had to do all sorts of mitigations to get the performance impact to be less of an issue.
We did consider replacing it all by a few open source tools written in C and some glue code, the initial prototype used few MBs instead of dozens (or even hundreds) of MBs of memory, while barely registering any CPU load, but in the end it was deemed a waste of time when the whole project was terminated.
> After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.
Pure speculation, but I would guess this has something to do with a copy constructor getting invoked in a place you wouldn't guess, that ends up in a critical path.
Given the context, I'm thinking bad cache keys resulting in spurious cache misses, where the keys are built in some low-level way. Cache misses almost certainly have a bigger asymptotic impact than extra copies, unless that copy constructor is really heavy.
I'm just remembering a performance issue I heard of eons ago where a sorting function comparison callback inadvertently allocated memory. It made sorting very slow. Someone said in a meeting that sorting was slow, and we all had a laugh about "shouldn't have used the bubble sort!" But it was the key comparison doing something stupid.
good ol' shallow-vs-deep copy
Fun story! Performance is often highly unintuitive, and even counterintuitive (e.g. going from C++ to Python). Very much an art as well as a science.
Crazy how many stories like this I’ve heard of how doing performance work helped people uncover bugs and/or hidden assumptions about their systems.
Agreed — the headline buries the lede. Algorithmic complexity improvements compound across all future inputs regardless of implementation language, while the WASM boundary win is more of a one-time gain. Worth noting that the statement-level caching insight generalises well: many parser-adjacent hot paths suffer the same O(N²) trap when doing repeated prefix/suffix matching without memoisation.
The real win here isn't TS over Rust, it's the O(N²) -> O(N) streaming fix via statement-level caching. That's a 3.3x improvement on its own, independent of language choice. The WASM boundary elimination is 2-4x, but the algorithmic fix is what actually matters for user-perceived latency during streaming. Title undersells the more interesting engineering imo.
O(N²) -> O(N) was 3.3x faster, but before that, eliminating the boundary (replacing wasm with JS) led to speedups of 2.2x, 4.6x, 3.0x (see one table back).
It looks like neither is the "real win". both the language and the algorithm made a big difference, as you can see in the first column in the last table - going to wasm was a big speedup, and improving the algorithm on top of that was another big speedup.
Yeah the algorithmic fix is doing most of the work here. But call that parser hundreds of times on tiny streaming chunks and the WASM boundary cost per call adds up fast. Same thing would happen with C++ compiled to WASM.
You’re not wrong, but that win would not get as many views. It’s not clickbaity enough
same for uv but no one takes that message. They just think "rust rulez!" and ignore that all of uv's benefits are algo, not lang.
Some architectures are made easier by the choice of implementation language.
In my experience Rust typically makes it a little bit harder to write the most efficient algo actually.
That’s usually ok bc in most code your N is small and compiler optimizations dominate.
Would you be willing to give an example of this?
That's a pretty big claim. I don't doubt that a lot of uv's benefits are algo. But everything? Considering that running non IO-bound native code should be an order of magnitude faster than python.
Its a pretty well-supported claim. uv skips doing a number of things that generate file I/O. File I/O is far more costly than the difference in raw computation. pip can't drop those for compatibility reasons.
https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html
I don't think the article you linked supports the claim that none of UV performance improvements are related to using rust over python at all. In fact it directly states the exact opposite. They have an entire section dedicated to why using Rust has direct performance advantages for UV.
What it says is this:
> uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.
Yes exactly! That quote directly disproves that all of the improvements UV has over competitors is because of algos, not because of rust.
So the claim is not well supported at all by the article as you stated, in fact the claim is literally disproven by the article.
This is either an overly pedantic take or a disingenuous one. The very first line that the parent quoted is
> uv is fast because of what it doesn’t do, not because of what language it’s written in.
The fact that the language had a small effect ("a bit") does not invalidate the statement that algorithmic improvements are the reason for the relative speed. In fact, there's no reason to believe that rust without the algorithmic version would be notably faster at all. Sure, "all" is an exaggeration, but the point made still stands in the form that most readers would understand it: algorithmic improvements are the important difference between the systems.
I think we might be talking past each other a bit.
The specific claim I was responding to was that all of uv’s performance improvements come from algorithms rather than the language. My point was just that this is a stronger claim than what the article supports, the article itself says Rust contributes “a bit” to the speed, so it’s not purely algorithmic.
I do agree with the broader point that algorithmic and architectural choices are the main reason uv is fast, and I tried to acknowledge that, apparently unsuccessfully, in my very my first comment (“I don't doubt that a lot of uv's benefits are algo. But everything?”).
You are right. 99% is not 100%.
I don't think the article has substantive numbers. You'd have to re-implement UV in python to do that. I don't think anyone did that. It would be interesting at least to see how much UV spends in syscalls vs PIP and make a relative estimate based on that.
More than one, I'd think.
> Title undersells the more interesting engineering imo.
Thanks for cutting through the clickbait. The post is interesting, but I'm so tired of being unnecessarily clickbaited into reading articles.
Yeah, though the n^2 is overstating things.
One thing I noticed was that they time each call and then use a median. Sigh. In a browser. :/ With timing attack defenses build into the JS engine.
For those of us not in the know, what are we expecting the results of the defenses to be here?
Jitter. It make precise timings unreliable. Time the entire time of 1000 runs and divide by 1000 instead of starting and stopping 1000 timers.
More like a misleading clickbait.
"We rewrote this code from language L to language M, and the result is better!" No wonder: it was a chance to rectify everything that was tangled or crooked, avoid every known bad decision, and apply newly-invented better approaches.
So this holds even for L = M. The speedup is not in the language, but in the rewriting and rethinking.
Now they just need a third party who's never seen the original to rewrite their TypeScript solution in Rust for even more gains.
Indeed! But only after a year or so of using it in production, so that the drawbacks would be discovered.
You're generally right - rewrites let you improve the code - but they do have an actual reason the new language was better: avoiding copies on the boundary.
They say they measured that cost, and it was most of the runtime in the old version (though they don't give exact numbers). That cost does not exist at all in the new version, simply because of the language.
Truth. You can see improvement, even rewriting code in the same language.
I think that they were honest about that to a degree, they pointed out that one source of the speed up was caused by the python fixing a big they hadn't noticed in the C++
Edit: fixed phone typos
By the way, I did a deeper dive on the problem of serializing objects across the Rust/JS boundary, noticed the approach used by serde wasn’t great for performance, and explored improving it here: https://neugierig.org/software/blog/2024/04/rust-wasm-to-js....
Did you try something like msgpack or bebop?
I was wondering why I hadn't heard of Open UI doing anything with WASM.
This new company chose a very confusing name that has been used by the Open UI W3C Community Group for over 5 years.
https://open-ui.org/
Open UI is the standards group responsible for HTML having popovers, customizable select, invoker commands, and accordions. They're doing great work.
I’m more of a dabbler dev/script guy than a dev but Every. single. thing I ever write in javascript ends up being incredibly fast. It forces me to think in callbacks and events and promises. Python and C (or async!) seem easy and sorta lazy in comparison.
Yeah if you're serializing and deserializing data across the JS-WASM boundary (or actually between web workers in general whether they're WASM or not) the data marshaling costs can add up. There is a way of sharing memory across the boundary though without any marshaling: TypedArrays and SharedArrayBuffers. TypedArrays let you transfer ownership of the underlying memory from one worker (or the main thread) to another without any copying. SharedArrayBuffers allow multiple workers to read and write to the same contiguous chunk of memory. The downside is that you lose all the niceties of any JavaScript types and you're basically stuck working with raw bytes.
You still do get some latency from the event loop, because postMessage gets queued as a MacroTask, which is probably on the order of 10μs. But this is the price you have to pay if you want to run some code in a non-blocking way.
I heard a lot of similar stories in the past when I started using Python 20+ years ago. A number of people claimed their solutions got faster when develop in Python, mainly because Python make it easier to quickly pivot to experiment with various alternative methods, hence finally yield at more efficient outcome at the end.
> The openui-lang parser converts a custom DSL emitted by an LLM into a React component tree.
> converts internal AST into the public OutputNode format consumed by the React renderer
Why not just have the LLM emit the JSON for OutputNode ? Why is a custom "language" and parser needed at all? And yes, there is a cost for marshaling data, so you should avoid doing it where possible, and do it in large chunks when its not possible to avoid. This is not an unknown phenomenon.
Not directly related to the post but what does OpenUI do? I'm finding it interesting but hard to understand. Is it an intermediate layer that makes LLMs generate better UI?
God I hate AI writing.
That final summary benchmark means nothing. It mentions 'baseline' value for the 'Full-stream total' for the rust implementation, and then says the `serde-wasm-bindgen` is '+9-29% slower', but it never gives us the baseline value, because clearly the only benchmark it did against the Rust codebase was the per-call one.
Then it mentions: "End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."
But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.
I really think the guy just prompted claude to "get this shit fast and then publish a blog post".
This article is obviously AI generated and besides being jarring to read, it makes me really doubt its validity. You can get substantially faster parsing versus `JSON.parse()` by parsing structured binary data, and it's also faster to pass a byte array compared to a JSON string from wasm to the browser. My guess is not only this article was AI generated, but also their benchmarks, and perhaps the implementation as well.
It's vibe code all the way down!
Why not a shared buffer? Serializing into JSON on this hot path should be entirely avoidable
I think a shared array just avoids the copy, not the serialization which is the main problem as they showed with serde-wasm-bindgen test
The WASM story is interesting from a security angle too. WASM modules inheriting the host's memory model means any parsing bugs that trigger buffer overreads in the Rust code could surface in ways that are harder to audit at the JS boundary. Moving to native TS at least keeps the attack surface in one runtime, even if the theoretical memory safety guarantees go down.
I dream of the day in which there is no need to pass by JS and Wasm can do all the job by itself. Meanwhile, we are stuck.
That blog post design is very nice. I like the 'scrollspy' sidebar which highlights all visible headings.
Claude tells me this is https://www.fumadocs.dev/
Interesting, thanks. I need make some good docs soon.
Good documentation is always worth the effort. Markdown explaining your products is gold these days with LLMs.
It would be great if people stopped dismissing the problem that WASM not being a first-class runtime for the web causes.
So this is an issue with WASM/JS interop, not with Rust per se?
Good software is usually written on 2nd+ try.
What is the purpose of the Rust WASM parser? Didn't understand that easily from the article. Would love a better explanation.
They use a bespoke language to define LLM-generated UI components. I think that this is supposed to prevent exfiltration if the LLM is prompt-injected. In any case, the parser compiles chunks streaming from the LLM to build a live UI. The WASM parser restarted from the beginning upon each chunk received. Fixing this algorithm to work more incrementally (while porting from Rust to TypeScript) improved performance a lot.
Rewrite bias. Yoy want to also rewrite the Rust one in Rust for comparison.
It would be surprising if rewriting in Rust could change the WASM boundary tax that the article identified as the actual problem.
I tried a similar experiment recently w/ FFT transform for wav files in the browser and javascript was faster than wasm. It was mostly vibe coded Rust to wasm but FFT is a well-known algorithm so I don't think there were any low hanging performance improvements left to pick.
This is very unusual statement :-D
> Attempted Fix: Skip the JSON Round-Trip > We integrated serde-wasm-bindgen
So you're reinventing JSON but binary? V8 JSON nowadays is highly optimized [1] and can process gigabytes per second [2], I doubt it is a bottleneck here.
[1] https://v8.dev/blog/json-stringify [2] https://github.com/simdjson/simdjson
No, serde-wasm-bindgen implements the serde Serializer interface by calling into JS to directly construct the JS objects on the JS heap without an intermediate serialization/deserialization. You pay the cost of one or more FFI calls for every object though.
https://docs.rs/serde-wasm-bindgen/
Am I mistaken or isn’t TypeScript just Golang under the hood these days?
There is too much wrong here to call it a mistake.
Hmm, there's an in-progress rewrite of the TypeScript compiler in Go; is that what you mean?
I don't think that's actually out yet, and more importantly, it doesn't change anything at runtime -- your code still runs in a JS engine (V8, JSC etc).
npm i -D @typescript/native-preview
You can use it today.
They should rewrite it in rust again to get another 3x performance increase /s