I've been using a K3 for a few weeks now. It's quite pleasant, and if I use all 16 cores (8x X100 and 8x A100) then it builds a Linux kernel (commit 7503345ac5f5, defconfig) almost 3x faster than my one-year-old Milk-V Megrez and almost 5x faster than a K1.
Even using just the "AI" A100 cores is faster than the Megrez!
It's also great that it's now faster than a recent high-end x86 with a lot of cores running QEMU.
The X100 cores are derived from T-Head's 2019 OpenC910. The A100 cores are derived from SpacemiT's own X60 cores in their K1/M1 SoC.
Note that the all-cores K3 result is running a distccd on each cluster, which adds quite a bit of overhead compared to a simple `make` on local cores. All the same it shaves 2.5 minutes off. In theory, doing an Amdahl's-law calculation on the X100 and A100 times, it might be possible to get close to 11m50s with a more efficient means of using heterogeneous cores, but distcc was easy to do.
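To make that calculation concrete: for the perfectly parallel part of the work, two clusters running simultaneously finish in 1/(1/Tx + 1/Ta), where Tx and Ta are each cluster's solo times. The numbers below are purely illustrative, not the author's measurements:

```shell
# Hypothetical solo times: 20 min on the X100 cluster, 29 min on the A100
# cluster. The ideal zero-overhead combined time is the "parallel resistors" sum:
awk 'BEGIN { tx = 20.0; ta = 29.0; printf "%.1f\n", 1 / (1/tx + 1/ta) }'
# → 11.8
```

Any real coordination overhead (distcc, scheduling) pushes the achieved time above this floor.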
Or, you could just run independent things (e.g. different builds) on each set of 8 cores.
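For example, using the `ai` wrapper described below, two unrelated builds could each get a dedicated cluster (the directory names here are made up for illustration):

```shell
# one build on the default X100 cores...
make -C ~/src/linux-a -j8 &
# ...and an unrelated build on the A100 cores at the same time
ai make -C ~/src/linux-b -j8 &
wait   # let both finish
```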
Or maybe there's a lower overhead way to use distcc, or something else that is set up to distribute work to more than one set of resources.
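A sketch of the per-cluster distccd arrangement mentioned above (the port numbers and host list are assumptions, not the author's exact setup; the second daemon is started through the `ai` wrapper described below):

```shell
# one distccd serving the X100 cluster (default scheduling)...
distccd --daemon --port 3632 --allow 127.0.0.1
# ...and one moved onto the A100 cluster
ai distccd --daemon --port 3633 --allow 127.0.0.1

# point the client at both daemons, 8 jobs each, then build
export DISTCC_HOSTS="127.0.0.1:3632/8 127.0.0.1:3633/8"
make -j16 CC="distcc gcc"
```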
I've written a small (~40 instructions) statically linked pure asm program [1] that switches the process to the A100 cores [2] then EXECs the rest of the arguments.
So you can just type something like:

    ai bash

or

    ai gcc -O primes.c -o primes

or

    ai make -j8

... and that command (and any children) runs safely on the A100 cores instead of the X100 cores.
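What the program does, rendered as shell for illustration only — a real shell (or C) version would defeat footnote [1], since the interpreter or libc might touch vector state before the migration, which is exactly why the author wrote ~40 instructions of static asm. `/proc/set_ai_thread` is the interface mentioned in footnote [2]:

```shell
#!/bin/sh
# Illustrative only: move this process onto the A100 cluster, then become
# the requested command. (The real tool is pure static asm; see [1].)
echo $$ > /proc/set_ai_thread
exec "$@"
```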
It would be great if the upstream Linux kernel got official nicely-worked-out support for heterogeneous cores -- more and more RISC-V is going to be like this, but Intel would also benefit, e.g. with some cores having AVX-512 and some not; I also recall one Arm (Samsung) SoC with big.LITTLE cores that had different cache block sizes.
But in the meantime, this is workable and useful.
[1] so there is no possibility of the dynamic linker, C start code, or libc using the V extension and putting the process into a state dangerous to migrate to the different-VLEN cores.
[2] by getting the PID and writing it to `/proc/set_ai_thread`
A couple of notes - while the headline says it's an octa-core chip, it actually has 16 cores with different VLENs:
- 8× X100 cores @ 2.4 GHz (general-purpose, VLEN=256)
- 8× A100 cores @ 2.1 GHz ("AI-optimized", VLEN=1024)
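VLEN is the vector register width in bits, so per the RVV spec the `vlenb` CSR (bytes per vector register, i.e. VLEN/8) would read differently on the two clusters — which is why migrating a process with live vector state between them is unsafe:

```shell
# bytes per vector register on each cluster (vlenb = VLEN/8)
awk 'BEGIN { printf "X100: %d bytes, A100: %d bytes\n", 256/8, 1024/8 }'
# → X100: 32 bytes, A100: 128 bytes
```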
By default, programs only run on the X100 cores, but there's a way to start processes on the A100 cores: https://github.com/sanderjo/SpacemiT-K3-X100-A100/blob/main/...
A more detailed write-up on the uarch can be found here (PDF auto downloads): https://forum.spacemit.com/uploads/short-url/60aJ8cYNmrFWqHn...
I believe this is the first RVA23-compliant RISC-V SBC on the market - excited to get my hands on one and test.
See my top-level comment: https://news.ycombinator.com/item?id=46894034