5 points | by rightnow_ai 12 hours ago
3 comments
Quick context on what this actually does
This is not static analysis. It runs your CUDA kernel in a CPU-backed simulator and predicts how it will behave on real GPUs.
Basically, it uses a tile model tied to L2 cache size and SM limits.
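To give a rough feel for what "a tile model tied to L2 size and SM limits" could mean, here is a minimal sketch. It is not the tool's actual model. The function name, its parameters, and the rule that residency is capped by whichever of L2 capacity or SM slots runs out first are all assumptions made for illustration, and the numbers in the usage note are made up rather than taken from any particular GPU.

```python
import math


def estimate_waves(total_tiles, tile_bytes, l2_bytes, num_sms, tiles_per_sm):
    """Hypothetical helper: estimate how many tiles can be resident at once
    and how many sequential waves are needed to process all of them.

    Assumption: a tile is resident only if its working set fits in L2 and
    an SM has a free slot for it.
    """
    if total_tiles <= 0 or tile_bytes <= 0 or tiles_per_sm <= 0:
        raise ValueError("total_tiles, tile_bytes and tiles_per_sm must be positive")

    # Cap imposed by L2 capacity (at least one tile is always allowed).
    resident_by_l2 = max(1, l2_bytes // tile_bytes)
    # Cap imposed by SM count and per-SM tile slots.
    resident_by_sm = num_sms * tiles_per_sm

    resident = min(resident_by_l2, resident_by_sm)
    waves = math.ceil(total_tiles / resident)
    return resident, waves
```

With illustrative inputs of 1000 tiles at 256 KiB each, a 40 MiB L2, 108 SMs and 2 tiles per SM, `estimate_waves(1000, 256 * 1024, 40 * 1024 * 1024, 108, 2)` is limited by L2 rather than by SM slots. A real predictor would also need per-tile timing, which this sketch leaves out.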
Right now it covers 80+ NVIDIA architectures, and the mean error on execution time is around 1–2% on the test kernels we wrote ourselves (more info in the blog).
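For readers wondering how a figure like "mean error on execution time" is typically computed, one common definition is the mean absolute relative error between predicted and measured times. The sketch below assumes that definition. The post does not say which metric is actually used, and the function name and sample timings are hypothetical.

```python
def mean_relative_error(predicted_ms, measured_ms):
    """Mean absolute relative error between predicted and measured times.

    Assumption: error is |predicted - measured| / measured, averaged over
    all kernels. Returns a fraction (0.01 means 1%).
    """
    if len(predicted_ms) != len(measured_ms) or not measured_ms:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m <= 0 for m in measured_ms):
        raise ValueError("measured times must be positive")

    errors = [abs(p - m) / m for p, m in zip(predicted_ms, measured_ms)]
    return sum(errors) / len(errors)
```

For example, with made-up predictions of 1.02, 1.98 and 5.05 ms against measurements of 1.00, 2.00 and 5.00 ms, the per-kernel errors are about 2%, 1% and 1%, so the mean comes out at roughly 1.3%.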
It still struggles with dynamic parallelism, but I will figure it out soon.