Many people have been asking them for this sort of content, and it is happening. Couldn't be more excited. Also note that it's AMD, but not officially AMD: it's being published in the open on an individual's GitHub.
Whenever I see code like this, I start to think that GPUs are uniquely unsuited for matrix multiplication.
You're pretending that each streaming multiprocessor can handle independent threads, when in reality you're feeding something that only exists once or twice per SM. It's like "independently" controlling one out of 32 cars on a 32-lane highway where the cars aren't allowed to switch lanes and the controls of one car are replicated to all the others, when in reality everyone is sitting on the same bus.
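To make that concrete, here's a toy kernel (my own sketch, not anything from the article) where the lanes of a warp disagree on a branch, so the warp has to run both paths back to back:

```cuda
__global__ void divergent(float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Lanes 0-15 and 16-31 of each 32-thread warp disagree on this
    // branch, so the warp executes both paths serially, with half
    // the lanes masked off each time.
    if (threadIdx.x % 32 < 16) {
        out[i] = sinf((float)i);  // lanes 0-15 active, 16-31 idle
    } else {
        out[i] = cosf((float)i);  // lanes 16-31 active, 0-15 idle
    }
}
```

Each lane pretends to be independent, but while its path isn't executing it's just sitting on the bus.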
I'm not sure I follow. Matrix multiplication isn't inherently "branchy" in a way we'd expect to cause inefficient execution under SIMT (i.e., branch divergence).
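As an illustration (a hypothetical naive kernel, not the article's code), the inner loop of a GEMM has no data-dependent branches at all; every lane of a warp runs the same instruction on different data, which is exactly what SIMT hardware is built for:

```cuda
// Naive C = A * B for N x N row-major matrices, one thread per output
// element. Illustrative only; real kernels tile through shared memory.
__global__ void gemm_naive(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {        // the only branch: a bounds check
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)  // identical trip count in every lane,
            acc += A[row * N + k] * B[k * N + col];  // so no divergence
        C[row * N + col] = acc;
    }
}
```

The only conditional is the edge-of-matrix bounds check, and for warps entirely inside the matrix it never diverges.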
Glad to see more articles out there using AMD hardware acceleration, especially for matrix math. More diversity in this space is welcome.