New ROCm™ 5.6 Release Brings Enhancements and Optimizations for AI and HPC Workloads

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · 1 year ago

dragontamer@lemmy.world · 1 year ago

The comparisons of ROCm and CUDA are inevitable, and AMD’s software support just doesn’t hold up.

That being said: ROCm is… sufficient? The API largely works, though the main issue is the narrow set of GPUs that actually work with ROCm. Software-wise: the bulk of “important” CUDA features are replicated, but AMD has inevitably locked itself into a losing game of catch-up… always having to react to new features from CUDA rather than leading with features of their own.

That being said, I don’t think that the “bulk” of GPU-code has really changed much in the last 10 years. Sure, the TensorCore / AI stuff and Raytracing added some stuff, but those are rather specialized operations. A typical GPU programmer doesn’t necessarily work with AI or Raytracing.

AMD GPUs are hugely impressive when you do work with them. Absolutely tons of VRAM on the cheap and huge amounts of TFLOPS to get things working. The software is a bit rougher but it does in fact work.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · 1 year ago

A typical GPU programmer doesn’t necessarily work with AI

And even when you do, you’re going to find it infinitely more productive (and also performant!) to use OpenAI’s Triton, or something like Tiramisu or Halide to implement custom fused matrix multiplies or convolutions. I honestly believe CUDA as a distinctive advantage of Nvidia GPUs has plateaued here.
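
For anyone wondering what “fused” means here, a minimal sketch in plain Python (not Triton, Tiramisu or Halide code; the function names are made up for illustration). The unfused version makes three passes and builds two intermediate matrices, while the fused version applies the bias and ReLU while each output element is still in the accumulator, which is roughly the saving those tools aim for on a GPU:

```python
def matmul_bias_relu_unfused(a, b, bias):
    """Three separate passes: matmul, then bias add, then ReLU."""
    n, k, m = len(a), len(b), len(b[0])
    # Pass 1: full matrix product, stored as an intermediate.
    product = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
               for i in range(n)]
    # Pass 2: add the per-column bias, stored as a second intermediate.
    shifted = [[product[i][j] + bias[j] for j in range(m)] for i in range(n)]
    # Pass 3: ReLU.
    return [[max(0, x) for x in row] for row in shifted]


def matmul_bias_relu_fused(a, b, bias):
    """One pass: bias and ReLU are applied while the element is in 'acc'."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = bias[j]                  # start from the bias
            for p in range(k):
                acc += a[i][p] * b[p][j]   # accumulate the dot product
            out[i][j] = max(0, acc)        # ReLU before the only write
    return out
```

Both functions return the same result; the fused one just never writes the intermediates out, which on real hardware means fewer round trips to memory.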

dragontamer@lemmy.world · 1 year ago

NVidia has a few nifty tricks still. Their sparse matrix multiply allows for a 4x4 matrix multiplication with half the space (assuming that half of the matrix’s entries are zero, which is common in AI). I don’t think AMD has a sparse FP16 4x4 matrix multiplication instruction yet.
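
To make the “half the space” idea concrete, here is a rough sketch in plain Python. It assumes a 2-out-of-4 pattern (at most two nonzeros in every group of four values), and the storage layout and function names are hypothetical, picked for readability rather than taken from any actual hardware encoding:

```python
def compress_2of4(row):
    """Store 2 values + 2 in-group positions for every group of 4 entries."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [(i, v) for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group has more than 2 nonzeros")
        # Pad with explicit zeros so every group fills exactly 2 slots.
        used = {i for i, _ in kept}
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in used:
                kept.append((i, 0))
                used.add(i)
        kept.sort()
        for i, v in kept:
            indices.append(i)
            values.append(v)
    return values, indices


def sparse_dot(values, indices, dense):
    """Dot product of a compressed row with a dense vector."""
    total = 0
    for slot, (v, i) in enumerate(zip(values, indices)):
        group_start = (slot // 2) * 4   # two slots per group of four
        total += v * dense[group_start + i]
    return total
```

For example, `compress_2of4([0, 3, 0, 5, 7, 0, 0, 0])` keeps four values instead of eight, and `sparse_dot` on the compressed form gives the same answer as the dense dot product while doing half the multiplies.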

AMD is behind in AI, but not significantly. AMD is ahead in double-precision / 64-bit compute by a wide margin. AMD also drew first blood on chiplets with MI200, which puts them in a strong position for future innovation.