ROCm is AMD's open-source development platform, similar to NVIDIA CUDA.



| CUDA | ROCm | Description |
| --- | --- | --- |
| SM | Compute Unit (CU) | One of many parallel vector processors in a GPU, each containing parallel ALUs. All wavefronts in a workgroup are assigned to the same CU. |
| Kernel | Kernel | Function launched to the GPU and executed by multiple parallel workers on the GPU. Kernels can run in parallel with the CPU. |
| Warp | Wavefront | Collection of operations that execute in lockstep, run the same instructions, and follow the same control-flow path. Individual lanes can be masked off. Think of this as a vector thread: a 64-wide wavefront is a 64-wide vector op. |
| Thread Block | Workgroup | Group of wavefronts that are on the GPU at the same time. They can synchronize with each other and communicate through local memory. |
| Thread | Work Item / Thread | Individual lane in a wavefront. On AMD GPUs, it must run in lockstep with the other work items in the wavefront. Lanes can be individually masked off. GPU programming models can treat this as a separate thread of execution, though forward progress at sub-wavefront granularity is not guaranteed. |
| SM sub-partition | SIMD | There are 4 of these per SM and per CU. |
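
To make the workgroup / wavefront / lane relationship concrete, here is a small pure-Python sketch (not GPU code). It assumes 64-wide wavefronts as in the table; the function names `split_into_wavefronts` and `run_divergent_branch` are made up for illustration. It shows how a workgroup is cut into wavefronts and how a divergent branch is handled by masking lanes rather than by letting lanes run independently.

```python
WAVEFRONT_SIZE = 64  # assumption: 64-wide wavefronts, as in the table


def split_into_wavefronts(workgroup_size, wave_size=WAVEFRONT_SIZE):
    """Return one execution mask (list of bools) per wavefront.

    A lane is active only if it maps to a real work item, so the last
    wavefront of a workgroup may be partially masked off.
    """
    num_waves = (workgroup_size + wave_size - 1) // wave_size
    masks = []
    for w in range(num_waves):
        first = w * wave_size
        masks.append([first + lane < workgroup_size for lane in range(wave_size)])
    return masks


def run_divergent_branch(values, mask):
    """Simulate `if (x % 2 == 0) x *= 2; else x += 1;` on one wavefront.

    Both sides of the branch are issued to the whole wavefront in turn;
    the per-lane mask decides which lanes commit a result on each pass.
    """
    out = list(values)
    then_mask = [m and v % 2 == 0 for v, m in zip(values, mask)]
    else_mask = [m and not t for m, t in zip(mask, then_mask)]
    for lane, active in enumerate(then_mask):  # pass 1: "then" side
        if active:
            out[lane] = values[lane] * 2
    for lane, active in enumerate(else_mask):  # pass 2: "else" side
        if active:
            out[lane] = values[lane] + 1
    return out


masks = split_into_wavefronts(200)
# 200 work items -> 4 wavefronts, with 8 active lanes in the last one
print(len(masks), sum(masks[-1]))
```

Lanes that are masked off keep their old values, which is why a wavefront with divergent control flow pays for both sides of the branch.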

At the time of writing, ROCm does not support managed memory.

Scalar Unit && Scalar Registers (todo)

AMD ROCm Profiler

It is similar to NVIDIA's ncu, but it exposes far fewer hardware counters than ncu does. The public counters are:

The counters to collect must be specified in an input file.
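
As a rough sketch of what such an input file can look like, the helper below builds a single `pmc` line and the matching `rocprof -i` command. The helper names, the counter selection, and the `./my_app` binary are placeholders chosen for illustration; which counters are available, and how many fit on one `pmc` line, depends on the GPU, so check `rocprof --list-basic` on the target machine.

```python
import shlex


def make_pmc_input(counters):
    """Build the text of a rocprof input file with one `pmc` line."""
    return "pmc : " + " ".join(counters) + "\n"


def make_rocprof_command(input_file, app_argv):
    """Build the rocprof command line that reads the counter list from input_file."""
    return ["rocprof", "-i", input_file, *app_argv]


# Example counter names only; availability depends on the hardware.
counters = ["Wavefronts", "VALUInsts", "SALUInsts"]
input_text = make_pmc_input(counters)
command = make_rocprof_command("input.txt", ["./my_app"])  # ./my_app is a placeholder

print(input_text, end="")
print(shlex.join(command))
```

Saving `input_text` as `input.txt` and running the printed command should produce a results file with one row per kernel dispatch and one column per requested counter.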



RDNA white paper:

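Copyright of this article belongs to FindHao. By default, this site is licensed under CC-BY-NC-SA 4.0.
Reposts must include this notice and credit the author, FindHao, with a hyperlink to the original address of this article: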

