Memory type     Turing / Volta latency (cycles)   Pascal / Maxwell latency (cycles)
Register        6 (no bank conflicts)             6 (no bank conflicts)
Shared          19 (no bank conflicts)            23 (no bank conflicts)
L1 data         32 / 28                           82
L2 data         ~188 / ~193                       ~234 / ~207
L1 constant     ~26                               ~25
L2 constant     ~215 / ~245                       ~236 / ~221
Local           ~1029                             ~1029
Global          ~1029                             ~1029
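The following C++ sketch shows roughly how these numbers combine. It estimates the average latency of a load from the Turing column, assuming each entry is the full latency of a load served by that level, which is what pointer-chasing microbenchmarks report. The function names, hit rates and clock frequency are hypothetical inputs for illustration. They are not values from the table or from [1].

```cpp
#include <cassert>

// Turing column of the table above, in clock cycles. Assumption: each value
// is the end-to-end latency of one dependent load served by that level.
constexpr double kL1DataCycles = 32.0;
constexpr double kL2DataCycles = 188.0;   // table gives ~188
constexpr double kGlobalCycles = 1029.0;  // table gives ~1029

// Expected cycles per load under HYPOTHETICAL hit rates:
//   l1_hit: fraction of loads that hit in L1
//   l2_hit: fraction of L1 misses that hit in L2 (the rest go to DRAM)
// Because the latencies are end-to-end, each level is weighted by the
// probability that a load is served there rather than added on top.
double expected_latency_cycles(double l1_hit, double l2_hit) {
    assert(l1_hit >= 0.0 && l1_hit <= 1.0);
    assert(l2_hit >= 0.0 && l2_hit <= 1.0);
    const double l1_miss = 1.0 - l1_hit;
    return l1_hit * kL1DataCycles
         + l1_miss * l2_hit * kL2DataCycles
         + l1_miss * (1.0 - l2_hit) * kGlobalCycles;
}

// Converts cycles to nanoseconds for an ASSUMED core clock in GHz.
// The table itself is in cycles, so the result depends on this clock.
double cycles_to_ns(double cycles, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return cycles / clock_ghz;
}
```

Consider a workload where 80% of loads hit in L1 and half of the L1 misses hit in L2. `expected_latency_cycles(0.8, 0.5)` gives about 147.3 cycles, or roughly 98 ns at a hypothetical 1.5 GHz clock. The DRAM term alone contributes about 103 of those cycles, even though only 10% of loads reach it.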

References

[1] Jia Z, Maggioni M, Smith J, et al. Dissecting the NVidia Turing T4 GPU via Microbenchmarking[J]. arXiv preprint arXiv:1903.07486, 2019.

[2] Yazdanbakhsh A, Park J, Sharma H, et al. Neural acceleration for GPU throughput processors[C]//Proceedings of the 48th International Symposium on Microarchitecture. ACM, 2015.


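Copyright belongs to FindHao. Unless otherwise stated, content on this site is licensed under CC-BY-NC-SA 4.0.
Reposts must include this notice and credit the author FindHao and the original URL as hyperlinks:
https://findhao.net/easycoding/2473.html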
