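Memory type	Turing/Volta latency (clock cycles)	Pascal/Maxwell latency (clock cycles)
Register	6 (no bank conflicts)	6 (no bank conflicts)
Shared	19 (no bank conflicts)	23 (no bank conflicts)
L1 Data	32/28	82
L2 Data	~188/~193	~234/~207
L1 Constant	~26	~25
L2 Constant	~215/~245	~236/~221
Local	~1029	~1029
Global	~1029	~1029

When a cell holds two values separated by a slash, they follow the order of the architectures in the column header (for example, an L1 data hit costs 32 cycles on Turing and 28 on Volta). A single value applies to both architectures, and "~" marks an approximate figure.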
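To make the spread between levels concrete, the sketch below copies the Turing values from the table (the first number of each "Turing/Volta" pair, with "~" dropped) and converts them to nanoseconds and to multiples of the shared-memory latency. It assumes the table values are core clock cycles. `CLOCK_MHZ = 1590.0` is an assumed, hypothetical clock (roughly a Tesla T4 boost clock), and the helper names `cycles_to_ns` and `relative_to` are illustrative only.

```python
from typing import Dict

# Latencies in clock cycles, taken from the Turing/Volta column of the table.
# For "Turing/Volta" pairs the first (Turing) value is used; "~" is dropped.
TURING_LATENCY_CYCLES: Dict[str, int] = {
    "Register": 6,
    "Shared": 19,
    "L1 Data": 32,
    "L2 Data": 188,
    "L1 Constant": 26,
    "L2 Constant": 215,
    "Local": 1029,
    "Global": 1029,
}


def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds at clock_mhz."""
    return cycles * 1000.0 / clock_mhz


def relative_to(base: str, table: Dict[str, int]) -> Dict[str, float]:
    """Express every latency in `table` as a multiple of the latency of `base`."""
    ref = table[base]
    return {name: cycles / ref for name, cycles in table.items()}


if __name__ == "__main__":
    # Hypothetical clock (assumption); replace with the clock your GPU actually runs at.
    CLOCK_MHZ = 1590.0
    ratios = relative_to("Shared", TURING_LATENCY_CYCLES)
    for name, cycles in TURING_LATENCY_CYCLES.items():
        ns = cycles_to_ns(cycles, CLOCK_MHZ)
        print(f"{name:<12} {cycles:>5} cycles {ns:>7.1f} ns {ratios[name]:>5.1f}x shared")
```

Under the assumed 1590 MHz clock, a ~1029-cycle global load comes to roughly 647 ns, against about 12 ns for a conflict-free shared-memory access, a ratio of about 54x. The ratio does not depend on the clock, which is why relative figures carry over between GPUs better than absolute nanoseconds.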

References

[1] Jia Z, Maggioni M, Smith J, et al. Dissecting the NVidia Turing T4 GPU via Microbenchmarking[J]. arXiv preprint arXiv:1903.07486, 2019.

[2] Yazdanbakhsh A, Park J, Sharma H, et al. Neural acceleration for gpu throughput processors[C]//Proceedings of the 48th International Symposium on Microarchitecture. ACM, 2015:
