存储类型 | Turing/Volta延迟 | Pascal/Maxwell延迟 |
---|---|---|
Register | 6(No Bank Conflicts) | 6(No Bank Conflicts) |
Shared | 19(No Bank Conflicts) | 23(No Bank Conflicts) |
L1 Data | 32/28 | 82 |
L2 Data | ~188/~193 | ~234/~207 |
L1 Constant | ~26 | ~25 |
L2 Constant | ~215/~245 | ~236/~221 |
Local | ~1029 | ~1029 |
Global | ~1029 | ~1029 |
Reference
[1] Jia Z, Maggioni M, Smith J, et al. Dissecting the NVidia Turing T4 GPU via Microbenchmarking[J]. arXiv preprint arXiv:1903.07486, 2019.
[2] Yazdanbakhsh A, Park J, Sharma H, et al. Neural acceleration for gpu throughput processors[C]//Proceedings of the 48th International Symposium on Microarchitecture. ACM, 2015:
Comments