Table of Contents of the CUDA PTX Manual

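To make it easier to find the section that covers a given instruction, I extracted the table of contents from the official PTX manual.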

PTX manual: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html

CUDA Toolkit v9.1.85
PTX ISA
1. Introduction
1.1. Scalable Data-Parallel Computing using GPUs
1.2. Goals of PTX
1.3. PTX ISA Version 6.1
1.4. Document Structure
2. Programming Model
2.1. A Highly Multithreaded Coprocessor
2.2. Thread Hierarchy
2.2.1. Cooperative Thread Arrays
2.2.2. Grid of Cooperative Thread Arrays
2.3. Memory Hierarchy
3. PTX Machine Model
3.1. A Set of SIMT Multiprocessors with On-chip Shared Memory
4. Syntax
4.1. Source Format
4.2. Comments
4.3. Statements
4.3.1. Directive Statements
4.3.2. Instruction Statements
4.4. Identifiers
4.5. Constants
4.6. Integer Constants
4.6.1. Floating-Point Constants
4.6.2. Predicate Constants
4.6.3. Constant Expressions
4.6.4. Integer Constant Expression Evaluation
4.6.5. Summary of Constant Expression Evaluation Rules
5. State Spaces, Types, and Variables
5.1. State Spaces
5.1.1. Register State Space
5.1.2. Special Register State Space
5.1.3. Constant State Space
5.1.3.1. Banked Constant State Space (deprecated)
5.1.4. Global State Space
5.1.5. Local State Space
5.1.6. Parameter State Space
5.1.6.1. Kernel Function Parameters
5.1.6.2. Kernel Function Parameter Attributes
5.1.6.3. Kernel Parameter Attribute: .ptr
5.1.6.4. Device Function Parameters
5.1.7. Shared State Space
5.1.8. Texture State Space (deprecated)
5.2. Types
5.2.1. Fundamental Types
5.2.2. Restricted Use of Sub-Word Sizes
5.3. Texture Sampler and Surface Types
5.3.1. Texture and Surface Properties
5.3.2. Sampler Properties
5.3.3. Channel Data Type and Channel Order Fields
5.4. Variables
5.4.1. Variable Declarations
5.4.2. Vectors
5.4.3. Array Declarations
5.4.4. Initializers
5.4.5. Alignment
5.4.6. Parameterized Variable Names
5.4.7. Variable Attributes
5.4.8. Variable Attribute Directive: .attribute
6. Instruction Operands
6.1. Operand Type Information
6.2. Source Operands
6.3. Destination Operands
6.4. Using Addresses, Arrays, and Vectors
6.4.1. Addresses as Operands
6.4.1.1. Generic Addressing
6.4.2. Arrays as Operands
6.4.3. Vectors as Operands
6.4.4. Labels and Function Names as Operands
6.5. Type Conversion
6.5.1. Scalar Conversions
6.5.2. Rounding Modifiers
6.6. Operand Costs
7. Abstracting the ABI
7.1. Function Declarations and Definitions
7.1.1. Changes from PTX ISA Version 1.x
7.2. Variadic Functions
7.3. Alloca
8. Memory Consistency Model
8.1. Scope and applicability of the model
8.1.1. Limitations on atomicity at system scope
8.2. Memory operations
8.2.1. Overlap
8.2.2. Vector Data-types
8.2.3. Initialization
8.3. State spaces
8.4. Operation types
8.5. Scope
8.6. Morally strong operations
8.6.1. Conflict and Data-races
8.6.2. Limitations on Mixed-size Data-races
8.7. Release and Acquire Patterns
8.8. Ordering of memory operations
8.8.1. Program Order
8.8.2. Observation Order
8.8.3. Fence-SC Order
8.8.4. Memory synchronization
8.8.5. Causality Order
8.8.6. Coherence Order
8.8.7. Communication Order
8.9. Axioms
8.9.1. Coherence
8.9.2. Fence-SC
8.9.3. Atomicity
8.9.4. No Thin Air
8.9.5. Sequential Consistency Per Location
8.9.6. Causality
9. Instruction Set
9.1. Format and Semantics of Instruction Descriptions
9.2. PTX Instructions
9.3. Predicated Execution
9.3.1. Comparisons
9.3.1.1. Integer and Bit-Size Comparisons
9.3.1.2. Floating Point Comparisons
9.3.2. Manipulating Predicates
9.4. Type Information for Instructions and Operands
9.4.1. Operand Size Exceeding Instruction-Type Size
9.5. Divergence of Threads in Control Constructs
9.6. Semantics
9.6.1. Machine-Specific Semantics of 16-bit Code
9.7. Instructions
9.7.1. Integer Arithmetic Instructions
9.7.1.1. Integer Arithmetic Instructions: add
9.7.1.2. Integer Arithmetic Instructions: sub
9.7.1.3. Integer Arithmetic Instructions: mul
9.7.1.4. Integer Arithmetic Instructions: mad
9.7.1.5. Integer Arithmetic Instructions: mul24
9.7.1.6. Integer Arithmetic Instructions: mad24
9.7.1.7. Integer Arithmetic Instructions: sad
9.7.1.8. Integer Arithmetic Instructions: div
9.7.1.9. Integer Arithmetic Instructions: rem
9.7.1.10. Integer Arithmetic Instructions: abs
9.7.1.11. Integer Arithmetic Instructions: neg
9.7.1.12. Integer Arithmetic Instructions: min
9.7.1.13. Integer Arithmetic Instructions: max
9.7.1.14. Integer Arithmetic Instructions: popc
9.7.1.15. Integer Arithmetic Instructions: clz
9.7.1.16. Integer Arithmetic Instructions: bfind
9.7.1.17. Integer Arithmetic Instructions: fns
9.7.1.18. Integer Arithmetic Instructions: brev
9.7.1.19. Integer Arithmetic Instructions: bfe
9.7.1.20. Integer Arithmetic Instructions: bfi
9.7.1.21. Integer Arithmetic Instructions: dp4a
9.7.1.22. Integer Arithmetic Instructions: dp2a
9.7.2. Extended-Precision Integer Arithmetic Instructions
9.7.2.1. Extended-Precision Arithmetic Instructions: add.cc
9.7.2.2. Extended-Precision Arithmetic Instructions: addc
9.7.2.3. Extended-Precision Arithmetic Instructions: sub.cc
9.7.2.4. Extended-Precision Arithmetic Instructions: subc
9.7.2.5. Extended-Precision Arithmetic Instructions: mad.cc
9.7.2.6. Extended-Precision Arithmetic Instructions: madc
9.7.3. Floating-Point Instructions
9.7.3.1. Floating Point Instructions: testp
9.7.3.2. Floating Point Instructions: copysign
9.7.3.3. Floating Point Instructions: add
9.7.3.4. Floating Point Instructions: sub
9.7.3.5. Floating Point Instructions: mul
9.7.3.6. Floating Point Instructions: fma
9.7.3.7. Floating Point Instructions: mad
9.7.3.8. Floating Point Instructions: div
9.7.3.9. Floating Point Instructions: abs
9.7.3.10. Floating Point Instructions: neg
9.7.3.11. Floating Point Instructions: min
9.7.3.12. Floating Point Instructions: max
9.7.3.13. Floating Point Instructions: rcp
9.7.3.14. Floating Point Instructions: rcp.approx.ftz.f64
9.7.3.15. Floating Point Instructions: sqrt
9.7.3.16. Floating Point Instructions: rsqrt
9.7.3.17. Floating Point Instructions: rsqrt.approx.ftz.f64
9.7.3.18. Floating Point Instructions: sin
9.7.3.19. Floating Point Instructions: cos
9.7.3.20. Floating Point Instructions: lg2
9.7.3.21. Floating Point Instructions: ex2
9.7.4. Half Precision Floating-Point Instructions
9.7.4.1. Half Precision Floating Point Instructions: add
9.7.4.2. Half Precision Floating Point Instructions: sub
9.7.4.3. Half Precision Floating Point Instructions: mul
9.7.4.4. Half Precision Floating Point Instructions: fma
9.7.5. Comparison and Selection Instructions
9.7.5.1. Comparison and Selection Instructions: set
9.7.5.2. Comparison and Selection Instructions: setp
9.7.5.3. Comparison and Selection Instructions: selp
9.7.5.4. Comparison and Selection Instructions: slct
9.7.6. Half Precision Comparison Instructions
9.7.6.1. Half Precision Comparison Instructions: set
9.7.6.2. Half Precision Comparison Instructions: setp
9.7.7. Logic and Shift Instructions
9.7.7.1. Logic and Shift Instructions: and
9.7.7.2. Logic and Shift Instructions: or
9.7.7.3. Logic and Shift Instructions: xor
9.7.7.4. Logic and Shift Instructions: not
9.7.7.5. Logic and Shift Instructions: cnot
9.7.7.6. Logic and Shift Instructions: lop3
9.7.7.7. Logic and Shift Instructions: shf
9.7.7.8. Logic and Shift Instructions: shl
9.7.7.9. Logic and Shift Instructions: shr
9.7.8. Data Movement and Conversion Instructions
9.7.8.1. Cache Operators
9.7.8.2. Data Movement and Conversion Instructions: mov
9.7.8.3. Data Movement and Conversion Instructions: mov
9.7.8.4. Data Movement and Conversion Instructions: shfl
9.7.8.5. Data Movement and Conversion Instructions: shfl.sync
9.7.8.6. Data Movement and Conversion Instructions: prmt
9.7.8.7. Data Movement and Conversion Instructions: ld
9.7.8.8. Data Movement and Conversion Instructions: ld.global.nc
9.7.8.9. Data Movement and Conversion Instructions: ldu
9.7.8.10. Data Movement and Conversion Instructions: st
9.7.8.11. Data Movement and Conversion Instructions: prefetch, prefetchu
9.7.8.12. Data Movement and Conversion Instructions: isspacep
9.7.8.13. Data Movement and Conversion Instructions: cvta
9.7.8.14. Data Movement and Conversion Instructions: cvt
9.7.9. Texture Instructions
9.7.9.1. Texturing Modes
9.7.9.2. Mipmaps
9.7.9.3. Texture Instructions: tex
9.7.9.4. Texture Instructions: tld4
9.7.9.5. Texture Instructions: txq
9.7.9.6. Texture Instructions: istypep
9.7.10. Surface Instructions
9.7.10.1. Surface Instructions: suld
9.7.10.2. Surface Instructions: sust
9.7.10.3. Surface Instructions: sured
9.7.10.4. Surface Instructions: suq
9.7.11. Control Flow Instructions
9.7.11.1. Control Flow Instructions: {}
9.7.11.2. Control Flow Instructions: @
9.7.11.3. Control Flow Instructions: bra
9.7.11.4. Control Flow Instructions: brx.idx
9.7.11.5. Control Flow Instructions: call
9.7.11.6. Control Flow Instructions: ret
9.7.11.7. Control Flow Instructions: exit
9.7.12. Parallel Synchronization and Communication Instructions
9.7.12.1. Parallel Synchronization and Communication Instructions: bar, barrier
9.7.12.2. Parallel Synchronization and Communication Instructions: bar.warp.sync
9.7.12.3. Parallel Synchronization and Communication Instructions: membar/fence
9.7.12.4. Parallel Synchronization and Communication Instructions: atom
9.7.12.5. Parallel Synchronization and Communication Instructions: red
9.7.12.6. Parallel Synchronization and Communication Instructions: vote
9.7.12.7. Parallel Synchronization and Communication Instructions: vote.sync
9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
9.7.13. Warp Level Matrix Multiply-Accumulate Instructions
9.7.13.1. Matrix Shape
9.7.13.2. Matrix Fragments
9.7.13.3. Matrix Storage
9.7.13.4. Warp-level Matrix Load Instruction: wmma.load
9.7.13.5. Warp-level Matrix Store Instruction: wmma.store
9.7.13.6. Warp-level Matrix Multiply-and-Accumulate Instruction: wmma.mma
9.7.14. Video Instructions
9.7.15. Scalar Video Instructions
9.7.15.1. Scalar Video Instructions: vadd, vsub, vabsdiff, vmin, vmax
9.7.15.2. Scalar Video Instructions: vshl, vshr
9.7.15.3. Scalar Video Instructions: vmad
9.7.15.4. Scalar Video Instructions: vset
9.7.16. SIMD Video Instructions
9.7.16.1. SIMD Video Instructions: vadd2, vsub2, vavrg2, vabsdiff2, vmin2, vmax2
9.7.16.2. SIMD Video Instructions: vset2
9.7.16.3. SIMD Video Instructions: vadd4, vsub4, vavrg4, vabsdiff4, vmin4, vmax4
9.7.16.4. SIMD Video Instructions: vset4
9.7.17. Miscellaneous Instructions
9.7.17.1. Miscellaneous Instructions: trap
9.7.17.2. Miscellaneous Instructions: brkpt
9.7.17.3. Miscellaneous Instructions: pmevent
10. Special Registers
10.1. Special Registers: %tid
10.2. Special Registers: %ntid
10.3. Special Registers: %laneid
10.4. Special Registers: %warpid
10.5. Special Registers: %nwarpid
10.6. Special Registers: %ctaid
10.7. Special Registers: %nctaid
10.8. Special Registers: %smid
10.9. Special Registers: %nsmid
10.10. Special Registers: %gridid
10.11. Special Registers: %lanemask_eq
10.12. Special Registers: %lanemask_le
10.13. Special Registers: %lanemask_lt
10.14. Special Registers: %lanemask_ge
10.15. Special Registers: %lanemask_gt
10.16. Special Registers: %clock, %clock_hi
10.17. Special Registers: %clock64
10.18. Special Registers: %pm0..%pm7
10.19. Special Registers: %pm0_64..%pm7_64
10.20. Special Registers: %envreg<32>
10.21. Special Registers: %globaltimer, %globaltimer_lo, %globaltimer_hi
10.22. Special Registers: %total_smem_size
10.23. Special Registers: %dynamic_smem_size
11. Directives
11.1. PTX Module Directives
11.1.1. PTX Module Directives: .version
11.1.2. PTX Module Directives: .target
11.1.3. PTX Module Directives: .address_size
11.2. Specifying Kernel Entry Points and Functions
11.2.1. Kernel and Function Directives: .entry
11.2.2. Kernel and Function Directives: .func
11.3. Control Flow Directives
11.3.1. Control Flow Directives: .branchtargets
11.3.2. Control Flow Directives: .calltargets
11.3.3. Control Flow Directives: .callprototype
11.4. Performance-Tuning Directives
11.4.1. Performance-Tuning Directives: .maxnreg
11.4.2. Performance-Tuning Directives: .maxntid
11.4.3. Performance-Tuning Directives: .reqntid
11.4.4. Performance-Tuning Directives: .minnctapersm
11.4.5. Performance-Tuning Directives: .maxnctapersm (deprecated)
11.4.6. Performance-Tuning Directives: .pragma
11.5. Debugging Directives
11.5.1. Debugging Directives: @@dwarf
11.5.2. Debugging Directives: .section
11.5.3. Debugging Directives: .file
11.5.4. Debugging Directives: .loc
11.6. Linking Directives
11.6.1. Linking Directives: .extern
11.6.2. Linking Directives: .visible
11.6.3. Linking Directives: .weak
11.6.4. Linking Directives: .common
12. Release Notes
12.1. Changes in PTX ISA Version 6.1
12.2. Changes in PTX ISA Version 6.0
12.3. Changes in PTX ISA Version 5.0
12.4. Changes in PTX ISA Version 4.3
12.5. Changes in PTX ISA Version 4.2
12.6. Changes in PTX ISA Version 4.1
12.7. Changes in PTX ISA Version 4.0
12.8. Changes in PTX ISA Version 3.2
12.9. Changes in PTX ISA Version 3.1
12.10. Changes in PTX ISA Version 3.0
12.11. Changes in PTX ISA Version 2.3
12.12. Changes in PTX ISA Version 2.2
12.13. Changes in PTX ISA Version 2.1
12.14. Changes in PTX ISA Version 2.0
A. Descriptions of .pragma Strings
A.1. Pragma Strings: “nounroll”
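
Since the point of this listing is to look up which section covers an instruction, here is a minimal Python sketch of that lookup. It assumes the listing above has been loaded as a list of strings, one entry per line; the name `find_sections` and the regular expression are hypothetical helpers written for this post, not part of any NVIDIA tool.

```python
import re

# Matches "9.7.8.4. Title" or "A.1. Title", tolerating leading symbols
# such as tree-expander glyphs. Hypothetical pattern for this listing only.
ENTRY = re.compile(r"^\W*((?:[A-Z]|\d+)(?:\.\d+)*)\.\s+(.*)$")

def find_sections(toc_lines, instruction):
    """Return (section_number, title) pairs whose title names `instruction`."""
    hits = []
    for line in toc_lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # not a numbered entry (e.g. "PTX ISA")
        number, title = m.groups()
        # Instruction entries look like "Category: name1, name2" or "a/b".
        _, sep, names = title.rpartition(": ")
        if not sep:
            continue  # plain heading without an instruction name
        if instruction in [n.strip() for n in re.split(r"[,/]", names)]:
            hits.append((number, title))
    return hits

sample = [
    "9.7.8.4. Data Movement and Conversion Instructions: shfl",
    "9.7.8.5. Data Movement and Conversion Instructions: shfl.sync",
    "10.16. Special Registers: %clock, %clock_hi",
]
print(find_sections(sample, "shfl.sync"))
# [('9.7.8.5', 'Data Movement and Conversion Instructions: shfl.sync')]
```

Names are matched exactly, so `shfl` and `shfl.sync` are reported separately, while a name such as `add` returns every category that lists it (integer, floating-point, and half precision).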

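Copyright of this article belongs to FindHao. Content on this site is licensed under CC-BY-NC-SA 4.0 by default.
Reposts must include this notice and credit the author, FindHao, with a hyperlink to the original address of this article:
http://www.findhao.net/easycoding/2373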
