First, some background on GPUs, which helps illustrate the concept of multithreading.

## GPU
GPUs use a SIMT (Single Instruction, Multiple Thread) model, in which the individual scalar instruction streams of the CUDA threads are grouped together for SIMD execution.
In this sense, a GPU behaves much like a heavily multithreaded processor.
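The SIMT idea can be sketched as a toy model in Python (this is an illustration, not the CUDA API; `simt_execute`, the mini instruction set, and the 4-thread warp size are all invented for the example):

```python
# Toy model of SIMT execution: ONE instruction stream applied in
# lockstep to every thread in a "warp".
WARP_SIZE = 4  # real NVIDIA warps are 32 threads


def simt_execute(program, warp_size=WARP_SIZE):
    """Run one scalar instruction stream across warp_size threads."""
    # Per-thread register state: each thread has its own copy of r.
    regs = [{"tid": t, "r": 0} for t in range(warp_size)]
    for op, *args in program:
        # SIMT: the SAME instruction executes for ALL threads this step.
        for ctx in regs:
            if op == "set":        # r = immediate
                ctx["r"] = args[0]
            elif op == "add_tid":  # r += thread id (data diverges per thread)
                ctx["r"] += ctx["tid"]
            elif op == "mul":      # r *= immediate
                ctx["r"] *= args[0]
    return [ctx["r"] for ctx in regs]


# One instruction stream, four threads -> four different results.
print(simt_execute([("set", 10), ("add_tid",), ("mul", 2)]))  # [20, 22, 24, 26]
```

A single fetched instruction drives all threads in the warp; only the per-thread data differs, which is exactly the SIMD-style grouping described above.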
## Multithreading: A Short Review
### Motivation of Multithreading
- It is difficult to keep extracting instruction-level parallelism or data-level parallelism from a single sequential thread of control.
- Many workloads can make use of thread-level parallelism (TLP).
- Multithreading uses TLP to improve the utilization of a single processor.
### Coarse-Grain Multithreading
This raises a question: how can we guarantee there are no dependencies between instructions in the pipeline?
One possible solution is to interleave the execution of instructions from different program threads on the same pipeline.
A simple multithreaded pipeline
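A minimal simulator (hypothetical, not modeling any real ISA) shows why the interleaving works: with as many threads as pipeline stages, round-robin fetch guarantees that no two in-flight instructions ever come from the same thread, so no dependency interlocks are needed:

```python
# Sketch of round-robin thread interleaving on a short pipeline.
PIPE_DEPTH = 4    # number of pipeline stages
NUM_THREADS = 4   # threads interleaved; >= PIPE_DEPTH removes hazards


def run_cycles(n_cycles):
    pipeline = [None] * PIPE_DEPTH  # stage -> (thread id, instr #) in flight
    pcs = [0] * NUM_THREADS         # per-thread program counters
    trace = []
    for cycle in range(n_cycles):
        pipeline.pop()                    # oldest instruction leaves the pipe
        tid = cycle % NUM_THREADS         # round-robin fetch: next thread
        pipeline.insert(0, (tid, pcs[tid]))
        pcs[tid] += 1
        in_flight = [x for x in pipeline if x is not None]
        # Invariant: every in-flight instruction is from a DISTINCT thread,
        # so no instruction can depend on another one still in the pipeline.
        assert len({t for t, _ in in_flight}) == len(in_flight)
        trace.append(in_flight)
    return trace


trace = run_cycles(8)
```

Each thread sees its previous instruction retire before its next one is fetched, which is what removes the need for forwarding and interlock logic at the cost of per-thread throughput.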
However, multithreading has its costs:

1. Each thread requires its own user state: PC, GPRs, and so on.
2. Each thread also needs its own system state: page table base register, exception handling registers.
3. Other overheads:
   + Additional cache/TLB conflicts from competing threads.
   + Need for larger cache/TLB capacity.
   + More OS overhead to schedule more threads.

### Thread Scheduling Policies
### Simultaneous Multithreading
Simultaneous Multithreading (SMT) uses the fine-grain control already present inside an OOO superscalar to allow instructions from multiple threads to enter execution on the same clock cycle, which contributes to better utilization of machine resources.
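A toy issue-stage model (an assumed simplification; `smt_issue`, the round-robin policy, and the 4-wide issue width are invented for illustration) shows how SMT fills issue slots one thread alone cannot:

```python
# Sketch of SMT issue: a superscalar core with ISSUE_WIDTH slots fills
# them each cycle from the ready instructions of ALL threads, instead
# of drawing from a single thread.
ISSUE_WIDTH = 4


def smt_issue(ready_per_thread):
    """ready_per_thread[t] = # of ready instructions thread t has this cycle.
    Returns how many instructions each thread issues (round-robin policy)."""
    issued = [0] * len(ready_per_thread)
    slots = ISSUE_WIDTH
    t = 0
    while slots and any(r > i for r, i in zip(ready_per_thread, issued)):
        if issued[t] < ready_per_thread[t]:
            issued[t] += 1   # fill one issue slot from thread t
            slots -= 1
        t = (t + 1) % len(ready_per_thread)  # rotate to the next thread
    return issued


# Thread 0 alone can only issue 2 instructions this cycle; threads 1 and 2
# fill the remaining slots, keeping all 4 issue slots busy.
print(smt_issue([2, 3, 1]))  # [2, 1, 1]
```

With a single thread, any cycle where that thread has fewer ready instructions than the issue width wastes slots; SMT converts those empty slots into progress on other threads.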