First, some background on GPUs helps illustrate the concept of multithreading.
GPU
GPUs use a SIMT (Single Instruction, Multiple Thread) model, where the individual scalar instruction streams of many CUDA threads are grouped together for SIMD execution.
In this sense, a GPU is more like a heavily multithreaded vector processor than a collection of independent scalar cores.
Multithreading: A Short Review
Motivation for Multithreading
- It is difficult to continue extracting instruction-level or data-level parallelism from a single sequential thread of control.
- Many workloads can exploit thread-level parallelism (TLP).
- Multithreading uses TLP to improve the utilization of a single processor.
Coarse-Grain Multithreading
A natural question: how can we guarantee there are no dependencies between instructions in a pipeline?
One possible solution: interleave the execution of instructions from different program threads on the same pipeline.
A simple multithreaded pipeline 
However, multithreading has its costs:
1. Each thread requires its own user state: PC, GPRs, and so on.
2. Each thread also needs its own system state: page table base register, exception handling registers.
3. Other overheads:
   + Additional cache/TLB conflicts from competing threads (or the need for larger cache/TLB capacity to avoid them).
   + More OS overhead to schedule more threads.

### Thread Scheduling Policies
Simultaneous Multithreading
Simultaneous multithreading (SMT) uses the fine-grain control already present inside an out-of-order superscalar to allow instructions from multiple threads to enter execution on the same clock cycle, which leads to better utilization of machine resources.
Summary of Multithreading
