Introduction to VLIW
VLIW processors issue a fixed number of instructions, formatted either as one large instruction or as a fixed instruction packet, with the parallelism among the instructions explicitly indicated by the instruction.

1. Multiple operations are packed into one instruction.
2. Each operation slot is dedicated to a fixed function (see the sketch below).
3. Constant operation latencies are specified.
4. The architecture requires the compiler to guarantee:
   + parallelism within an instruction, and
   + no use of data before the data is ready.
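As a rough illustration (not any particular ISA's encoding; the struct name, slot set, and field widths are assumptions), a long instruction can be pictured as a fixed record with one operation slot per functional unit:

```c
#include <stdint.h>

/* Hypothetical 4-issue VLIW bundle: each slot is dedicated to one
 * functional unit, and an unused slot must still be encoded,
 * typically as a NOP. Layout is illustrative only. */
typedef struct {
    uint32_t int_op;    /* integer ALU slot    */
    uint32_t fp_op;     /* floating-point slot */
    uint32_t mem_op;    /* load/store slot     */
    uint32_t branch_op; /* branch slot         */
} vliw_bundle_t;
```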
VLIW Equals(EQ) Scheduling Model
- Each operation takes exactly its specified latency.
- Efficient register usage
- No need for register renaming or buffering.
- The compiler depends on results not becoming visible early, i.e., before the specified latency has elapsed (see the sketch below).
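A minimal sketch of the EQ semantics, assuming a hypothetical 2-cycle multiply latency and made-up register contents: a read of the destination register before the latency has elapsed still sees the old value, which is what lets the compiler keep reusing that register without renaming or buffering.

```c
#include <stdio.h>

#define MUL_LATENCY 2   /* assumed latency for this illustration */

int main(void) {
    int regs[8] = {0, 3, 4, 7, 0, 0, 0, 0};

    /* cycle 0: issue  r3 = r1 * r2 ; under the EQ model the result is
     * written exactly MUL_LATENCY cycles later, never earlier. */
    int pending_val = regs[1] * regs[2];
    int writeback_cycle = 0 + MUL_LATENCY;

    for (int cycle = 1; cycle <= MUL_LATENCY; cycle++) {
        if (cycle == writeback_cycle)
            regs[3] = pending_val;          /* result appears exactly now */
        printf("cycle %d: r3 = %d\n", cycle, regs[3]);
    }
    /* cycle 1 still prints the OLD r3 (7): the compiler may schedule a
     * last use of that old value in the multiply's shadow, so the
     * register can be reused without renaming. */
    return 0;
}
```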
VLIW Compiler Optimizations
The responsibilities of a VLIW compiler:

1. Schedule operations to maximize parallel execution.
2. Guarantee intra-instruction parallelism.
3. Schedule to avoid data hazards.
Loop Unrolling
Consider a simple loop and its assembly-language form: unrolling replicates the loop body so that independent iterations can be scheduled together (a source-level sketch follows).
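The original notes presumably work from a specific loop and its assembly; as a stand-in, here is a source-level sketch of unrolling an assumed SAXPY-style loop by a factor of 4 (the function names and the unroll factor are illustrative):

```c
/* Baseline: one multiply-add per iteration, so each iteration's basic
 * block exposes very little ILP to the VLIW scheduler. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Unrolled by 4 (assuming n % 4 == 0 for brevity): the four
 * independent bodies give the scheduler more operations to pack into
 * each long instruction and amortize the loop-control overhead. */
void saxpy_unrolled(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}
```

In practice a compiler also emits an epilogue loop for the leftover iterations when `n` is not a multiple of the unroll factor.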
What if there are no loops?
First, define the concept of a basic block: a sequence of code with a single entry and a single exit.
- Branches limit basic block size in control-flow-intensive, irregular code (see the example below).
- It is difficult to find ILP within individual basic blocks.
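For example, a small branch-heavy function (the `clamp` routine below is an assumed illustration) fragments into several tiny basic blocks, each exposing only one or two independent operations:

```c
/* Branch-heavy code: each arm is its own basic block of just one or
 * two operations, leaving the scheduler almost nothing to pack. */
int clamp(int v, int lo, int hi) {
    if (v < lo)        /* basic block 1: compare + branch */
        return lo;     /* basic block 2: single return    */
    if (v > hi)        /* basic block 3: compare + branch */
        return hi;     /* basic block 4: single return    */
    return v;          /* basic block 5: single return    */
}
```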
Classic VLIW Challenges
- Object-code compatibility: all code must be recompiled for every machine, even for two machines in the same generation.
- Object code size: instruction padding (NOPs in unused slots) wastes instruction memory, and loop unrolling/software pipelining replicates code.
- Scheduling variable-latency memory operations: caches and memory bank conflicts introduce statically unpredictable variability.
- Knowing branch probabilities: profiling adds a significant extra step to the tool chain.
- Scheduling for statically unpredictable branches: the optimal schedule varies with the branch path.
- Precise interrupts can be challenging.
How can these challenges be solved?

1. VLIW instruction encoding: