Loop: LD F0, 0(R1) ;F0 - array element
ADDD F4, F0, F2 ;add scalar in F2
SD 0(R1), F4 ;store result
SUBI R1, R1, #8 ;decrement pointer by 8 bytes (one double)
BNEZ R1, Loop ;branch if R1 != zero
On DLX, without any scheduling, the loop executes like this:
                          Cycle
Loop: LD   F0, 0(R1)      1
      stall               2
      ADDD F4, F0, F2     3
      stall               4
      stall               5
      SD   0(R1), F4      6
      SUBI R1, R1, #8     7
      BNEZ R1, Loop       8
      stall               9
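For reference, here is a minimal sketch that reproduces the cycle numbers above. It assumes the latencies usually quoted alongside this textbook example: 1 stall between a load double and a dependent FP ALU op, 2 stalls between an FP ALU op and a dependent store double, and 1 lost cycle after the branch. Those values, the assumption that SUBI feeds BNEZ without a stall, and every name in the code (`LATENCY`, `BRANCH_DELAY`, `schedule`) are my own guesses and are not stated anywhere above.

```python
# Assumed stall cycles between a producer and a dependent consumer.
# These numbers are an assumption taken from the usual textbook table.
LATENCY = {
    ("LD", "ADDD"): 1,   # load double -> FP ALU op
    ("ADDD", "SD"): 2,   # FP ALU op -> store double
}
BRANCH_DELAY = 1         # assumed: one cycle lost after the branch

# (opcode, destination register or None, source registers)
LOOP = [
    ("LD",   "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD",   None, ["R1", "F4"]),
    ("SUBI", "R1", ["R1"]),
    ("BNEZ", None, ["R1"]),
]

def schedule(instrs):
    """Return (cycle, text) rows for one iteration with in-order issue."""
    rows = []
    cycle = 0
    producers = {}  # register -> (producer opcode, issue cycle)
    for op, dest, srcs in instrs:
        earliest = cycle + 1
        for reg in srcs:
            if reg in producers:
                prod_op, prod_cycle = producers[reg]
                gap = LATENCY.get((prod_op, op), 0)  # unknown pairs: 0 stalls
                earliest = max(earliest, prod_cycle + gap + 1)
        # Insert stall rows until the operands are available.
        while cycle + 1 < earliest:
            cycle += 1
            rows.append((cycle, "stall"))
        cycle += 1
        rows.append((cycle, op))
        if dest is not None:
            producers[dest] = (op, cycle)
    # The branch costs extra cycles before the next iteration can start.
    if instrs and instrs[-1][0] == "BNEZ":
        for _ in range(BRANCH_DELAY):
            cycle += 1
            rows.append((cycle, "stall"))
    return rows

if __name__ == "__main__":
    for cycle, text in schedule(LOOP):
        print(cycle, text)
```

Under those assumptions, the stall in cycle 2 comes from ADDD waiting on F0 from the LD, the stalls in cycles 4 and 5 come from SD waiting on F4 from the ADDD, and the stall in cycle 9 is the branch delay.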
I want to understand when each stall occurs, and why.
Thanks.