Answer / Explanation:
Using the hint provided in Figure 3.50, the baseline performance (in cycles per loop iteration) of the code sequence in Figure 2.35, assuming that no new instruction's execution can be initiated until the previous instruction's execution has completed, is 40.
Now, you might ask how I came up with that number.
Each instruction requires one clock cycle of execution (a clock cycle in which that instruction, and only that instruction, is occupying the execution units; since every instruction must execute, the loop will take at least that many clock cycles). To that base number, we add the extra latency cycles. Don’t forget the branch shadow cycle.
Therefore, the code annotated with each instruction's cycle count (1 execution cycle + extra latency cycles) would be:
Loop: LD    F2,0(Rx)      1 + 4
      DIVD  F8,F2,F0      1 + 12
      MULTD F2,F6,F2      1 + 5
      LD    F4,0(Ry)      1 + 4
      ADDD  F4,F0,F4      1 + 1
      ADDD  F10,F8,F2     1 + 1
      ADDI  Rx,Rx,#8      1
      ADDI  Ry,Ry,#8      1
      SD    F4,0(Ry)      1 + 1
      SUB   R20,R4,Rx     1
      BNZ   R20,Loop      1 + 1
Cycles per loop iter      40
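
To double-check the total, the per-instruction counts can be added up with a short script. This is only a sketch of the arithmetic: the names LOOP, EXECUTE_CYCLES and cycles_per_iteration are made up for illustration, and the extra-latency values are copied from the annotations in the listing.

```python
# Each entry: (instruction, extra latency cycles) as annotated in the listing.
# The names here are illustrative, not part of the exercise.
LOOP = [
    ("LD    F2,0(Rx)", 4),
    ("DIVD  F8,F2,F0", 12),
    ("MULTD F2,F6,F2", 5),
    ("LD    F4,0(Ry)", 4),
    ("ADDD  F4,F0,F4", 1),
    ("ADDD  F10,F8,F2", 1),
    ("ADDI  Rx,Rx,#8", 0),
    ("ADDI  Ry,Ry,#8", 0),
    ("SD    F4,0(Ry)", 1),
    ("SUB   R20,R4,Rx", 0),
    ("BNZ   R20,Loop", 1),  # the extra cycle is the branch shadow cycle
]

# Every instruction occupies the execution units for one cycle.
EXECUTE_CYCLES = 1


def cycles_per_iteration(loop):
    """Sum execute + extra latency cycles, with no overlap between instructions."""
    return sum(EXECUTE_CYCLES + extra for _, extra in loop)


if __name__ == "__main__":
    print(cycles_per_iteration(LOOP))  # 11 execute cycles + 29 latency cycles = 40
```

The 11 instructions contribute 11 execution cycles, and the extra latencies add up to 29, which gives the 40 cycles per iteration.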