Answer:
Only the first T time steps are a factor in the performance measure. So, for instance, being in state A at time step 1 can be worth something different from being in state A at time step 2, because in the latter case the state of the environment at step 1 has already contributed to the performance measure and one fewer step remains to be scored. Since the performance still to be gained can differ between the two cases, a rational agent may choose different actions in the same state depending on the time step.
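The argument can be checked on a toy example. The sketch below is not part of the original exercise: the two-state environment, the action names `collect` and `invest`, and the reward values are all made up for illustration. It uses backward induction over a horizon of T = 3 steps to find the best action in each state at each time step.

```python
# Hypothetical two-state environment; names and rewards are illustrative only.
# TRANSITIONS[state][action] = (reward, next_state)
TRANSITIONS = {
    "A": {"collect": (1, "A"), "invest": (0, "B")},
    "B": {"collect": (3, "B")},
}


def best_actions(horizon):
    """Backward induction: return policy[t][state] for t = 1..horizon."""
    value = {s: 0 for s in TRANSITIONS}  # nothing counts after the horizon
    policy = {}
    for t in range(horizon, 0, -1):
        new_value, policy[t] = {}, {}
        for s, actions in TRANSITIONS.items():
            # Score each action by its immediate reward plus the value of
            # the steps that remain after it.
            best = max(actions, key=lambda a: actions[a][0] + value[actions[a][1]])
            policy[t][s] = best
            new_value[s] = actions[best][0] + value[actions[best][1]]
        value = new_value
    return policy


policy = best_actions(horizon=3)
for t in sorted(policy):
    print(t, policy[t]["A"])
```

Under these assumed rewards, the best action in state A is `invest` at steps 1 and 2 but `collect` at step 3: with only one scored step left, there is no time to benefit from moving to B, so the same state calls for a different action.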