With software prefetching, prefetches must be issued early enough that the data arrives before it is used, but the number of outstanding prefetches must also be kept small, both to stay within the capabilities of the microarchitecture and to minimize cache pollution. This is complicated by the fact that different processors have different capabilities and limitations.

a. [15] <2.3> Create a blocked version of the matrix transpose with software prefetching.

b. [20] <2.3> Estimate and compare the performance of the blocked and unblocked transpose codes, both with and without software prefetching.
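
As a starting point for part (a), the following is a minimal sketch of one possible blocked transpose with software prefetching, not a reference solution. It assumes a square, row-major matrix of doubles, the GCC/Clang builtin `__builtin_prefetch`, and a 64-byte cache line, so that one row of an 8-element tile fits in a single line. The names `BLOCK`, `prefetch_tile`, and `transpose_blocked_prefetch` are hypothetical, and the prefetch distance of one tile ahead is an untuned guess. Prefetching only one tile ahead bounds the number of outstanding prefetches to at most 2 × `BLOCK` per tile.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge: 8 doubles = 64 bytes, one cache line on many CPUs. */
#define BLOCK 8

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Issue one prefetch per row of the tile whose top-left corner is
 * (row0, col0). Tiles that start outside the matrix are skipped.
 * The rw and locality arguments of __builtin_prefetch must be
 * compile-time constants, hence the two separate calls. */
static void prefetch_tile(const double *m, size_t n,
                          size_t row0, size_t col0, int for_write)
{
    if (row0 >= n || col0 >= n)
        return;
    size_t row_end = min_sz(row0 + BLOCK, n);
    for (size_t r = row0; r < row_end; ++r) {
        if (for_write)
            __builtin_prefetch(&m[r * n + col0], 1, 3); /* hint: will write */
        else
            __builtin_prefetch(&m[r * n + col0], 0, 3); /* hint: will read */
    }
}

/* out = transpose(in) for an n x n row-major matrix; not in-place. */
void transpose_blocked_prefetch(const double *in, double *out, size_t n)
{
    assert(in != out);
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Prefetch the next tile in this tile row (one tile ahead):
             * the input tile at (ii, jj + BLOCK) and the matching
             * output tile at (jj + BLOCK, ii). */
            prefetch_tile(in, n, ii, jj + BLOCK, 0);
            prefetch_tile(out, n, jj + BLOCK, ii, 1);

            /* Transpose the current tile, clipping at the matrix edge. */
            size_t i_end = min_sz(ii + BLOCK, n);
            size_t j_end = min_sz(jj + BLOCK, n);
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    out[j * n + i] = in[i * n + j];
        }
    }
}
```

The sketch does not prefetch across the end of a tile row (the first tile of the next row is fetched on demand), and the right tile size, prefetch distance, and locality hint all depend on the target processor. For part (b), the same loop nest with the two `prefetch_tile` calls removed gives the blocked version without prefetching for comparison.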