Tuesday, May 2, 2017

The Memory Mountain L2 Bump

In a previous post, I showed how the different editions of CS:APP have used the memory mountain as a way to visualize memory system performance.  It demonstrates how the cache hierarchy affects performance, which in turn can be used by application programmers to write more efficient programs.

Here's the memory mountain for a recent Intel processor:

 As the label indicates, there is a curious bump in the curve corresponding to cases where data are held in the L2 cache, and where the throughput as a function of the strides does not go down smoothly.  Looking back at the other mountains, you'll see this phenomenon for other cases as well, specifically Intel processors that employ prefetching.

Let's investigate this phenomenon more closely.  Quite honestly, though I have no explanation for why it occurs.

In the above processor, a cache block contains eight 8-byte long integers.  For a stride of S, considering only spatial locality, we would expect a miss rate of around S/8 for strides up to 8.  For example, a stride of 1 would yield one miss followed by 7 hits, while a stride of 2 would yield one miss followed by 3 hits.  For strides of 8 or more, the miss rate would be 100%.  If read incurring a cache miss incurs a delay M, and one incurring a hit incurs a delay H, then we would expect the average time per access to be M*S/8 + H*(1-S/8).  The throughput should be the reciprocal of the average delay.

For the larger sizes, where data resides in the L3 cache, this predictive model holds fairly well.  Here are the data for a size of 4 megabytes:

In this graph, the values of M and H were determined by matching the data for S=1 and S=8.  So, it's no surprise that these two cases match exactly.  But, the model also works fairly well for other values of S.  

For sizes that fit in the L2 cache, however, the predictive model is clearly off:

We can see the bump.  The data tracks well for S=2, but for S=3, the actual memory system outperforms the predictive model.  The effect drops off for larger values of S.

I have experimented with the measurement code to see if the bump is some artifact of how we run the tests, but I believe this is not the case.  I believe there is some feature of the memory system that causes this phenomenon.

I would welcome any ideas on what might cause memory mountains to have this bump.

No comments:

Post a Comment