Here's the mountain from the First Edition, based on a 1999 Intel Pentium III Xeon:
The memory mountain shows the throughput achieved by a program that repeatedly reads elements from an array of N elements using a stride of S (i.e., accessing elements 0, S, 2S, 3S, and so on). The performance, measured in megabytes (MB) per second, depends on how many of the elements are found in one of the processor's caches. For small values of N, the entire array fits in the L1 cache, achieving maximum read throughput. For larger values of N, the array fits in the L2 cache, and the L1 cache can still help exploit spatial locality for smaller values of S. For large values of N, the elements reside in main memory, but both the L1 and L2 caches can improve performance when S allows some degree of spatial locality.
By way of reference, the memory mountain as a way of visualizing memory performance was devised by Thomas Stricker in the 1990s, while he was a PhD student at CMU working with Prof. Thomas Gross. Both now live in Switzerland, with Thomas Gross on the faculty at ETH.
For the third edition of CS:APP, we used a 2013 Intel Core i5 based on the Haswell microarchitecture. The above figure shows measurements for this machine using the improved timing code. Overall, the memory system is similar to that of the Nehalem processor from CS:APP2e: it has three levels of cache and uses prefetching. Note how high the overall throughputs are.
Over the nearly 20-year time span represented by these machines, we can see that memory systems have undergone evolutionary changes. More levels of cache have been added, and caches have become larger. Throughputs have improved by over an order of magnitude. Prefetching helps when access patterns are predictable. The memory-mountain visualization makes both the qualitative and the quantitative changes easy to see.