tag:blogger.com,1999:blog-6341131047156555522024-03-12T19:13:05.005-04:00CS:APPAll topics concerning the contents and the use of the textbook:
<br>
R.E. Bryant & D.R. O'Hallaron, <a href="http://csapp.cs.cmu.edu">Computer Systems: A Programmer's Perspective</a>Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.comBlogger36125tag:blogger.com,1999:blog-634113104715655552.post-90732290652755715182018-10-08T09:58:00.001-04:002018-10-08T09:58:23.600-04:00Formal Verification of Y86-64 ProcessorsWe recently used the UCLID5 verifier, developed at U.C., Berkeley, to formally verify the Y86-64 processors described in Chapter 4 of CS:APP3e. You can get more information about UCLID5, as well as the verifier itself at the <a href="https://github.com/uclid-org/uclid">project Github site</a><br />
<br />
Back in 2005, we did a <a href="http://www.cs.cmu.edu/~bryant/pubdir/CMU-CS-05-195.pdf">similar verification</a> of the Y86 processors appearing in the first edition of CS:APP. Our recent work provides an update of both the processor designs and the verification tool.<br />
<br />
What did we actually do? The diagram below shows the process by which we constructed a verification model:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQQaX_byTeGy7Y33X-YPRajf93CL87F7T3QsmcRTjYemU4_Qq5YXiKanU4sOXQgYBoDAEjFW_jKL_v-I2I4A96oK4wPm3jY6gBhAwC1eFf_aA6xusHcOtcFWvAXo_ukeBBJxrgOyjdvXms/s1600/generation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1048" data-original-width="1600" height="209" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQQaX_byTeGy7Y33X-YPRajf93CL87F7T3QsmcRTjYemU4_Qq5YXiKanU4sOXQgYBoDAEjFW_jKL_v-I2I4A96oK4wPm3jY6gBhAwC1eFf_aA6xusHcOtcFWvAXo_ukeBBJxrgOyjdvXms/s320/generation.png" width="320" /></a></div>
<br />
This model allows us to compare the sequential implementation SEQ to a pipelined implementation PIPE. That is, we can show that for any possible program and any starting state, the two implementations will yield the same results, in terms of their updates to the register file, the condition codes, the program counter, and the memory. In other words, all of the mechanisms in PIPE to deal with data and control hazards—data forwarding, stalling, and exception handling—operate correctly.<br />
<br />
As the figure illustrates, the HCL control logic descriptions were translated directly into UCLID5 code using a program hcl2U. We have now developed HCL translators to map the control logic into 1) C code for use in a simulator, 2) Verilog code for use by a logic synthesis program, and 3) both the original and most recent versions of UCLID.<br />
<br />
UCLID5 makes use of the Z3 SMT solver from Microsoft Research to search for inconsistencies between the two processors. Z3 is a <i>complete</i> solver, and so it can determine that no such inconsistency exists, and therefore the design is correct.<br />
<br />
We considered seven different variants of the pipeline: the one described in the book, as well as ones that implement various extensions and variations given as homework assignments. All seven were proved correct.<br />
<br />
There is really no surprise here, but it's nice to be sure that there's not some subtle bug lingering in our designs.<br />
<br />
A complete report on this effort is available as CMU Technical Report <a href="http://www.cs.cmu.edu/~bryant/pubdir/CMU-CS-18-122.pdf">CMU-CS-18-122</a>.Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-65695041079530703092018-06-06T22:53:00.001-04:002018-06-06T22:53:36.482-04:00Electronic Versions of CS:APPPearson now distributes electronic editions of CS:APP through VitalSource. See<br />
<br />
<a href="https://www.vitalsource.com/products/computer-systems-a-programmer-39-s-perspective-randal-e-bryant-david-r-v9780134092997">https://www.vitalsource.com/products/computer-systems-a-programmer-39-s-perspective-randal-e-bryant-david-r-v9780134092997</a><br />
<br />
Access for 180 days (enough for taking a course) is available for $33.99. Perpetual access costs $99.99.<br />
<br />
Our understanding is that access is provided via an Internet-based portal, rather than as a standalone electronic document.<br />
<br />
<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-16419309990389485122017-07-05T18:03:00.001-04:002017-07-05T18:03:15.689-04:00English Edition of CS:APP3e available in ChinaAn English-language version of CS:APP3e has now been made available from <a href="http://item.jd.com/12155718.html">China Machine Press</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir525lx3YTDjySVmd1fSpRNtjnzo4CuAUC-EdH-MOx-rxP6RiBLW0gvHqL7uJVZrItd6mRYdzGD4hJQzncfRydf9sqPO53vqDe4Wuz3Lpf3mB8Z2cWkxwL5-Fx3iagy_y4UushkZLl1bLb/s1600/csapp-chieng.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="397" data-original-width="330" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEir525lx3YTDjySVmd1fSpRNtjnzo4CuAUC-EdH-MOx-rxP6RiBLW0gvHqL7uJVZrItd6mRYdzGD4hJQzncfRydf9sqPO53vqDe4Wuz3Lpf3mB8Z2cWkxwL5-Fx3iagy_y4UushkZLl1bLb/s320/csapp-chieng.jpg" width="265" /></a></div>
Such versions from previous editions of the book have been very popular.<br />
<br />
<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com1tag:blogger.com,1999:blog-634113104715655552.post-24437616924311043672017-05-05T19:08:00.000-04:002017-05-05T19:15:21.086-04:00New CS:APP song!Prof. Casey Deccio at BYU wrote this <a href="http://casey.byu.edu/media/hello_summer_break.mp3">catchy ditty</a> about CS:APP called <i>Hello Summer Break</i>. It's pretty awesome!Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-81196808089832119292017-05-02T11:57:00.001-04:002017-05-02T11:57:06.166-04:00The Memory Mountain L2 BumpIn a <a href="http://csappbook.blogspot.com/2017/05/a-gallery-of-memory-mountains.html">previous post</a>, I showed how the different editions of CS:APP have used the memory mountain as a way to visualize memory system performance. It demonstrates how the cache hierarchy affects performance, which in turn can be used by application programmers to write more efficient programs.<br />
<br />
Here's the memory mountain for a recent Intel processor:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCJfScKc38ZoSYSlsQSHqaxdBaYv3UaRZ6eUijAu8FhGv5nmwq6uJGgS6ASqbVkqDjD8cqgVUDQE4YiCMK-Aw2KIGbbSSTN6YsL-_GoQsgjGPxfqYuXqv4VyGAbWf7-4LV_7sWD0Sa2apT/s1600/Slide7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCJfScKc38ZoSYSlsQSHqaxdBaYv3UaRZ6eUijAu8FhGv5nmwq6uJGgS6ASqbVkqDjD8cqgVUDQE4YiCMK-Aw2KIGbbSSTN6YsL-_GoQsgjGPxfqYuXqv4VyGAbWf7-4LV_7sWD0Sa2apT/s400/Slide7.png" width="400" /></a></div>
As the label indicates, there is a curious bump in the curve corresponding to cases where data are held in the L2 cache, and where the throughput as a function of the strides does not go down smoothly. Looking back at the <a href="http://csappbook.blogspot.com/2017/05/a-gallery-of-memory-mountains.html">other mountains,</a> you'll see this phenomenon for other cases as well, specifically Intel processors that employ prefetching.<br />
<br />
Let's investigate this phenomenon more closely. Quite honestly, though I have no explanation for why it occurs.<br />
<br />
In the above processor, a cache block contains eight 8-byte long integers. For a stride of S, considering only spatial locality, we would expect a miss rate of around S/8 for strides up to 8. For example, a stride of 1 would yield one miss followed by 7 hits, while a stride of 2 would yield one miss followed by 3 hits. For strides of 8 or more, the miss rate would be 100%. If read incurring a cache miss incurs a delay M, and one incurring a hit incurs a delay H, then we would expect the average time per access to be M*S/8 + H*(1-S/8). The throughput should be the reciprocal of the average delay.<br />
<br />
For the larger sizes, where data resides in the L3 cache, this predictive model holds fairly well. Here are the data for a size of 4 megabytes:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQmGP9otLCpnPRtJ1duc7DU4uUCOmzxNKqQmuCFEqKg0ZHL44-TL63mLN2FvR_ZJcXRkMPDnNPGoJTqMoZv5W6cRAkvw8Q94lv-ZQFFrICQ-sLx-jI6zNomUJV2Jy3_OyfqdJkR7oHoxvS/s1600/Slide8.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQmGP9otLCpnPRtJ1duc7DU4uUCOmzxNKqQmuCFEqKg0ZHL44-TL63mLN2FvR_ZJcXRkMPDnNPGoJTqMoZv5W6cRAkvw8Q94lv-ZQFFrICQ-sLx-jI6zNomUJV2Jy3_OyfqdJkR7oHoxvS/s400/Slide8.png" width="400" /> </a></div>
<div class="separator" style="clear: both; text-align: left;">
In this graph, the values of M and H were determined by matching the data for S=1 and S=8. So, it's no surprise that these two cases match exactly. But, the model also works fairly well for other values of S. </div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
For sizes that fit in the L2 cache, however, the predictive model is clearly off:</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmVCdSTgnrvUnPhEukAll_KayQwqpbxpHSbhu5fK9OA-WDX3nh51HcOwns4E3t4jxH8xuGSWDo-7c9x1sFyn6AwrmlW-eabsun-_Mdhy8ogchfTsr1xwsbG6fa8pRkvXwraBccanN5LZT6/s1600/Slide9.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmVCdSTgnrvUnPhEukAll_KayQwqpbxpHSbhu5fK9OA-WDX3nh51HcOwns4E3t4jxH8xuGSWDo-7c9x1sFyn6AwrmlW-eabsun-_Mdhy8ogchfTsr1xwsbG6fa8pRkvXwraBccanN5LZT6/s400/Slide9.png" width="400" /></a></div>
We can see the bump. The data tracks well for S=2, but for S=3, the actual memory system outperforms the predictive model. The effect drops off for larger values of S.<br />
<br />
I have experimented with the measurement code to see if the bump is some artifact of how we run the tests, but I believe this is not the case. I believe there is some feature of the memory system that causes this phenomenon.<br />
<br />
I would welcome any ideas on what might cause memory mountains to have this bump. <br />
<br />
<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-56428357612820609172017-05-02T10:56:00.001-04:002017-05-02T11:25:30.130-04:00A Gallery of Memory MountainsThrough all 3 editions, CS:APP has used <i>memory mountains</i> to illustrate how the cache hierarchy affects memory system performance. Here we compare the memory mountains of different processors over the years, revealing evolutionary changes in memory system design.<br />
<br />
Here's the mountain from the First Edition, based on a 1999 Intel Pentium III Xeon:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnFPxC8tjAnfm9s72NhmS6xXTzYLbFPJ3znQ9aX-hYeXJW1k8x1d73JXoR_hWUiUnsnfn7YxQo2dmzhAB7srg8irSSPfrdV1b1dZb3iIrfhWYJqxJ24Xwjykp3XN1FqaIcRvnHpUcCdUPq/s1600/Slide1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnFPxC8tjAnfm9s72NhmS6xXTzYLbFPJ3znQ9aX-hYeXJW1k8x1d73JXoR_hWUiUnsnfn7YxQo2dmzhAB7srg8irSSPfrdV1b1dZb3iIrfhWYJqxJ24Xwjykp3XN1FqaIcRvnHpUcCdUPq/s400/Slide1.png" width="400" /></a></div>
<br />
<br />
The memory mountain shows the throughput achieved by a program repeatedly reading elements from an array of N elements, using a stride of S (i.e., accessing elements 0, S, 2S, 3S, ..., N-1). The performance, measured in megabytes (MB) per second, varies according to how many of the elements are found in one of the processor's caches. For small values of N, the elements can be held in the L1 cache, achieving maximum read throughput. For larger values of N, the elements can be held in the L2 cache, and the L1 cache may be helpful for exploiting spatial locality for smaller values of S. For large values of N, the elements will reside in main memory, but both the L1 and L2 cache can improve performance when S enables some degree of spatial locality.<br />
<br />
By way of reference, the use of the memory mountain for visualizing memory performance was devised by Thomas Stricker while he was a PhD student at CMU in the 1990s working for Prof. <a href="http://www.lst.inf.ethz.ch/people/trg.html">Thomas Gross</a>. Both of them now live in Switzerland, with Thomas Gross on the faculty at ETH.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8mp3Ce2FVIiCUb2voX3iFZjdBG4fHCS8vUCxt95x6BMAbf_mc3EqtrFQWvFOA-PRIC19Kn0XVC0cu2Yza0xFE6IGAPCDGcT-Rk-6SJThnkHyWpgXnzmhyRUAIzFHSQ_Jj9IjSO4PiPPbd/s1600/Slide2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8mp3Ce2FVIiCUb2voX3iFZjdBG4fHCS8vUCxt95x6BMAbf_mc3EqtrFQWvFOA-PRIC19Kn0XVC0cu2Yza0xFE6IGAPCDGcT-Rk-6SJThnkHyWpgXnzmhyRUAIzFHSQ_Jj9IjSO4PiPPbd/s400/Slide2.png" width="400" /></a></div>
Jumping forward 9 years, the above figure illustrates the performance for a 2009 iMac that I still use as a home computer. It has the same qualitative characteristics of the Pentium III, with two levels of cache. Overall, it has over 10X higher throughput. You'll also notice how smooth the curves are. That's partly because I rewrote the code last year to give more reliable timing measurements.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsZ7rkctFizHQ08Hy6bLce42IrF5mcnPqSDBDuwHaU_HYCALmjlgXHVln6iYYRIoCjzzkVwju_foUqe0PwRA2GVkIJAUOGj03FG_dPLgBz8VcMELySgizINj2FqnyZg5eD2GMz2c5eQW72/s1600/Slide3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsZ7rkctFizHQ08Hy6bLce42IrF5mcnPqSDBDuwHaU_HYCALmjlgXHVln6iYYRIoCjzzkVwju_foUqe0PwRA2GVkIJAUOGj03FG_dPLgBz8VcMELySgizINj2FqnyZg5eD2GMz2c5eQW72/s400/Slide3.png" width="400" /></a></div>
For the second edition of CS:APP, we used a 2009 Intel Core i7, using the Nehalem microarchitecture. The data here are noisy—they predate my improved timing code. There are several important features in this mountain not found in the earlier ones. First, there are 3 levels of cache. There is also a striking change along the back edge. The performance for a stride of 1 stays high for sizes up to around 16MB, where the data can still be held in the L2 cache. It reflects the ability of the memory system to initiate <i>prefetching</i>, where it observes the memory access pattern and predicts which memory blocks will be read in the future. It copies these blocks from L3 up to L2 and L1. Then when the processor reads these memory locations, the data will already be in the L1 cache. The overall performance is well short of that measured for the contemporary (2008) Core Duo shown above. This could partly be due to differences in the timing code–the newer version uses 4-way parallelism when reading the data, whereas the old code was purely sequential.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHHpQ8HMb4e36dNc2y9a9AyA1LnoG6LOFVhlSutmtZuRHohBz-OJc8P2VAOs2UCavuiMEEUTR6bnm1a0EBkRka1TlSdNkpxPLazZalo_JATur8pOoo5apdf-4ZziTYXXhSQg2ds6bB-6ou/s1600/Slide4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHHpQ8HMb4e36dNc2y9a9AyA1LnoG6LOFVhlSutmtZuRHohBz-OJc8P2VAOs2UCavuiMEEUTR6bnm1a0EBkRka1TlSdNkpxPLazZalo_JATur8pOoo5apdf-4ZziTYXXhSQg2ds6bB-6ou/s400/Slide4.png" width="400" /></a></div>
<br />
For the third edition of CS:APP, we used a 2013 Intel Core i5, using the Haswell microarchitecture. The above figure shows measurements for this machine using the improved timing code. Overall, though, the memory system is similar to the Nehalem processor from CS:APP2e. It has 3 levels of cache and uses prefetching. Note how high the overall throughputs are.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJQJzQ-ouxudtygOyBYqD56NjvQA4LjTbJ6iDSTBEC3wuUkpmPClKU1L2klMzjBYRjbxVQ7GqMNpeY0i4_m7cf3VnBQhnPSaYwklJ6FaGAvJRBMC-BO6cA7zkSEjFI0p0Y9xLJZz5pIrP4/s1600/Slide5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJQJzQ-ouxudtygOyBYqD56NjvQA4LjTbJ6iDSTBEC3wuUkpmPClKU1L2klMzjBYRjbxVQ7GqMNpeY0i4_m7cf3VnBQhnPSaYwklJ6FaGAvJRBMC-BO6cA7zkSEjFI0p0Y9xLJZz5pIrP4/s400/Slide5.png" width="400" /></a></div>
The final mountain uses measurements from a Raspberry Pi 3. The Pi uses a processor included as part of a Broadcomm "system on a chip" designed for use in smartphones. The processor is based on the ARM Cortex-A53 microarchitecture. Performance-wise, it is much slower than a contemporary desktop processor, but it is very respectable for an embedded processor. There's a clear 2-level cache structure. It also appears that some amount of prefetching might occur with both cache and main memory accesses.<br />
<br />
Over the nearly 20-year time span represented by these machines, we can see that memory systems have undergone evolutionary changes. More levels of cache have been added, and caches have become larger. Throughputs have improved by over an order of magnitude. Prefetching helps when access patterns are predictable. It's interesting to see how the visualization provided by memory mountains enables us to see these qualitative and quantitative changes.<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com2tag:blogger.com,1999:blog-634113104715655552.post-48120090362601479622017-04-04T14:07:00.002-04:002017-04-04T14:07:32.395-04:00Sample Profiling Code AvailableSection 5.14 of CS:APP3e demonstrates how to use the code profiling tool <span style="font-family: "Courier New",Courier,monospace;"><span style="font-size: small;">gprof </span></span>to identify slow portions of a program. It uses as an example a dictionary program that can compute n-gram statistics about a body of text. Here's the measurements from profiling the code when computing the bigram statistics of all of Shakespeare's works:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2rhBc3__4jMrWbkscWc5WKvEF5kTQ2BNF9WgzVlamdNTKpVS2LPhGqfrtb7fzs8aXujULYRTNRB2OEXQ2zTGS0J30oZTy8ZVXuF66QKA65Jjd4WaQ9Np41hJY2h5NsIlKHsfeOh9ISGvf/s1600/prof-fast-s2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2rhBc3__4jMrWbkscWc5WKvEF5kTQ2BNF9WgzVlamdNTKpVS2LPhGqfrtb7fzs8aXujULYRTNRB2OEXQ2zTGS0J30oZTy8ZVXuF66QKA65Jjd4WaQ9Np41hJY2h5NsIlKHsfeOh9ISGvf/s400/prof-fast-s2.jpg" width="400" /></a></div>
<br />
<br />
We have made all of the code used in this example available on the CSAPP student <a href="http://csapp.cs.cmu.edu/3e/students.html">website</a>, as part of the Chapter 5 materials. The code has been tested on multiple Linux systems. You can see how to code does when running on your machine.<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-4478089949203927362017-04-04T13:55:00.000-04:002017-04-04T13:55:39.120-04:00Chinese Version of Third Edition AvailableThe Chinese version of CS:APP3e was published by China Machine Press in November, 2016.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiplmrNsSuTHUkQpKQyXtcH03SiDh3no3egnnUPz73wB_-K5tsWFT8bYj5LjV7VF1JXFTcPiACATwH1dk8oradazRhEE3AIPtRGkhXeH0Pl0xDDSuRL2ODswG3k4H8wJ58zmqUCtepnllmU/s1600/csapp3e-chinese.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiplmrNsSuTHUkQpKQyXtcH03SiDh3no3egnnUPz73wB_-K5tsWFT8bYj5LjV7VF1JXFTcPiACATwH1dk8oradazRhEE3AIPtRGkhXeH0Pl0xDDSuRL2ODswG3k4H8wJ58zmqUCtepnllmU/s320/csapp3e-chinese.jpg" width="220" /></a></div>
<br />
More information is available at the publisher's <a href="http://product.china-pub.com/5007882">website</a>.<br />
The translation was performed by <a href="http://www.yiligong.org/">Yili Gong</a> and Lian He of Wuhan University. We are grateful for the conscientious job Yili has done for all three editions of the book.<br />
<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-22274200091655278512016-02-18T11:20:00.000-05:002016-02-18T11:20:29.664-05:00Buffer Overflow Vulnerability Discovered in glibcThere's a report out today from Google that their security team discovered a buffer overflow vulnerability in the GNU implementation of <span style="font-family: Courier New, Courier, monospace;">getaddrinfo</span><span style="font-family: inherit;">. Readers of Chapter 11 of CS:APP3e know this function as a very general tool for converting string representations of network parameters into the data structures used by other networking </span>functions<span style="font-family: inherit;">. Engineers at Google and Red Hat were able to demonstrate that the program error could be exploited with a </span>buffer<span style="font-family: inherit;"> overflow attack.</span><br />
<br />
It's instructive to read the bug tracking reports at the Google post on their discovery:<br />
<br />
<a href="https://googleonlinesecurity.blogspot.com/2016/02/cve-2015-7547-glibc-getaddrinfo-stack.html">https://googleonlinesecurity.blogspot.com/2016/02/cve-2015-7547-glibc-getaddrinfo-stack.html</a><br />
<br />
as well as the bug tracking log covering the actual error:<br />
<br />
<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=18665">https://sourceware.org/bugzilla/show_bug.cgi?id=18665</a><br />
<br />
There are several important insights to be gained from this report:<br />
<br />
<br />
<ul>
<li>Buffer overflows are still a key source of software vulnerabilities. Although they can be mitigated by address space randomization and other techniques, they still show up.</li>
<li>This bug was introduced in with glib 2.9 in May, 2008. It was first reported in July, 2015 and fixed in February, 2016. That's a long time for a security vulnerability to lie undetected.</li>
<li>It only happens when a string is given that exceeds the 2048-byte limit of the regular buffer size. The code is then allocates more memory, but it does not correctly update some of the size information properly. Apparently, this part of the code was not tested very carefully. It's an unfortunate reality of program testing that it's hard to reach all of the corner cases in a program. It seems like using code coverage tools could have been beneficial here.</li>
</ul>
<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-65154457697256941272016-02-09T15:56:00.000-05:002016-02-09T15:56:25.862-05:00Updated the CS:APP Proxy LabWe've updated the <a href="http://csapp.cs.cmu.edu/3e/labs.html">CS:APP Proxy Lab</a> with a new autograder that checks for basic proxy behavior, concurrent execution, and file caching. We've been using this autograder at CMU for several years now and are happy to make it available to the CS:APP community. Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com2tag:blogger.com,1999:blog-634113104715655552.post-64568530568290408562016-01-12T17:04:00.001-05:002016-01-12T17:05:49.765-05:00Updated the CS:APP Bomb LabWe've released an update to the <a href="http://csapp.cs.cmu.edu/3e/labs.html">Bomb Lab</a> on the <a href="http://csapp.cs.cmu.edu">CS:APP site</a>. An authentication key associated with each bomb prevents spoofing(from Zaheer Chothia, ETH, Switzerland). And a configurable timeout in the request daemon prevents it from hanging while interacting with clients under heavy loads (from Len Hamy, Macquarie University, Australia).
Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-34654204674366627852016-01-11T20:19:00.003-05:002016-01-12T16:53:35.914-05:00New x86-64 Attack Lab is Available!We are pleased to announce that the new <a href="http://csapp.cs.cmu.edu/3e/labs.html">Attack Lab</a> is available on the <a href="http://csapp.cs.cmu.edu">CS:APP site</a>.
<p>
The Attack Lab was first offered to CMU students in Fall 2015. It is the 64-bit successor to the 32-bit Buffer Lab and was designed for CS:APP3e. In this lab, students are given a pair of unique custom-generated x86-64 binary executables, called <i>targets</i>, that have buffer overflow bugs. One target is vulnerable to code injection attacks. The other is vulnerable to return-oriented programming attacks. Students are asked to modify the behavior of the targets by developing exploits based on either code injection or return-oriented programming.
Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-2878289922460225822015-08-26T11:40:00.000-04:002015-08-26T11:40:06.336-04:00Diane's silk dress costs $89What could a woman's wardrobe have to do with computer systems?<br />
<br />
This is a clever mnemonic devised by Geoff Kuenning of Harvey Mudd College to help him remember which registers are used for passing arguments in a Linux x86-64 system:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">%rdi</span>:<span style="font-family: Courier New, Courier, monospace;"> </span>Diane's<br />
<span style="font-family: Courier New, Courier, monospace;">%rsi</span>:<span style="font-family: 'Courier New', Courier, monospace;"> </span>Silk<br />
<span style="font-family: Courier New, Courier, monospace;">%rdx</span>:<span style="font-family: 'Courier New', Courier, monospace;"> </span>dress<br />
<span style="font-family: Courier New, Courier, monospace;">%rcx</span>:<span style="font-family: 'Courier New', Courier, monospace;"> </span>costs<br />
<span style="font-family: Courier New, Courier, monospace;">%r8</span>:<span style="font-family: 'Courier New', Courier, monospace;"> </span>$8<br />
<span style="font-family: Courier New, Courier, monospace;">%r9</span>:<span style="font-family: 'Courier New', Courier, monospace;"> </span><span style="font-family: 'Courier New', Courier, monospace;"> </span><span style="font-family: 'Courier New', Courier, monospace;"> </span>9<br />
<br />
Thanks to Geoff for providing this helpful aid!<br />
<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-50608141663151499592015-08-17T18:46:00.001-04:002015-08-17T18:55:17.331-04:00Third Edition: Complete Set of Lecture Notes Now AvailableWe've updated all of the lecture notes for CS:APP3e and posted them on the <a href="http://csapp.cs.cmu.edu/3e/instructors.html">CS:APP3e instructor's site</a>.
Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-44987387618404179552015-06-02T20:17:00.000-04:002015-08-17T18:47:53.241-04:00Third Edition: Ready for Fall CoursesThe Third Edition of <i>Computer Systems: A Programmer's Perspective</i> came out in March. The <a href="http://www.csapp.cs.cmu.edu/">CS:APP web page</a> now contains information for this edition, with a link to the web pages for the second edition. We already have a (fortunately small) <a href="http://csapp.cs.cmu.edu/3e/errata.html">errata page</a>.<br />
<br />
This fall, we will be teaching 15-213, the CMU course that inspired the book originally. Leading up to that, we will update the lecture slides and the labs, and we will be making that available on the <a href="http://csapp.cs.cmu.edu/3e/instructors.html">instructors' site</a>.Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com1tag:blogger.com,1999:blog-634113104715655552.post-49983791173090774022015-02-11T18:12:00.000-05:002015-02-11T18:12:25.070-05:00The third edition will be out March 11, 2015We spent many of our 2014 hours writing and revising the book. We feel it will bring the book up to date, and that the presentation of some of the material will be more clear.<br />
<br />
According to <a href="http://www.amazon.com/Computer-Systems-Programmers-Perspective-3rd/dp/013409266X/ref=sr_1_2?ie=UTF8&qid=1423661266&sr=8-2&keywords=bryant+o+hallaron">Amazon</a>, the book will be available starting March 11.<br />
<br />
Here are some chapter-by-chapter highlights:<br />
<ul>
<li>Ch. 2 (Data): After hearing many students saying ``It's too hard!'' we took a closer look and decided that the presentation could be improved by more clearly indicating which sections should be treated as informal discussion and which should be studied as formal derivations (and possibly skipped on first reading). Hopefully, these guideposts will help the students navigate the material, without us reducing the rigor of the presentation.</li>
<li>Ch. 3 (Machine Programming): It's x86-64 all the way! The entire presentation of machine language is based on x86-64. Now that even cellphones run 64-bit processors, it seemed like it was time to make this change. Eliminating IA32 also freed up space to put floating-point machine code back in (it was present in the 1st edition and moved to the web for the 2nd edition). We generated a web aside describing IA32. Once students know x86-64, the step (back) to IA32 is fairly simple.</li>
<li>Ch. 4 (Architecture): Welcome to Y86-64! We made the simple change of expanding all of the data widths to 64 bits. We also rewrote all of the machine code to use x86-64 procedure conventions.</li>
<li>Ch. 5 (Optimization): We brought the machine-dependent performance optimization up to date based on more recent versions of x86 processors. The web aside on SIMD programming has been updated for AVX2. This material becomes even more relevant as industry looks to the SIMD instructions to juice up performance.</li>
<li>Ch. 7 (Linking): Linking as been updated for x86-64. We expanded the discussion of position-independent code and introduce library inter positioning.</li>
<li>Ch. 8 (Exceptional Control Flow): We have added a more rigorous treatment of signal handlers, including signal-safe functions.</li>
<li>Ch. 11 (Network Programming): We have rewritten all of the code to use new libraries that support protocol-independent and thread-safe programming.</li>
<li>Ch. 12 (Concurrent Programming): We have increased our coverage of thread-level parallelism to make programs run faster on multi-core processors.</li>
</ul>
Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com16tag:blogger.com,1999:blog-634113104715655552.post-47457580868572046072014-06-13T13:42:00.001-04:002014-11-18T15:35:24.283-05:00Third edition in the worksWe've gotten started on the third edition of CS:APP. The biggest change will be that we will shift entirely to 64 bits. It seems like that shift has finally occurred across most systems, and so we can say goodbye to 32-bit systems.<br />
<br />
Here's a summary of the planned changes for each chapter.<br />
<ol>
<li>Introduction. Minor revisions. Move the discussion of Amdahl's Law to here, since it applies across many aspects of computer systems</li>
<li>Data. Do some tuning to improve the presentation, without diminishing the core content. Present fixed word size data types.</li>
<li>Machine code. A complete rewrite, using x86-64 as the machine language, rather than IA32. Also update examples based on more a recent version of GCC (4.8.1). Thankfully, GCC has introduced a new opimization level, specified with the command-line option `-Og' that provides a fairly direct mapping between the C and assembly code. We will provide a web aside describing IA32.</li>
<li>Architecture. Shift from Y86 to y86-64. This includes having 15 registers (omitting %r15 simplifies instruction encoding.), and all data and addresses being 64 bits. Also update all of the code examples to following the x86-64 ABI conventions.</li>
<li>Optimization. All examples will be updated (they're mostly x86-64 already).</li>
<li>Memory Hierarchy. Updated to reflect more recent technology.</li>
<li>Linking. Rewritten for x86-64. We've also expanded the discussion of using the GOT and PLT to create position-independent code, and added a new section on the very cool technique of library interpositioning. </li>
<li>Exceptional Control Flow. More rigorous treatment of signal handlers, including async-signal-safe functions, specific guidelines for writing signal handlers, and using <kbd>sigsuspend</kbd> to wait for handlers.</li>
<li>VM. Minor revisions. </li>
<li>System-Level I/O. Added a new section on files and the file hierarchy.</li>
<li>Network programming. Protocol-independent and thread-safe sockets programming using the modern <kbd>getaddrinfo</kbd> and <kbd>getnameinfo</kbd> functions, replacing the obsolete and non-reentrant <kbd>gethostbyname</kbd> and <kbd>gethostbyaddr</kbd> functions. </li>
<li>Concurrent programming. Enhanced coverage of performance aspects of parallel multicore programs.</li>
</ol>
The new edition will be available March, 2015. Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com22tag:blogger.com,1999:blog-634113104715655552.post-51414528514747230352013-03-27T16:26:00.001-04:002013-03-27T16:29:46.635-04:00Updated the CS:APP Bomb LabWe've just released an update to the <a href="http://csapp.cs.cmu.edu/public/labs.html">Bomb Lab</a> on the <a href="http://csapp.cs.cmu.edu">CS:APP site</a>. It fixes a bug caused by the fact that on some systems, <kbd>hostname()</kbd> doesn't return a fully qualified domain name.
Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-57342153974221192992013-01-22T15:41:00.000-05:002013-01-22T15:41:19.448-05:00The CS:APP Cache LabWe've released a new lab, called the <a href="http://csapp.cs.cmu.edu/public/labs.html"><i>Cache Lab</i></a>, that we've been using at CMU in place of the Performance Lab for a few semesters. In this lab, students write their own general-purpose cache simulator, and then optimize a small matrix transpose kernel to minimize the number of cache misses. We've found that it really helps students to understand how cache memories work, and to clarify key ideas such as spatial and temporal locality.Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com3tag:blogger.com,1999:blog-634113104715655552.post-37387403262134543532012-11-12T10:29:00.001-05:002012-11-12T10:29:43.286-05:00Peking University Report<div style="text-align: center;">
<img height="200" id="il_fi" src="https://si0.twimg.com/profile_images/1399890412/PKU_LOGO.jpg" style="padding-bottom: 8px; padding-right: 8px; padding-top: 8px;" width="200" /> </div>
<div style="text-align: left;">
I just returned from a trip to Peking University (PKU). They have recently adopted CS:APP as the textbook for their course "Introduction to Computer Systems," (ICS) patterned after the course we teach at <a href="http://www.cs.cmu.edu/~213">CMU</a>, (the course for which CS:APP was originally written.)</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
They now require ICS for all CS majors. Moreover, as part of an initiative by the president of the university, they are teaching it in a form where they have the usual lectures, but they also hold weekly recitation instructions taught by faculty members. It is one of six courses being taught across the entire university in this format this term. Here are some statistics for this term:</div>
<div style="text-align: left;">
<br /></div>
<ul>
<li>167 students</li>
<li>14 recitations sections (12 students each)</li>
<li>14 faculty doing recitations</li>
<li>8 faculty doing lectures</li>
</ul>
That's a lot of resources to devote to a single course! Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com1tag:blogger.com,1999:blog-634113104715655552.post-89985501077098064342012-06-11T09:46:00.001-04:002012-06-11T09:46:44.113-04:00Chinese Translations of CS:APP<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFnoNIT6SjbYmf_OmUQoaYm2K4EmzY_eq23KvGDJl_9md78vZNi_0XY6f5sA5NCs6v_GfsZHaXXmaJGrsPw2GFDPlVXWLtGR6rUQzWCatKBb8Zn0kmZK5N5Xz_I6W0Cj9cETacKLylG_EU/s1600/csapp2-chinese-large.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFnoNIT6SjbYmf_OmUQoaYm2K4EmzY_eq23KvGDJl_9md78vZNi_0XY6f5sA5NCs6v_GfsZHaXXmaJGrsPw2GFDPlVXWLtGR6rUQzWCatKBb8Zn0kmZK5N5Xz_I6W0Cj9cETacKLylG_EU/s320/csapp2-chinese-large.jpg" width="225" /></a></div>
<br />
<br />
In a <a href="http://csappbook.blogspot.com/2012/05/who-reads-csapp.html">recent blog post</a>, I noted that 52% of all copies of the CS:APP that have been sold were in Chinese. <a href="http://www.yiligong.org/">Prof. Yili Gong</a> of <a href="http://en.whu.edu.cn/">Wuhan University</a> did the translations for both the first and second editions of the book. Prof. Gong has also been a valuable contributor to our <a href="http://www.csapp.cs.cmu.edu/public/errata.html">errata</a>.<br />
<br />
I recently came back from a trip to China, where I gave lectures about CS:APP at both <a href="http://english.pku.edu.cn/">Peking University</a> and <a href="http://www.tsinghua.edu.cn/publish/then/">Tsinghua University</a>, both of which use the book in their courses. Looking at our <a href="http://csapp.cs.cmu.edu/public/adoptions.html">adoptions list</a>, there are only 8 universities in China that we know of using CS:APP as a textbook. Apparently, the vast majority of copies sold in China are being used by individuals for self study.Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com1tag:blogger.com,1999:blog-634113104715655552.post-57136735142878845572012-05-30T16:35:00.000-04:002012-05-30T16:35:10.216-04:00Who Reads CS:APP?I gathered some data on the total sales of the various versions of CS:APP. It's now in its second edition, and it has appeared in multiple languages:<br />
<ul>
<li>English. Including versions published in India (1st edition only) and China (1st and 2nd edition) for readers in those two countries</li>
<li>Chinese (1st and 2nd edition)</li>
<li>Korean (2nd edition)</li>
<li>Russian (1st edition) </li>
<li>Macedonian (1st edition)</li>
</ul>
All told, as of Dec. 31, 2011, a total of 116,574 books have been sold, across all editions, versions, and formats (paperback, hardcopy, e-book). The following pie chart shows how this divides across the language categories (sorry, no statistics on Macedonian, but I imagine the numbers are fairly small):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0rki1PyxrA_qmly9utu_pqzFVHizHMQg-k4hy2ECGQ74AgnKENRVhnnht3HR1qiN6cWxDmLLmrwVY_xNUN1H2qOB_l0hvJIjo78Y1p-CQVBx31iilxbimJl1MomJ_0E2owV7bXQJeH6k5/s1600/CSAPP-version-pie.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="232" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0rki1PyxrA_qmly9utu_pqzFVHizHMQg-k4hy2ECGQ74AgnKENRVhnnht3HR1qiN6cWxDmLLmrwVY_xNUN1H2qOB_l0hvJIjo78Y1p-CQVBx31iilxbimJl1MomJ_0E2owV7bXQJeH6k5/s320/CSAPP-version-pie.png" width="320" /></a></div>
<br />
<br />
<br />
<br />
One thing that's clear is that we're very popular in China: fully 52% of the total has been in Chinese, and another 15% has been the English version for the Chinese market.<br />Randy Bryanthttp://www.blogger.com/profile/04800922504839845345noreply@blogger.com3tag:blogger.com,1999:blog-634113104715655552.post-27652177455231951662012-05-17T17:32:00.001-04:002012-05-17T17:43:27.750-04:00Update to the Bomb LabWe've updated the Bomb Lab sources on the <a href="http://csapp.cs.cmu.edu/public/labs.html">CS:APP site</a> to address a problem that arises when students from previous semesters run their old bombs while the current instance of the lab is underway.<br />
<br />
The Bomb Lab servers assign diffusions and explosions to Bomb IDs, rather than users, and Bomb IDs start over from scratch each term. Thus, if a student who took the class last semester ran their old bomb while the lab was as underway this semester, then the explosions and diffusions from the old bomb would be incorrectly assigned to the current bomb with the same Bomb ID.<br />
<br />
To address this, we've added a per-semester identifier, called $LABID, to the Bomb Lab config file. Instructors can set this variable each term (for example $LABID="f12") to uniquely identify each offering. Any results from previous bombs with different $LABIDs are ignored.<br />
<br />
Thanks for Prof. Godmar Bak, Virginia Tech, for pointing this out.<br />
<br />Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com2tag:blogger.com,1999:blog-634113104715655552.post-49775717155601029872012-04-23T15:39:00.003-04:002012-04-23T15:45:08.165-04:00Update to Buffer LabWe've uploaded another update to the <a href="http://csapp.cs.cmu.edu/public/labs.html">Buffer Lab</a> to fix a couple of issues.<br />
<br />
(1) Some recent gcc builds automatically enable the -fstack-protector option. We now explicitly disable this by compiling the buffer bomb with -fno-stack-protector.<br />
<br />
(2) In order to avoid infinite loops during autograding, the previous update from February 2012 introduced a timeout feature that was always enabled. However, this was a problem for students who were debugging their bombs in gdb. We now enable timeouts only during autograding.<br />
<br />
Thanks to Prof. James Riely, DePaul University for pointing these out to us.Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0tag:blogger.com,1999:blog-634113104715655552.post-36474693703717776172012-02-21T14:32:00.004-05:002012-02-21T14:39:56.088-05:00Updated Buffer LabWe've updated the <a href="http://csapp.cs.cmu.edu/public/labs.html"> Buffer Lab</a> on the CS:APP site to be more portable, more robust to infinite loops in student exploits, and more random during the nitro phase. Thanks to Prof. Godmar Back from Virginia Tech for identifying the issues and helping us with the solution.Dave O'Hallaronhttp://www.blogger.com/profile/06635391504657170721noreply@blogger.com0