Friday, August 5, 2011

PEDAGOGY: How to Design an x86-like Processor

In switching from RISC to x86 with our Introduction to Computer Systems course, we had an advantage over traditional computer architecture courses in that we didn’t have to worry about how to actually implement a processor.  We could skip all the complexities of instruction formats and instruction encoding, which is definitely not pretty with x86.  When it came time to write our chapter on processor design (Chapter 4) of Computer Systems:  A Programmer's Perspective, we had to confront this unpleasant aspect of x86.  Here we did a bit of a sleight of hand, creating a new instruction set we called “Y86” (get it?).  The idea was to have a notation that looked like x86, but was simple to decode and could be executed by an in-order pipelined processor.  Thus, we limited the addressing modes and made it so that arithmetic instructions could only operate on registers, but we retained the stack-oriented procedure execution conventions.  We also used a very simple encoding.  This made it feasible to go through the entire implementation of two processors---one executing a complete instruction every clock cycle, and one based on a 5-stage pipeline---in a single chapter.  We could even generate a complete Verilog implementation of the pipelined processor and map it through synthesis tools onto an FPGA.  I’ll admit that this approach is a compromise from our goal of presenting real machines executing real programs, but it seems to have worked fairly well.
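
As a rough illustration of what “simple to decode” can mean, here is a minimal C sketch.  It assumes the layout used for two-byte register-to-register instructions in the Y86 encoding of Chapter 4: the first byte holds a 4-bit instruction code and a 4-bit function code, and the second byte holds two 4-bit register IDs.  The struct and function names (y86_fields, y86_split) are made up for this sketch and do not come from the book's simulator.

```c
#include <stdint.h>

/* Hypothetical container for the four 4-bit fields of a two-byte
   Y86 instruction (names chosen for this sketch only). */
typedef struct {
    unsigned icode; /* high nibble of byte 0: instruction class */
    unsigned ifun;  /* low nibble of byte 0: function within the class */
    unsigned rA;    /* high nibble of byte 1: first register ID */
    unsigned rB;    /* low nibble of byte 1: second register ID */
} y86_fields;

/* Split the first two instruction bytes into their fields.
   No table lookup or prefix handling is needed, in contrast
   to real x86 decoding. */
y86_fields y86_split(const uint8_t *bytes)
{
    y86_fields f;
    f.icode = bytes[0] >> 4;
    f.ifun  = bytes[0] & 0xF;
    f.rA    = bytes[1] >> 4;
    f.rB    = bytes[1] & 0xF;
    return f;
}
```

Under that assumed layout, the byte pair 60 23 splits into icode 6, ifun 0, rA 2, and rB 3, which corresponds to addl %edx,%ebx.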

Interestingly, in their textbook Digitaltechnik—Eine praxisnahe Einführung (Digital Technology, a Practical Introduction), Armin Biere and Daniel Kroening present the design of a processor that executes a subset of the Y86 instructions using the actual IA32 encodings of the instructions.

Apparently, we weren’t the only ones to think of the name “Y86” as a variant of “x86.”  Randall Hyde introduces a stripped-down 16-bit instruction set, which he names “Y86,” in his book Write Great Code, published in 2004, several years after the first edition of CS:APP came out.

The domain names “” and “” are already taken, but it looks like they’ve been occupied by a cybersquatter named Richard Strickland since 2003.  Perhaps he’s just waiting for us to buy him out!

Randy Bryant

PEDAGOGY: Why Teach x86 rather than RISC?

I am a lapsed member of the Church of RISC.  When I first started teaching computer architecture courses in the late 1970s (at MIT & Caltech), the prevailing wisdom was that machines should “close the semantic gap,” meaning to have instruction sets that closely matched the constructs used in high-level programming languages.  We would wax poetic about machines such as the Burroughs B6700, which directly executed Algol programs, and the Digital Equipment Corporation VAX, which had special instructions for such operations as array indexing (with bounds checking) and polynomial evaluation.  Surprisingly, there was little discussion in those days about maximizing performance; there was not even an accepted standard for measuring performance, such as the SPEC benchmarks.

In 1982, I heard a presentation about the principles of Reduced Instruction Set Computers (RISC) from David Patterson.  It was a real eye-opener!  He pointed out that all that business about closing the semantic gap was implemented using microcode, which simply added an unnecessary layer of interpretation between the software and the hardware.  Advanced compilers could do a much better job of taking advantage of special cases than could a general-purpose microcode interpreter.  I returned from that talk and told the students in my computer architecture course that I felt like I’d been teaching the wrong material.

When I started teaching computer architecture at CMU, to both our PhD students and to CS undergraduates, I fully embraced the RISC philosophy, partly because of the then-new textbook, Computer Architecture: A Quantitative Approach, by Hennessy & Patterson.  We made use of a set of MIPS-based machines available to the students, and later a set of Alphas, provided by Digital Equipment Corporation.  Being able to compile and execute C programs on actual machines proved to be an important aspect of the courses.  Needless to say, I was a true believer in RISC, and I would scoff at x86 as a big bag of bumps and warts with all of its addressing models, weirdly named registers, and truly icky floating-point architecture.

As mentioned in our earlier post, the initial offering of 15-213, our Introduction to Computer Systems course, made use of our collection of Alpha machines.  But, we could see that we were on a dead-end path with these machines.  In spite of their clean design and initial high performance, Alphas did not fare well in the marketplace.  The steady progress by Intel, first with the Pentium and then with the Pentium Pro, slowly took over the market for desktop machines, including for high-end engineering workstations.  We also were thinking at that point about writing a textbook, to encourage others to teach about computer systems from a programmer’s perspective, and so we wanted to find a platform that would be widely available.  We considered both Sun SPARC and IBM/Motorola PowerPC, but these had their own funky features (register windows, branch counters), and also lacked the universality of x86.

As an experiment, I tried compiling some of the C programs we had used to demonstrate the constructs found in machine-level programs on a Linux Pentium machine.  Much to my surprise, I discovered that the assembly code generated by GCC wasn’t so bad after all.  All of those different addressing models didn’t really matter---Linux uses “flat mode” addressing, which avoids all the weird segmentation stuff.  The oddball instructions for decimal arithmetic and string comparison didn’t show up in normal programs.  Floating-point code was pretty ugly, but we could simply avoid that.  Moreover, it didn’t take a particularly magical crystal ball to see that x86 was going to be the dominant instruction set for the foreseeable future.

So, for the second offering of 15-213 in Fall, 1999, Dave O’Hallaron and I decided to make a break from the RISC philosophy and go with x86 (or more properly IA32 for “Intel Architecture 32 bits”).  This turned out to be one of the best decisions we ever made.  Students could use any of the Linux-based workstations being deployed on campus.  By installing the Cygwin tools, they could even do much of the work for class on their Windows laptops.  The feeling of working with real code running on real machines was very compelling.  When we then went to write Computer Systems: A Programmer’s Perspective, we were certain that x86 was the way to go.  Now that Apple Macintosh has transitioned to Intel processors, there are really 3 viable platforms for presenting x86.

One thing we learned is that every machine has awkward features that students must learn if they are going to look at real machine programs.  Here are some examples:
  • In MIPS, you cannot load a 32-bit constant value with a single instruction.  Instead, you load two 16-bit constants, first using the lui instruction to load the upper 16 bits, and then an addi instruction to add a constant to the lower 16 bits.  With a byte-oriented instruction set, such as x86, constants of any length can be encoded within a single instruction.
  • C code compiled to MIPS uses the addu (unsigned add) instruction for adding signed numbers, since the add (signed add) instruction will trap on overflow.
  • With the earlier Alphas, there was no instruction to load or store a single byte.  Loading required a truly baroque pair of instructions: ldq_u (load quadword unaligned) and extbl (extract byte low), followed by two shifts to do a sign extension.  This is all done in x86 with a movb (move byte) instruction, followed by a movsbl (move signed byte to long) to do the sign extension.
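
Two of the quirks above can be made concrete with a small C sketch.  It is only an illustration under stated assumptions: the type imm_pair and the functions split_const, rebuild_const, and sign_extend_byte are names invented here, the second step of the MIPS sequence is modeled as a wraparound add (as with addiu, so the overflow trap is ignored), and the sign extension assumes that right-shifting a negative signed value is an arithmetic shift, which mainstream compilers provide but C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical pair of 16-bit immediates for a lui + add-immediate
   sequence (names chosen for this sketch only). */
typedef struct {
    uint16_t hi; /* immediate for lui */
    uint16_t lo; /* immediate for the following add */
} imm_pair;

/* Split a 32-bit constant into the two immediates.  The add-immediate
   sign-extends its operand, so when bit 15 of the low half is set the
   add effectively subtracts 0x10000; the upper half is bumped by one
   to compensate. */
imm_pair split_const(uint32_t c)
{
    imm_pair p;
    p.lo = (uint16_t)(c & 0xFFFFu);
    p.hi = (uint16_t)(((c >> 16) + ((c >> 15) & 1u)) & 0xFFFFu);
    return p;
}

/* Model what the two instructions compute, modulo 2^32. */
uint32_t rebuild_const(imm_pair p)
{
    uint32_t upper = (uint32_t)p.hi << 16;                  /* lui */
    uint32_t lower = ((uint32_t)p.lo ^ 0x8000u) - 0x8000u;  /* sign-extended lo */
    return upper + lower;                                   /* wraparound add */
}

/* Sign-extend the low byte of a 64-bit register value with two shifts,
   in the spirit of the Alpha sequence.  Assumes an arithmetic right
   shift on signed values. */
int64_t sign_extend_byte(uint64_t reg)
{
    return (int64_t)(reg << 56) >> 56;
}
```

For the constant 0x12348000, split_const gives the pair 0x1235 and 0x8000 rather than 0x1234 and 0x8000, which is exactly the kind of detail students trip over when reading compiler output.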

My point here isn’t that x86 is superior to a RISC instruction set, but rather that all machines have their bumps and warts, and that’s part of what students need to learn.

In the end, I think the choice of teaching x86 vs. a cleaner language to a computer scientist or computer engineer is a bit like teaching English, rather than Spanish, to someone from China.  Spanish is a much cleaner language, with predictable rules for how to pronounce words, far fewer irregularities, and a smaller vocabulary.  It’s even useful for communicating with many other people, just as learning MIPS (or better yet, ARM) would be for programming embedded processors.  But, as English is the main language for commerce and culture in this world, so x86 is the main language for machines that our students are likely to actually program.  Like Chinese parents who send their children to English-language school, I’m content teaching my students x86!

Randy Bryant

Tuesday, August 2, 2011

Origins: The Rollout of Introduction to Computer Systems

Dave O’Hallaron and I taught the first version of 15-213 Introduction to Computer Systems in the Fall of 1998.  It was a great experience right from the start, with the students and the teaching staff feeling like we were pioneers in a new way of teaching and learning about computer systems.  We could feel that we had hit a sweet spot in selecting material that the students found interesting and that would get them ready for later courses.  We still have the course web pages online.

What really made a difference in 15-213 was our ability to present interesting and engaging lab exercises, all done on computers.  We had a set of Alpha 21164 processors (Digital Equipment Corporation---may they rest in peace---was always a great friend to CMU) that the students accessed over the network.  Some of them were connected together via a separate Ethernet cable so that we could allow students to snoop packets in promiscuous mode.

Here are some of the labs we offered that fall:

  • Data Lab.  A set of “puzzles” that require implementing standard logical and arithmetic operations with a restricted set of C expressions.  For example, compute the absolute value of a number without using any conditionals.
  • Bomb Lab.  This was the invention of our teaching assistant, Chris Colohan.  It involved reverse engineering an executable program, given in binary form, and devising a set of strings that would “defuse” six different phases.  This lab continues to be the centerpiece of the course.  It gets students to learn about machine-level programming, the use of tools such as GDB, and the general strategies of reverse engineering.
  • Malloc Lab.  Students implement their own malloc packages.  This lab has also stood the test of time.  The challenge for most students is that all the casting and pointer hacking involved means that many bugs are not caught by the compiler, and tracking down bugs can be very difficult.
  • Performance Lab.  Students write programs and both analyze and optimize their cache performance.  For this lab, we used matrix transpose as the problem to be solved.
  • Network Lab.  The students reverse-engineered a simple network protocol by sniffing packets.  It was fun to finally figure out the packet format and suddenly have messages (from “Dr. Evil”) coming through.

Over the years, we’ve added new labs and removed old ones, but the basic spirit of learning about computer systems by programming them and running tools on them still persists.
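
To give a flavor of the Data Lab puzzle mentioned above, here is one possible solution sketch for absolute value without conditionals.  It is not taken from the lab materials; the function name abs_no_branch is invented here, and the code assumes 32-bit two's-complement integers with an arithmetic right shift on signed values.

```c
#include <stdint.h>

/* Absolute value using only shift, xor, and subtract.
   Assumes x >> 31 is an arithmetic shift, so mask is 0 when x >= 0
   and -1 (all ones) when x < 0.  The most negative value has no
   positive counterpart and is not handled. */
int32_t abs_no_branch(int32_t x)
{
    int32_t mask = x >> 31;
    /* When mask is 0 this returns x unchanged.  When mask is -1 it
       computes ~x + 1, the two's-complement negation of x. */
    return (x ^ mask) - mask;
}
```

The trick is that xor with an all-ones mask flips every bit, and subtracting -1 adds the one needed to finish the negation, so the same expression handles both signs.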

Instructors for our upper-level systems courses have come to appreciate the preparation that 15-213 provides.  Dave Eckhardt, one of our OS instructors, says that he can reliably predict how well a student will do in their course based on how they did with the Malloc Lab.  15-213 has become a prerequisite for courses in operating systems, networking, compilers, computer graphics (they want students to understand floating point), embedded systems, and computer architecture.  The course is now required of all CS and ECE majors.

One sign of our success is the course ratings.  Here are my average scores for “instructor effectiveness” on a five-point scale:

Not bad!  I went from being an under-performer with respect to the rest of the department to one who ranked near the top.  Not only did the students like the material and the labs, I also found it more fun to teach the course than was the case with computer architecture, and the students responded well to my enthusiasm.

I have now taught the course eleven times, and I still really enjoy it, as do the students.  Dave has also taught the course many times, sometimes with me and sometimes with other instructors.  He received the CMU School of Computer Science’s Herbert A. Simon Award for Teaching Excellence in Computer Science in 2004, based largely on our students’ appreciation of his efforts in teaching 15-213.

Randy Bryant

Origins: Designing A New Computer Systems Course

As I described in an earlier post, I had been teaching computer architecture to computer science students for many years without much success.  Students did not share my innate fascination with how computer systems are designed.

Meanwhile, at one of our faculty lunches, Garth Gibson described the challenges he had in his operating systems course with the students’ lack of understanding of how programs are executed.  He would say “To do a context switch, the OS needs to push the values of the registers onto the stack,” to which the students responded “Registers? Stack?  What are those?”  I told Garth that the students learned all that in my architecture course, but we realized that my course was not a prerequisite for operating systems, and it wouldn’t really work to make it so, in terms of student schedules.

So, Dave O’Hallaron (who had co-taught the computer architecture course with me) and I started thinking about a new course that would

  1. provide a programmer’s perspective, rather than a computer architect’s perspective, on computer systems, and
  2. come early enough in the curriculum that it could then feed into our systems courses, including OS.

Origins: Teaching Computer Architecture to Computer Scientists

I have taught computer architecture courses at both Caltech and CMU, at both the graduate and undergraduate level.  I’ve always found it a fascinating subject---trying to match the needs of programs with the ever-changing capabilities of hardware, while mixing in a bit about optimizing compilers and operating systems along the way.

At CMU, I used to teach a junior-level computer architecture course that was required of all CS majors.   I really liked showing how to construct a 5-stage pipeline to implement a processor that could execute MIPS code, as well as nuances of cache design, virtual memory, and data storage.  Unfortunately, the students did not share my enthusiasm.  They were much more oriented toward software, and, in their minds, this course formed a not-very-useful, dead-end branch in our curriculum.

Welcome to the CS:APP Blog

We published the first edition of Computer Systems: A Programmer's Perspective (also known as “CS:APP”) in 2002.  The second edition came out in 2010.  Over the years, we've developed a very loyal and engaged community of students, instructors, and computing professionals who use the book in classes or for independent study.  They communicate with us regularly with error reports (see our errata), questions, and suggestions.  Our web page also includes extensive material for both students and instructors, including lecture notes, supplementary readings, and a set of exciting and useful labs.

We are creating this blog as a way to further build and support the CS:APP community.  In this blog, we will post interesting stories, updates on the book contents and extra material, and our experiences in using this book in courses at CMU.

We welcome your suggestions, comments, and feedback!

Randal E. Bryant
David R. O'Hallaron