Friday, August 5, 2011

PEDAGOGY: Why Teach x86 rather than RISC?

I am a lapsed member of the Church of RISC.  When I first started teaching computer architecture courses in the late 1970s (at MIT & Caltech), the prevailing wisdom was that machines should “close the semantic gap,” meaning that their instruction sets should closely match the constructs used in high-level programming languages.  We would wax poetic about machines such as the Burroughs B6700, which directly executed Algol programs, and the Digital Equipment Corporation VAX, which had special instructions for such operations as array indexing (with bounds checking) and polynomial evaluation.  Surprisingly, there was little discussion in those days about maximizing performance; there was not even an accepted standard for measuring it, such as the SPEC benchmarks that came later.

In 1982, I heard a presentation about the principles of Reduced Instruction Set Computers (RISC) from David Patterson.  It was a real eye-opener!  He pointed out that all that business about closing the semantic gap was implemented using microcode, which simply added an unnecessary layer of interpretation between the software and the hardware.  Advanced compilers could do a much better job of taking advantage of special cases than could a general-purpose microcode interpreter.  I returned from that talk and told the students in my computer architecture course that I felt like I’d been teaching the wrong material.

When I started teaching computer architecture at CMU, to both our PhD students and CS undergraduates, I fully embraced the RISC philosophy, partly because of the then-new textbook, Computer Architecture: A Quantitative Approach, by Hennessy & Patterson.  We made use of a set of MIPS-based machines available to the students, and later a set of Alphas, provided by Digital Equipment Corporation.  Being able to compile and execute C programs on actual machines proved to be an important aspect of the courses.  Needless to say, I was a true believer in RISC, and I would scoff at x86 as a big bag of bumps and warts with all of its addressing models, weirdly named registers, and truly icky floating-point architecture.

As mentioned in our earlier post, the initial offering of 15-213, our Introduction to Computer Systems course, made use of our collection of Alpha machines.  But we could see that we were on a dead-end path with these machines.  In spite of their clean design and initial high performance, Alphas did not fare well in the marketplace.  Intel, with its steady progress first on the Pentium and then on the Pentium Pro, slowly took over the market for desktop machines, including high-end engineering workstations.  We also were thinking at that point about writing a textbook, to encourage others to teach about computer systems from a programmer’s perspective, and so we wanted to find a platform that would be widely available.  We considered both Sun SPARC and IBM/Motorola PowerPC, but these had their own funky features (register windows, branch counters), and also lacked the universality of x86.

As an experiment, I took some of the C programs we had used to demonstrate the constructs found in machine-level programs and compiled them on a Linux Pentium machine.  Much to my surprise, I discovered that the assembly code generated by GCC wasn’t so bad after all.  All of those different addressing models didn’t really matter: Linux uses “flat mode” addressing, which avoids all the weird segmentation stuff.  The oddball instructions for decimal arithmetic and string comparison didn’t show up in normal programs.  Floating-point code was pretty ugly, but we could simply avoid that.  Moreover, it didn’t take a particularly magical crystal ball to see that x86 was going to be the dominant instruction set for the foreseeable future.

So, for the second offering of 15-213 in Fall 1999, Dave O’Hallaron and I decided to make a break from the RISC philosophy and go with x86 (or more properly IA32, for “Intel Architecture 32 bits”).  This turned out to be one of the best decisions we ever made.  Students could use any of the Linux-based workstations being deployed on campus.  By installing the Cygwin tools, they could even do much of the work for class on their Windows laptops.  The feeling of working with real code running on real machines was very compelling.  When we then went to write Computer Systems: A Programmer’s Perspective, we were certain that x86 was the way to go.  Now that the Apple Macintosh has transitioned to Intel processors, there are really three viable platforms for presenting x86.

One thing we learned is that every machine has awkward features that students must learn if they are going to look at real machine programs.  Here are some examples:
  • In MIPS, you cannot load a 32-bit constant value with a single instruction.  Instead, you load two 16-bit constants, first using the lui instruction to load the upper 16 bits, and then an add-immediate instruction (addi, or more safely the non-trapping addiu) or an ori instruction to supply the lower 16 bits.  With a byte-oriented instruction set, such as x86, constants of any length can be encoded within a single instruction.
  • C code compiled to MIPS uses the addu (unsigned add) instruction for adding signed numbers, since the add (signed add) instruction will trap on overflow.
  • With the earlier Alphas, there was no instruction to load or store a single byte.  Loading required a truly baroque pair of instructions: ldq_u (load quadword unaligned) and extbl (extract byte low), followed by two shifts to do a sign extension.  This is all done in x86 with a movb (move byte) instruction, followed by a movsbl (move signed byte to long) to do the sign extension.
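
The three quirks above can be mimicked in plain C.  What follows is a minimal sketch, not code from the course or the book: every function name in it (mips_lui, mips_addiu, load_const32, mips_addu, mips_add_traps, sext_byte_by_shifts, check_examples) is made up for illustration, and each function models only the arithmetic of the instruction it is named after.  It uses addiu, the non-trapping form of add-immediate, for the low half of the constant, and it assumes a compiler such as GCC or Clang, where conversion to int64_t wraps and >> on a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* lui: place a 16-bit immediate in the upper half of a 32-bit register. */
uint32_t mips_lui(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* addiu: add a sign-extended 16-bit immediate, wrapping silently. */
uint32_t mips_addiu(uint32_t rs, uint16_t imm) {
    uint32_t ext = (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
    return rs + ext;
}

/* Hypothetical helper: build a 32-bit constant from two 16-bit halves. */
uint32_t load_const32(uint32_t value) {
    uint16_t lo = (uint16_t)(value & 0xFFFFu);
    /* addiu sign-extends lo, so the upper half is bumped by one
       whenever bit 15 of lo is set. */
    uint16_t hi = (uint16_t)((value + 0x8000u) >> 16);
    return mips_addiu(mips_lui(hi), lo);
}

/* addu: 32-bit add with wraparound; the same bit pattern is the
   correct result for signed and unsigned operands alike. */
uint32_t mips_addu(uint32_t a, uint32_t b) {
    return a + b;
}

/* True when the trapping add instruction would signal overflow. */
bool mips_add_traps(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* Alpha-style sign extension of a zero-extended byte held in a
   64-bit register: shift left by 56, then shift right by 56.
   Assumption: the conversion wraps and >> is arithmetic (GCC, Clang). */
int64_t sext_byte_by_shifts(uint64_t zero_extended_byte) {
    return (int64_t)(zero_extended_byte << 56) >> 56;
}

/* Small self-check covering each case. */
void check_examples(void) {
    assert(load_const32(0x1234ABCDu) == 0x1234ABCDu);
    assert(mips_addu(0x7FFFFFFFu, 1u) == 0x80000000u);
    assert(mips_add_traps(INT32_MAX, 1));
    assert(sext_byte_by_shifts(0xFFu) == -1);
}
```

Calling check_examples() from a main function exercises all three cases.  The + 0x8000 adjustment in load_const32 is the kind of detail that is easy to trip over: because the low immediate is sign-extended, the upper half has to absorb the difference.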

My point here isn’t that x86 is superior to a RISC instruction set, but rather that all machines have their bumps and warts, and that’s part of what students need to learn.

In the end, I think the choice of teaching x86 vs. a cleaner language to a computer scientist or computer engineer is a bit like teaching English, rather than Spanish, to someone from China.  Spanish is a much cleaner language, with predictable rules for how to pronounce words, far fewer irregularities, and a smaller vocabulary.  It’s even useful for communicating with many other people, just as learning MIPS (or better yet, ARM) would be for programming embedded processors.  But, as English is the main language for commerce and culture in this world, so x86 is the main language for machines that our students are likely to actually program.  Like Chinese parents who send their children to English-language school, I’m content teaching my students x86!

Randy Bryant

2 comments:

  1. I like the analogy. I don't mind that the book is x86 (in fact it has made me more marketable) but I've always been curious about the thought process behind the choice since many undergrad texts are more RISC-based in their approach. Well, now I know. Thanks for the wonderful book and companion site.

  2. Hi Randy,

    Fully support your rationale behind x86 -- and the stripped down y86 is an excellent 'wrapper'.

    I'm curious to know whether you've ever considered Forth as a pedagogically useful language to present?
