Given the goals of the class, I think the TTL approach is fine, and I say this as an "FPGA guy". FPGAs are a sea of logic and you can do all sorts of fun stuff with them, but there's only so much that's humanly possible to do in a semester.
Looking at your syllabus, your class is a mix of the logic design and "machine structures" courses I took in undergrad. (Plus, it's for CS majors. I'm all for CS majors having to face real hardware--letting them get away with writing code seems like a step back.) At this introductory level, where you're going over how assembly instructions are broken down, I see no real benefit to having students do things in code versus by hand. Doing HDL means learning the HDL, learning how to write synthesizable HDL, and learning the IDE. That's a lot of extra conceptual complexity and re-abstraction. Plus you have to deal with software issues: installing and licensing the vendor tools, and debugging the toolchain instead of the logic.
Generally the point of a course that uses FPGAs is to practice creating useful logic: interfaces to peripherals, serial comms, RAM controllers, video generators, etc. That's valuable knowledge to have, but it seems well out of the scope of your course. More advanced computer architecture classes have students implement sophisticated CPUs on FPGAs, but again, that's beyond what you're covering.
I would at the very least devote a lecture to FPGAs. Run through a few demos with a dev board and show them the workflow. Since you're at Mills, perhaps you could contact the folks at Berkeley who run CS150/152 and go see how they do things.
It wouldn't speed them up. Right now it's easy: in a basic logic gate like a NAND, the inputs either pull the output to Vdd or to ground. If you used intermediate levels you would need FETs that settle at levels like Vdd/2 or Vdd/4. That would consume more power and require more accurately behaving components, which need more time to settle to their final level. The more values you stuff into a single data unit, the tighter the required accuracy, and the longer the settling time. The binary system used now just pushes the FET hard toward Vdd or ground.
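To put rough numbers on that settling-time point, here's a minimal back-of-the-envelope sketch in Python. It assumes the output node settles exponentially with a made-up time constant, and that "settled" means the remaining error is within half a band; both assumptions are mine, for illustration only, not real process numbers:

```python
import math

VDD = 1.2      # supply voltage in volts (illustrative)
TAU = 10e-12   # seconds, made-up RC time constant of the output node

def settle_time(levels: int) -> float:
    """Time for a full-swing exponential settle until the remaining
    error is within half a band, i.e. error < VDD / (2 * levels)."""
    tolerance = VDD / (2 * levels)
    return TAU * math.log(VDD / tolerance)

for n in (2, 4, 8, 16):
    print(f"{n} levels: settle within {VDD / (2 * n) * 1000:.0f} mV "
          f"takes {settle_time(n) / 1e-12:.1f} ps")
```

Each doubling of the level count only adds another tau*ln(2) of settling time, but that's per transition, on every signal, everywhere on the chip.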
exscape mentions noise immunity, and that's what the accuracy refers to: how much the signal may deviate from nominal. In a binary system that can be almost 50 %, or more than 0.5 V in a 1.2 V processor. If you used 4 different levels you'd be splitting that 1.2 V into bands only 300 mV wide, so a level can drift at most half a band--150 mV--before it's read as its neighbor, and in practice the margin would likely be more like 100 mV.
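The same arithmetic as a small Python sketch, assuming the 1.2 V range is simply split into equal bands (the band model is a simplification; real logic families define noise margins more carefully):

```python
VDD = 1.2  # supply voltage in volts

def noise_margin(levels: int) -> float:
    """Split the 0..VDD range into `levels` equal bands; a signal may
    drift at most half a band from its nominal level before it is
    read as the neighboring level."""
    band = VDD / levels
    return band / 2

for n in (2, 4, 8):
    print(f"{n} levels: bands of {VDD / n * 1000:.0f} mV, "
          f"noise margin {noise_margin(n) * 1000:.0f} mV")

# Binary does even better than this model suggests: both levels sit at
# the rails with a single threshold in the middle, so a '1' can sag
# almost VDD/2 (~600 mV here) before it is misread.
```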
Note that there are Flash devices that use multiple levels to store more than 1 bit in a single memory cell: MLC (Multi-Level Cell) Flash. It doesn't increase speed, but it packs more data onto a single chip.
Best Answer
The main reason is that it's simply a lot easier to make circuitry that is always in one of two states than to have it support in-between states. The extra complexity, cost, and speed penalty for compressing more states into a single signal outweigh any advantage gained by the compression.
One important convenience of using only two states is that any signal can be amplified about the middle with arbitrarily large gain. The amplifier output simply slams to one extreme or the other, so the exact gain can vary widely without affecting the result.
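A quick numeric sketch of that idea: clip a high-gain amplifier at the rails, and a degraded level gets restored to a clean rail voltage within a stage or two. The gain and voltages below are arbitrary assumptions; any sufficiently large gain behaves the same way:

```python
VDD = 1.2
MID = VDD / 2

def amplify(v: float, gain: float = 10.0) -> float:
    """High-gain amplifier about the midpoint, clipped to the rails."""
    out = MID + gain * (v - MID)
    return min(VDD, max(0.0, out))

# A badly degraded '1' (0.62 V, barely above the 0.6 V midpoint)
# is restored to a full-rail 1.2 V after a couple of stages:
v = 0.62
for stage in range(3):
    v = amplify(v)
    print(f"after stage {stage + 1}: {v:.2f} V")
```

Note that this trick is exactly what breaks with in-between states: arbitrary amplification about the middle would destroy any level that isn't at a rail, so a multi-level system can't restore its signals this cheaply.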
Imagine a human analog of this. If you have a light switch on the wall that is either on or off, you can whack it to put it in the other state. It doesn't matter if you are still pushing on it a bit when it gets there, since it has a mechanical limit built in. You can push on it just enough to make it switch, or a lot more, as long as you don't physically break it. Now imagine the switch had 3 or more states and you wanted to set it to one of the in-between states. You'd have to be a lot more careful to apply just the right amount of force or travel. Too much and you end up in the next state. You can't just do the simple and fast thing of whacking it anymore.
A similar complexity is required to set a signal to an in-between state. That costs parts and power, and takes time. Then you need more complexity again to interpret the signal when you want to use its value. It could be done, but it's not worth it.
Another issue is that keeping a signal at an in-between level would likely take more power. With a high or low signal, you can think of the signal as being connected to power or ground through one of two switches. Those take no power to hold fully on or fully off, but any circuit that holds a signal in between doesn't have that benefit and would very likely draw constant standby power to keep it there.
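For a feel of the numbers, compare a rail-driven CMOS node with a naive resistor divider holding the midpoint. The divider and its resistor values are my simplification for the sketch; a real mid-level driver would be smarter, but anything holding a node between the rails conducts continuously:

```python
VDD = 1.2      # volts
R = 10e3       # ohms per divider leg, arbitrary assumption

# CMOS rail drive: one switch fully on, the other fully off --
# essentially no static current, so static power ~ 0.
p_rail = 0.0

# Divider holding VDD/2: current flows through both legs all the time.
i_divider = VDD / (2 * R)
p_divider = VDD * i_divider

print(f"rail-driven node: ~{p_rail} W static")
print(f"divider at VDD/2: {p_divider * 1e6:.0f} uW static, per signal")
```

That's 72 uW for one held signal in this toy example; multiplied across millions of signals, it's a non-starter.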
There are actually cases where more than two levels are used today to encode digital data. Some bulk flash memories work on this principle. Data is stored as piles of charge, and the piles can have more than 2 sizes. It takes extra complexity to decode the size of a pile when a read is performed, but in a large flash memory that complexity is paid only a few times, in the read circuitry, while the compression savings apply to many millions of bits, so the tradeoff is worth it.
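As a toy illustration of the read side, here's how four charge levels might decode to two bits. The threshold voltages and the Gray-coded bit assignment below are assumptions for the sketch, not any particular part's values; real devices calibrate their thresholds per chip and track drift as the cells wear:

```python
THRESHOLDS = [0.8, 1.6, 2.4]          # volts, between the 4 levels
GRAY_BITS = ["11", "10", "00", "01"]  # Gray-coded so a one-level read
                                      # error flips only a single bit

def read_cell(v_cell: float) -> str:
    """Compare the cell voltage against each threshold and return the
    two bits encoded by the resulting level (0..3)."""
    level = sum(v_cell > t for t in THRESHOLDS)
    return GRAY_BITS[level]

for v in (0.3, 1.1, 2.0, 2.9):
    print(f"cell at {v:.1f} V reads as bits {read_cell(v)}")
```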