For most RAM, the mux turns into a column select for a large grid, and an entire line of data is read out or written at once. The grid structure gives you the efficiency you need.
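The row/column split described above can be sketched in a few lines. This is a toy illustration, not any real chip's addressing scheme; the 8x8 grid size and bit widths are assumptions chosen for the example:

```python
def split_address(addr, col_bits=3):
    """Split a flat address into (row, col) for a memory laid out as a grid.
    Upper bits drive the row decoder; lower bits drive the column mux.
    The 3-bit column width is illustrative."""
    row = addr >> col_bits               # selects one word line
    col = addr & ((1 << col_bits) - 1)   # selects one cell from that row
    return row, col

# A 64-word memory as an 8x8 grid: the row decoder activates one word
# line and the entire row is read out; the column mux then picks one
# of the 8 cells from it.
print(split_address(0b101_011))  # (5, 3)
```

The point of the grid is that a 64-word memory needs only an 8-output row decoder plus an 8-input column mux, instead of one 64-output decoder.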
http://docencia.ac.upc.edu/master/DTM/docs/03-Memory%20Structures.pdf is a good PDF on the subject. It also shows a "predecode" structure which makes the decoder less complex than it would be otherwise.
The main reason is that it's simply a lot easier to make circuitry that is always in one of two states than to have it support in-between states. The extra complexity, cost, and speed penalty for compressing more states into a single signal outweigh any advantage gained by the compression.
One important convenience of using only two states is that any signal can be arbitrarily amplified about the middle. This results in the amplifier output slamming to one extreme or the other. The gain can therefore vary widely, and can be made arbitrarily large.
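A toy numerical sketch of this "slamming to the rails" behavior (the gain value and logic levels here are illustrative, not from any real part):

```python
def regenerate(v, gain=1000.0, mid=0.5, lo=0.0, hi=1.0):
    """High-gain amplifier about the midpoint, clipped to the supply rails.
    Anything above mid slams to hi; anything below mid slams to lo.
    The exact gain doesn't matter as long as it is large."""
    out = mid + gain * (v - mid)
    return max(lo, min(hi, out))

# Noisy, degraded versions of logic levels 0, 1, 0, 1 snap back cleanly.
noisy = [0.08, 0.93, 0.31, 0.62]
restored = [regenerate(v) for v in noisy]
print(restored)  # [0.0, 1.0, 0.0, 1.0]
```

Note that the same stage would destroy a multi-level signal: any in-between level also gets slammed to one rail, which is exactly why arbitrary gain is only free when there are two states.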
Imagine a human analog of this. If you have a light switch on the wall that is either on or off, you can whack it to put it in the other state. It doesn't matter if you are still pushing on it a bit when it gets there, since it has a mechanical limit built in. You can push on it just enough to make it switch, or a lot more, as long as you don't physically break it. Now imagine the switch had 3 or more states and you wanted to set it to one of the in-between states. You'd have to be a lot more careful to apply just the right amount of force or travel. Too much, and you end up in the next state. You can't just do the simple and fast thing of whacking it anymore.
A similar complexity is required to set the level of a signal to an in-between state. This costs parts and power, and takes time. Then you need more complexity again to interpret the signal when you want to use its value. This could be done, but is not worth it.
Another issue is that keeping a signal at an in-between level would likely take more power. With a high or low signal, you can think of the signal as being connected to power or ground through one of two switches. These take no power to keep fully on or fully off, but any circuit that holds a signal in-between doesn't have that benefit and would very likely require constant standby power to keep it there.
There are actually cases where more than two levels are used today to encode digital data. Some bulk flash memories (multi-level cell, or MLC, flash) work on this principle. Data is stored as piles of charge, and these piles can have more than two distinct sizes. It does take extra complexity to decode the size of a pile when a read is performed, but in a large flash memory that complexity is paid only a few times, in the read circuitry, while the compression savings apply to many millions of bits, so the tradeoff is worth it.
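A minimal sketch of how reading such a cell might work, assuming a 4-level cell storing two bits; the threshold voltages here are invented for illustration, not taken from any real flash part:

```python
def decode_cell(voltage, thresholds=(0.8, 1.6, 2.4)):
    """Map a sensed cell voltage to a 2-bit value by comparing it
    against three reference levels (threshold values are illustrative).
    Counting how many thresholds the voltage exceeds gives 0..3."""
    return sum(voltage > t for t in thresholds)

print(decode_cell(0.3))  # 0
print(decode_cell(1.2))  # 1
print(decode_cell(2.9))  # 3
```

This is exactly the "extra complexity" mentioned above: three comparators per sense path instead of one, plus tighter tolerances on the stored charge, traded for double the bits per cell.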
I studied optical computing in undergrad. From that experience, I can say certain categories of problems could be solved MUCH faster. For instance, do some research on 4-f optical correlators. These operations run at the propagation velocity of light through the medium (so essentially the speed of light).
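The math a 4-f correlator performs optically can be sketched numerically. In the optical system, the first lens Fourier-transforms the input scene, a filter placed in the Fourier plane multiplies it by the conjugate of the template's spectrum, and the second lens transforms back, all at the speed of light. A NumPy equivalent (array sizes and the patch location are arbitrary choices for the demo):

```python
import numpy as np

def correlate_4f(scene, template):
    """Mathematical equivalent of a 4-f optical correlator:
    FFT the scene (first lens), multiply by the conjugate template
    spectrum (Fourier-plane filter), inverse FFT (second lens).
    This computes the circular cross-correlation of the two inputs."""
    S = np.fft.fft2(scene)
    T = np.fft.fft2(template, s=scene.shape)  # zero-pad template
    return np.fft.ifft2(S * np.conj(T)).real

rng = np.random.default_rng(0)
scene = rng.random((32, 32))
template = scene[10:18, 20:28]  # a patch cut from the scene itself
corr = correlate_4f(scene, template)
# The brightest point of the correlation surface marks where the
# template best matches the scene.
print(np.unravel_index(np.argmax(corr), corr.shape))
```

The electronic version costs two FFTs and a multiply per frame; the optical version does all of it in a single pass of light through two lenses, which is where the speed claim comes from.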
In terms of computing as we know it (logic gates, stateful gates which hold charge interpreted as '1' or '0'), I do not know if the advantages are there.
That said, consider the economic momentum behind "traditional" computing built on silicon chip fabrication: over 50 years of it. Like other technologies, until optical computing is cheaper or in greater demand than the incumbent, it will be a while before it becomes a commodity.
In any case, it's as cool as heck!