SDRAM can be difficult to work with, and unless you have the tools, ability, and time to debug things, it is not a good idea to attempt something that probably won't work. By "tools", I mean an oscilloscope and probes rated to at least 350 MHz (1 GHz is preferred) and maybe a logic analyzer. "Ability" means that you know the SDRAM protocol inside and out. And "time" means at least several weeks to figure things out. You may also have difficulty probing the BGA pins.
You cannot "overpower" an SDRAM by driving it with higher-voltage signals; that will break it. You need to translate the 3.3 V signals to 1.8 V, and vice versa, and you need to do it cleanly. Basically, you need buffers that will do the translation (and meet the signaling specification as well). To make matters worse, some of these buffers must be bidirectional, and they eat into your timing budget. If your clock rate is slow enough, you might get away with this. Designing something like this requires at least an "intermediate engineer". A newbie engineer just isn't going to cut it, given all the complex timing, termination, and control that the buffers need.
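To see how translators eat the timing budget, here is a back-of-the-envelope sketch. All the numbers (translator delay, SDRAM access time, controller setup time) are purely illustrative, not from any datasheet:

```python
# Hypothetical timing-budget check for adding bidirectional level
# translators to an SDRAM data bus. Every number here is illustrative.

def timing_margin_ns(clock_mhz, t_translator_ns, t_sdram_access_ns,
                     t_controller_setup_ns):
    """Margin left in one clock period after the round trip:
    command out through one translator, data back through another."""
    period_ns = 1000.0 / clock_mhz
    used = 2 * t_translator_ns + t_sdram_access_ns + t_controller_setup_ns
    return period_ns - used

# At 100 MHz (10 ns period), two 2.5 ns translator hops plus access and
# setup time consume the whole period -- zero margin.
print(timing_margin_ns(100, 2.5, 4.0, 1.0))   # prints 0.0
# At 50 MHz the same parts leave 10 ns of slack.
print(timing_margin_ns(50, 2.5, 4.0, 1.0))    # prints 10.0
```

The point is that the translator delay is paid twice on any read (address/command out, data back), which is why a design that closes timing without buffers can fail with them.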
Given that, the correct 3.3v SDRAM might not look all that expensive after all! :) There is something to be said for using the right parts in the right places.
The main advantage of synchronous design is that its behavior is easy to predict, model, and validate because everything happens on a predefined schedule. However, waiting for a specified time to perform an action makes synchronous design slower than a comparable asynchronous design. And even when the circuit is not responding to its logic inputs, it is still drawing power because it is responding to the clock signal.
An asynchronous circuit can be much faster because it responds to its inputs as they change. No waiting around for a clock signal before processing can take place. It can also draw less power, since it has nothing to do when the inputs are inactive, and it has better EMI performance, since there isn't a constantly toggling digital signal floating around. But the design of such systems is much more difficult, because all combinations of inputs over time need to be taken into consideration to ensure proper operation of the circuit. When two inputs change at almost the same time, that is called a race condition, and the circuit can exhibit undefined behavior if the designer didn't plan for every combination of inputs at every point in time.
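A toy way to see the race problem: in an asynchronous transparent latch, two input edges arriving in different orders can leave different final states. This is a software caricature, not a real circuit model:

```python
# Toy illustration of an asynchronous race: a transparent latch whose
# final state depends on the ORDER in which two "simultaneous" input
# edges arrive. Purely illustrative, not a real hardware simulation.

def latch_after_events(events, d=0, en=1, q=0):
    """Apply (signal, value) events in arrival order. While 'en' is
    high the latch is transparent and q follows d; when 'en' is low
    the latch holds its last value."""
    for sig, val in events:
        if sig == 'd':
            d = val
        else:
            en = val
        if en:              # transparent while enabled
            q = d
    return q

# The same two edges (d rises, en falls), opposite arrival order:
print(latch_after_events([('d', 1), ('en', 0)]))  # d wins the race -> 1
print(latch_after_events([('en', 0), ('d', 1)]))  # en closes first -> 0
```

Real asynchronous design has to guarantee correct behavior for both orderings (and every intermediate glitch), which is exactly the analysis burden described above.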
Comparing synchronous with asynchronous design, you're probably thinking that a big company like Samsung could spend billions on research and development to fully model a DRAM circuit so that its operation is really stable, and then we would have really fast, really low-power memory. So why is SDRAM so much more popular?
While asynchronous design is faster than synchronous for sequential operations, it is much, much easier to design a circuit that performs parallel or simultaneous operations if those operations are synchronous. And when many operations can be performed at the same time, the speed advantage of asynchronous design disappears.
So three main things to consider when designing a RAM circuit are speed, power, and ease of design. SDRAM wins over plain DRAM on two out of three of those and by a very large margin.
Wikipedia quotes:
Dynamic random-access memory -
The most significant change, and the primary reason that SDRAM has supplanted asynchronous RAM, is the support for multiple internal banks inside the DRAM chip. Using a few bits of "bank address" which accompany each command, a second bank can be activated and begin reading data while a read from the first bank is in progress. By alternating banks, an SDRAM device can keep the data bus continuously busy, in a way that asynchronous DRAM cannot.
Synchronous dynamic random-access memory -
Classic DRAM has an asynchronous interface, which means that it responds as quickly as possible to changes in control inputs. SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus. The clock is used to drive an internal finite state machine that pipelines incoming commands. The data storage area is divided into several banks, allowing the chip to work on several memory access commands at a time, interleaved among the separate banks. This allows higher data access rates than an asynchronous DRAM.
Pipelining means that the chip can accept a new command before it has finished processing the previous one. In a pipelined write, the write command can be immediately followed by another command, without waiting for the data to be written to the memory array. In a pipelined read, the requested data appears after a fixed number of clock cycles after the read command (latency), clock cycles during which additional commands can be sent.
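The pipelining advantage the quote describes can be sketched with a toy cycle count. CAS latency of 3 is just an example value; real devices add further constraints (tRCD, tRP, bank conflicts) that this ignores:

```python
# Toy model of SDRAM pipelined reads: a read command issued on cycle t
# returns data on cycle t + CAS latency, and new commands can be issued
# during that latency. Illustrative only; real timing is more involved.

CAS_LATENCY = 3

def pipelined_read_schedule(n_reads):
    """SDRAM-style: issue one read per cycle; each returns data
    CAS_LATENCY cycles later, so data flows back-to-back."""
    return [(t, t + CAS_LATENCY) for t in range(n_reads)]

def serial_read_schedule(n_reads):
    """Asynchronous-style: wait for each access to finish before
    issuing the next one."""
    sched, t = [], 0
    for _ in range(n_reads):
        sched.append((t, t + CAS_LATENCY))
        t += CAS_LATENCY
    return sched

# (issue cycle, data cycle) pairs for four reads:
print(pipelined_read_schedule(4))  # last data arrives on cycle 6
print(serial_read_schedule(4))     # last data arrives on cycle 12
```

After the initial latency, the pipelined version delivers one word per cycle, which is the "data bus continuously busy" behavior the first quote attributes to bank interleaving.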
Best Answer
The "by 4 bits / by 8 bits / ..." figure gives the word width. Since the total capacity of the chip is fixed, there are twice as many words in the 8-bit-per-word mode as there are in the 16-bit-per-word mode.
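As a quick sanity check of that relationship, here is the arithmetic with a hypothetical 256 Mbit device (the capacity is just an example):

```python
# The x4/x8/x16 figure is the word width. For a fixed total capacity,
# halving the width doubles the word count. Capacity is illustrative.

def word_count(total_bits, width_bits):
    """Number of addressable words at a given word width."""
    return total_bits // width_bits

CHIP_BITS = 256 * 1024 * 1024          # a hypothetical 256 Mbit chip
print(word_count(CHIP_BITS, 8))        # 33554432 words in x8 mode
print(word_count(CHIP_BITS, 16))       # 16777216 words in x16 mode
```

The x8 count is exactly twice the x16 count, as stated above.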