Which is better? Well, I can't tell you. There are a lot of factors to consider, and many of them are not technical. Here are some things to consider:
- Yes, comparing signal routing and which signals get shared between chips is important. If I were only considering signal routing, I would prefer two x16 chips. That approach still has issues to resolve, but they are easier than with two x32 chips. Either way, with x16 or x32, you must know how to properly route and terminate signals to get proper operation.
- Consider parts purchasing. x16 parts are usually more common and easier to get from a variety of sources. This means that in 5 or 10 years, when you need to build another batch of boards, you will have an easier time getting x16 parts. x16 parts are also usually cheaper, because they may come in a smaller package and distributors tend to sell more of them. But these factors might not matter to you. Maybe you found a really good deal on x32 chips, or a 5-10 year lifespan is not required.
- Consider your upgrade path. Although you are using 4 GB today, you might want 8 GB or 16 GB in a year or two. Or maybe you don't need 8+ GB, but in a year or two it is cheaper to buy 8 GB, even if you only use 4. Or maybe the 4 GB part has been discontinued and you have no choice. Choose a manufacturer and "chip series" that gives you some options in a pin-compatible package. Do your homework now, because it might require running a couple of extra signals (such as an extra address line) from your CPU to your RAM.
- Consider the manufacturer. Companies like Toshiba offer great chips, but prefer not to sell to companies that buy only several thousand chips per year. Micron is more expensive, but has better customer service for small companies. Figure out who you are buying from, and why. This will limit your choices of RAMs and RAM organization.
- Consider the distributor. Your business relationship with your distys can often make the choice for you.
- Consider the make/model of CPU. For example, TI has a good relationship with Micron and will often have better application notes or better tech support if you are connecting your TI processor to Micron RAM.
- Talk with your sales guys and field application engineers (FAEs), and see what they recommend based on where they think the market is going.
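To make the upgrade-path point more concrete: within a pin-compatible DDR3 x16 family, each doubling of per-chip density typically adds one row-address line. The sketch below assumes the common DDR3 x16 organization of 8 banks and 1K columns per row (check your datasheet); `row_address_bits` is a hypothetical helper, and the "4 GB"/"8 GB" figures in the text may refer to total system memory rather than per-chip density.

```python
import math

# Hypothetical helper: row-address bits for a DDR3 device, ASSUMING the
# common x16 organization of 8 banks and 1024 columns per row.
def row_address_bits(density_gbit: int, width: int = 16,
                     banks: int = 8, columns: int = 1024) -> int:
    """Return how many row-address lines (A0..An) the part needs."""
    total_bits = density_gbit * 2**30
    rows_per_bank = total_bits // (banks * columns * width)
    bits = int(math.log2(rows_per_bank))
    # Sanity check: the row count must be an exact power of two.
    assert 2**bits == rows_per_bank, "unexpected organization"
    return bits

for density in (1, 2, 4, 8):
    n = row_address_bits(density)
    print(f"{density} Gb x16: {n} row bits (A0-A{n - 1})")
# 1 Gb x16: 13 row bits (A0-A12)
# 2 Gb x16: 14 row bits (A0-A13)
# 4 Gb x16: 15 row bits (A0-A14)
# 8 Gb x16: 16 row bits (A0-A15)
```

Under these assumptions, moving from a 4 Gb to an 8 Gb x16 part needs A15; if that line was never routed from the CPU, the bigger part cannot simply be dropped in.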
Here is the thing: only you can weigh the different factors and decide. Only you know the technical aspects of your product, and the ability of those designing it (a.k.a. YOU). Only you can assess which manufacturers and distributors you are willing to work with (or who will work with you). Only you can determine whether an upgrade path or a long lifespan is important.
Technically, you can make (or should be able to make) either one work. The x16 is easier, in my opinion, but the x32 is a close second. But as I outlined, technical matters are only a small part of the story.
If you take another look at your image of the peaks, and specifically at the time scales listed there, you will see that the peak profiles last somewhere around 2.5 to 15 ns.
That makes sense, as DDR3 works at extreme speeds, at least when compared with power-supply frequencies.
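A rough way to see how fast that is: treating each peak duration as one period (f = 1/t) puts the peaks in the tens to hundreds of MHz, far beyond what a typical regulator's control loop can follow. This is an order-of-magnitude sketch, not a spectral analysis; `period_to_mhz` is a hypothetical helper.

```python
# Rough conversion of peak durations (read off the plot) into frequencies,
# treating each duration as one period: f = 1 / t.
def period_to_mhz(t_ns: float) -> float:
    # 1 / (t_ns * 1e-9 s) gives Hz; dividing by 1e6 gives MHz -> 1e3 / t_ns
    return 1e3 / t_ns

for t_ns in (2.5, 15.0):
    print(f"{t_ns} ns -> ~{period_to_mhz(t_ns):.0f} MHz")
# 2.5 ns -> ~400 MHz
# 15.0 ns -> ~67 MHz
```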
You are not going to design any supply system that actively regulates on a nanosecond timescale. Nor will you get the peak currents all the way from your supply into the chips across a PCB trace, because the trace inductance resists changes in current that fast.
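A back-of-the-envelope check of the trace argument uses V = L · di/dt. The numbers below are illustrative assumptions, not measurements: about 10 nH of supply-trace inductance, a 0.5 A step above the average current, and the fastest 2.5 ns edge from the plot.

```python
# Voltage across a trace inductance for a linear current step: V = L * di/dt.
# All values are ASSUMED for illustration.
def inductive_drop(l_henry: float, delta_i_amp: float, delta_t_s: float) -> float:
    """Voltage across an inductance for a linear current step."""
    return l_henry * delta_i_amp / delta_t_s

L_TRACE = 10e-9     # assumed trace inductance: 10 nH
DELTA_I = 0.5       # assumed current step above average: 0.5 A
RISE_TIME = 2.5e-9  # fastest peak edge read off the plot: 2.5 ns

v = inductive_drop(L_TRACE, DELTA_I, RISE_TIME)
print(f"Drop across trace: {v:.1f} V")  # 2.0 V, more than a 1.5 V DDR3 rail can tolerate
```

Even if the assumed inductance is off by several times, the drop is still far too large, which is why the charge has to come from somewhere much closer to the chip.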
That is why you have decoupling capacitors, chosen to work at the frequencies those peaks contain and at currents of about twice the average. The image doesn't give numbers, but I presume you can also see that the highest peak is about twice the average.
You then design your supply for the average current, as you would design any other power supply whose power consumption you know.
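One common first-pass sizing uses charge balance: the decoupling capacitors must supply the charge above the average current, so C ≥ Q_excess / ΔV_allowed. In this sketch the current profile (chosen so the peak is twice the average), the 2.5 ns sample step, and the 30 mV allowed droop are all invented for illustration; it uses simple rectangular integration and ignores capacitor ESL/ESR and placement, which matter a lot at these frequencies.

```python
# Charge-balance estimate of decoupling capacitance: C >= Q_excess / dV.
# Profile, sample step, and droop limit are HYPOTHETICAL values.
DT_S = 2.5e-9                                                  # sample spacing
profile_ma = [400, 400, 1000, 1000, 600, 300, 300, 300, 400, 300]  # current, mA
V_DROOP = 0.030                                                # allowed droop, V

i_avg_ma = sum(profile_ma) / len(profile_ma)
i_peak_ma = max(profile_ma)

# Charge the capacitors must deliver: the area of the profile above average.
excess_ma_samples = sum(max(i - i_avg_ma, 0.0) for i in profile_ma)
q_excess = excess_ma_samples * 1e-3 * DT_S  # coulombs
c_needed = q_excess / V_DROOP               # farads

print(f"average {i_avg_ma:.0f} mA, peak {i_peak_ma} mA, ratio {i_peak_ma / i_avg_ma:.1f}")
print(f"excess charge {q_excess * 1e9:.2f} nC -> C >= {c_needed * 1e9:.0f} nF")
# average 500 mA, peak 1000 mA, ratio 2.0
# excess charge 2.75 nC -> C >= 92 nF
```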
tRAS specifies the minimum and maximum time a row can stay ACTIVATEd (open) for access. The maximum value is bounded by the limits that refresh imposes.
Every row in the DRAM needs to be refreshed periodically. tREFI specifies the average interval between refresh commands (each command refreshes a group of rows; how many depends on device density). All rows need to be refreshed within a specified time, which is usually temperature dependent: hotter operation requires more frequent refresh to guarantee data integrity. This time is typically 64 ms for the commercial temperature range, which usually means that 8192 refresh commands need to occur within 64 ms.
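The average refresh interval follows directly: divide the retention window by the number of refresh commands. This sketch assumes the 8192-command case above; the 32 ms window is the usual halved window for extended-temperature operation (check your datasheet). `trefi_us` is a hypothetical helper.

```python
# tREFI = retention window / number of refresh commands in that window.
REFRESH_COMMANDS = 8192  # assumed command count, as in the text

def trefi_us(window_ms: float, commands: int = REFRESH_COMMANDS) -> float:
    """Average refresh interval in microseconds."""
    return window_ms * 1e3 / commands

print(f"tREFI @ 64 ms window: {trefi_us(64):.5f} us")  # 7.81250 us
print(f"tREFI @ 32 ms window: {trefi_us(32):.5f} us")  # 3.90625 us
```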
The datasheet allows you to defer a scheduled refresh so that up to 9 × tREFI elapses between refreshes. This can help performance, since you can keep the row open and keep accessing it during that interval instead of incurring the overhead of closing and re-opening it. The timing parameters in the datasheet are very conservative: if you don't violate them, data will never be corrupted. Overclockers take advantage of this conservative specmanship. They violate these parameters, eating into the margin until things break, and then back off to their own comfort level.
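Putting a number on that 9 × tREFI bound, which is also what caps tRAS(max): assuming the 64 ms / 8192 case, it works out to roughly 70 µs. Datasheets usually round tREFI to 7.8 µs, so the figure you see may differ slightly. `max_open_time_us` is a hypothetical helper.

```python
# Longest allowed gap between refreshes (and thus the longest a row can stay
# open), ASSUMING tREFI = 64 ms / 8192 and the 9 x tREFI bound from the text.
TREFI_NS = 64e6 / 8192  # 7812.5 ns
MAX_INTERVALS = 9

def max_open_time_us(trefi_ns: float = TREFI_NS, intervals: int = MAX_INTERVALS) -> float:
    return trefi_ns * intervals / 1e3

print(f"longest refresh gap / row-open time: {max_open_time_us():.4f} us")  # 70.3125 us
```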
If a refresh is deferred, the controller needs to make up for that lost time somehow, so that the average refresh interval does not exceed tREFI. It can do this by issuing extra refresh commands during idle times, but when things get busy, the controller will absolutely preempt accesses to keep refresh on track.
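The bookkeeping described above can be sketched as a debt counter: one refresh becomes due every tREFI, a busy controller defers it while it still has slack, and it preempts traffic once the debt limit is reached. This is a toy model, not any real controller's policy: the class name is hypothetical, the limit of 8 postponed refreshes is an assumption matching the DDR3 figure discussed here, and it only pays back deferred refreshes (it does not model pulling refreshes in early).

```python
from dataclasses import dataclass

MAX_POSTPONED = 8  # assumed debt limit (postponed refresh commands)

@dataclass
class RefreshScheduler:
    """Toy refresh bookkeeping: one refresh becomes due per tREFI tick."""
    owed: int = 0
    issued: int = 0
    max_owed_seen: int = 0

    def tick(self, busy: bool) -> int:
        """Advance one tREFI interval; return how many REFs were issued."""
        self.owed += 1
        if not busy:
            sent = self.owed                  # idle: pay back the whole debt
        elif self.owed > MAX_POSTPONED:
            sent = self.owed - MAX_POSTPONED  # out of slack: preempt accesses
        else:
            sent = 0                          # busy with slack left: defer
        self.owed -= sent
        self.issued += sent
        self.max_owed_seen = max(self.max_owed_seen, self.owed)
        return sent

pattern = [True] * 12 + [False] * 3  # 12 busy intervals, then 3 idle ones
sched = RefreshScheduler()
sent_per_tick = [sched.tick(busy) for busy in pattern]
print(sent_per_tick)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 9, 1, 1]
print(f"issued {sched.issued} of {len(pattern)} due, max debt {sched.max_owed_seen}")
```

The burst of 9 at the first idle tick is the controller catching up, after which the average rate matches one refresh per tREFI again.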
For DDR3, I think JEDEC specifies the allowed refresh debt as follows: the controller may postpone at most 8 refresh commands, so no more than 9 × tREFI can elapse between two refresh commands. In the limit, one could wait almost 9 full tREFIs and then issue the postponed refreshes back to back. DDR4 is similar, although I think it adds some new refresh features (fine-granularity refresh, for example).
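To check a captured command trace against that limit, one simplified model counts, at each multiple of tREFI, how many refreshes were due versus how many had been issued. This is a sketch of the idea, not the standard's exact wording: it assumes a limit of 8 postponed refreshes and tREFI = 7.8125 µs, ignores the limit on pulling refreshes in, and `worst_refresh_debt` is a hypothetical helper.

```python
import bisect
import math

TREFI_US = 7.8125   # assumed: 64 ms / 8192
MAX_POSTPONED = 8   # assumed postponement limit

def worst_refresh_debt(ref_times, trefi, end_time):
    """Largest number of refreshes owed at any tREFI boundary in [0, end_time]."""
    times = sorted(ref_times)
    worst = 0
    for k in range(1, math.floor(end_time / trefi) + 1):
        due = k                                         # refreshes due by k * tREFI
        issued = bisect.bisect_right(times, k * trefi)  # REFs at or before then
        worst = max(worst, due - issued)
    return worst

END_US = 16 * TREFI_US
on_time = [k * TREFI_US for k in range(1, 17)]
# Wait 8.5 tREFIs, then issue the 8 postponed refreshes back to back.
burst = [8.5 * TREFI_US] * 8 + [k * TREFI_US for k in range(9, 17)]
# Wait 9.5 tREFIs: one refresh too many has been postponed.
too_late = [9.5 * TREFI_US] * 9 + [k * TREFI_US for k in range(10, 17)]

for name, sched in [("on time", on_time), ("burst", burst), ("too late", too_late)]:
    debt = worst_refresh_debt(sched, TREFI_US, END_US)
    print(f"{name}: worst debt {debt} -> {'OK' if debt <= MAX_POSTPONED else 'VIOLATION'}")
# on time: worst debt 0 -> OK
# burst: worst debt 8 -> OK
# too late: worst debt 9 -> VIOLATION
```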