The clocks are used as follows:
pll_ref_clk = The clock fed into the input of the PLL. This has to match whatever setting you have in the IP core so that the correct memory clock frequency is generated.
afi_clk = This is the clock used for the Avalon-MM interface. There is a corresponding reset signal, afi_reset, which is synchronous (on deassert, I think) with afi_clk.
afi_half_clk = This is optional and is enabled by a parameter on the IP core. Basically this is a clock synchronous to afi_clk but at half the frequency.
So when you are interfacing with the Avalon-MM interface, you should be using afi_clk for your logic.
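As a minimal sketch of what that looks like in practice (only afi_clk and afi_reset come from the controller; the avl_* names and the reset polarity are assumptions for illustration, so check the signals your generated core actually exports):

```verilog
// Hypothetical user logic in the afi_clk domain. The avl_* names are
// made up for illustration - use the ports your generated core exports.
module avl_user_logic (
    input  wire         afi_clk,
    input  wire         afi_reset,   // assumed active-high; check the polarity your core uses
    input  wire         avl_ready,
    output reg          avl_write_req,
    output reg  [255:0] avl_wdata
);

    always @(posedge afi_clk) begin
        if (afi_reset) begin
            avl_write_req <= 1'b0;
            avl_wdata     <= {256{1'b0}};
        end else if (avl_ready) begin
            // Drive the next Avalon-MM transaction here,
            // everything clocked by afi_clk.
            avl_write_req <= 1'b0;
        end
    end

endmodule
```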
The frequency of the clocks is dependent on your IP core settings. Basically you specify the memory frequency which is the clock speed used for the physical memory. There is then a parameter which allows you to select a mode of operation for the Avalon-MM interface. You have the following options:
Full Rate = The Avalon-MM interface operates at the same frequency as the memory. For DDR RAM, you have data clocked on both edges, but internally you cannot do that, so to achieve the same data rate the output interface is twice the width of the memory.
Half Rate = The clock frequency of the Avalon-MM interface is half the frequency of the memory. Consequently the output interface is now quadruple the width of the memory.
Quarter Rate = You get the pattern by now. A quarter of the clock frequency, so the data width is 8 times the width of the memory.
These different rate settings allow you to run the memory much faster than the core logic of the FPGA can run (the periphery of the FPGA is much faster than the core). So for example you can have a 32-bit 800 MHz DDR chip which presents itself internally as a 200 MHz, 256-bit wide data bus.
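To make the width arithmetic concrete, here is a hedged sketch; the parameter names are made up for illustration and are not the ones the IP core itself uses:

```verilog
// Illustrative only: how the Avalon-MM (local) data width relates to
// the memory width and the rate setting.
module rate_width_example;
    localparam MEM_DQ_WIDTH   = 32;  // physical DDR data bus width
    localparam RATE_DIVIDER   = 4;   // 1 = full, 2 = half, 4 = quarter rate
    localparam AVL_DATA_WIDTH = MEM_DQ_WIDTH * 2 * RATE_DIVIDER;
    // full rate    : 32 * 2 * 1 =  64 bits at the memory clock frequency
    // half rate    : 32 * 2 * 2 = 128 bits at half the memory clock
    // quarter rate : 32 * 2 * 4 = 256 bits at a quarter of the memory clock
    //                (the 200 MHz / 256-bit example above)
endmodule
```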
As to why they were using the PLL reference clock for the Nios processor in the reference design, who knows. There are many little mistakes here and there that you will find with the documentation and example designs from Altera.
The simple answer is maybe, but probably not. It really depends on what is using the memory.
It is important to consider the structure of the memory. The M9K memory modules are true dual port. This means that they have two independent read/write ports. Each of these ports has one address bus, one read data bus, and one write data bus. What that means is that each memory can host either two independent single port memories, or one dual-port memory.
In order to host two, each one must be no larger than half of the M9K. Then, by tying the address MSB to 1 on one of the ports and 0 on the other, you can have one port always accessing the upper half and the other port always accessing the lower half, and thus you have two independent memories (they can even have different clocks, as the ports have independent clock inputs).
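As an illustration of that trick, here is a hedged Verilog sketch of one true dual-port RAM where each port has its address MSB fixed, giving two independent single-port memories that could share one M9K (whether it actually maps that way depends on your synthesis settings):

```verilog
// Hypothetical example: one 512-word array used as two independent
// 256-word single-port memories by fixing the address MSB per port.
module split_m9k (
    input  wire       clk_a, clk_b,      // the ports may even run on different clocks
    input  wire       we_a,  we_b,
    input  wire [7:0] addr_a, addr_b,    // each side sees its own 256-word half
    input  wire [7:0] din_a,  din_b,
    output reg  [7:0] dout_a, dout_b
);
    reg [7:0] mem [0:511];

    // Port A: MSB tied to 0, so it only ever touches the lower half.
    always @(posedge clk_a) begin
        if (we_a) begin
            mem[{1'b0, addr_a}] <= din_a;
            dout_a <= din_a;
        end else begin
            dout_a <= mem[{1'b0, addr_a}];
        end
    end

    // Port B: MSB tied to 1, so it only ever touches the upper half.
    always @(posedge clk_b) begin
        if (we_b) begin
            mem[{1'b1, addr_b}] <= din_b;
            dout_b <= din_b;
        end else begin
            dout_b <= mem[{1'b1, addr_b}];
        end
    end
endmodule
```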
However, all but two of your memories are dual-port, which means that each one has to be hosted in a separate M9K because each one needs independent control of both ports, so there is no way for them to share. The upshot is that you can increase the memory used by those blocks for free if needed (up to the size of the M9K).
The key word in the above paragraph is "independent". There is an exception: if each memory uses exactly the same data, address, and control signals, then it would be possible to map them into the same M9K as long as there is space. However, this would be an optimisation you would have to do manually, and it is highly unlikely to be possible. Usually those little memories come from things like pipeline stages which each need independent control.
What are the options? Well, firstly, you accept that it is par for the course. Each FPGA has a finite number of "blocks" and your design requires a certain number of "blocks". Whether or not you completely fill each block is irrelevant - this is one of the main differences between FPGA design and ASIC design: wastage of resources is pretty much always going to happen, as it is the penalty for having so much configurability.
Alternatively, you try to track down what exactly is using the memory and see if it is possible to use other memory resources, such as MLABs or LCs. These are distributed memory resources which have much lower capacity for the area - they are fine for very small RAMs like the 64-byte ones you have, for example. However, they have different capabilities, and it may be that the inferred RAMs require a pipelining, latency, or port configuration that is not supported in MLABs, in which case see the paragraph above.
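If you do want to nudge a small inferred RAM away from the M9Ks, Quartus supports a ramstyle synthesis attribute on the memory array. A hedged sketch (the attribute values accepted depend on your Quartus version and device family):

```verilog
// Ask Quartus to place a small inferred RAM in MLABs (or use "logic"
// for LCs/registers) instead of an M9K block.
module small_ram (
    input  wire       clk,
    input  wire       we,
    input  wire [5:0] addr,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    (* ramstyle = "MLAB" *) reg [7:0] mem [0:63];  // 64 bytes, like the small RAMs mentioned above

    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        dout <= mem[addr];
    end
endmodule
```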
Best Answer
OK, I have just spoken to the local support.
So there is no problem in just using the internal flash. The CFM should be enabled while creating the flash IP core - by default it is hidden. This is also the place to look up the base address at which to start writing the image.
The image is an .rbf file generated by Quartus (probably out of the .sof; I haven't tried it yet).
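If it helps, the usual command-line route for that conversion is quartus_cpf, something like the following; the exact options needed for the MAX 10 internal flash image may differ, so treat this as a starting point rather than the definitive flow:

```
quartus_cpf -c my_design.sof my_design.rbf
```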
Lastly, to start running the image, the FPGA has to be reset externally. This is a little upsetting because I haven't prepared for it in hardware, but I think I saw something in the documents hinting that it is somehow also available from inside the FPGA. Will update if I find it.