I'm basing my answer completely on the code and documentation of the dvi_decoder module, and assuming it actually works as advertised. This file seems to be a (modified?) copy of the IP in the app notes "Video Connectivity Using TMDS I/O in Spartan-3A FPGAs" and/or "Implementing a TMDS Video Interface in the Spartan-6 FPGA". These app notes are chock-full of important details, and I suggest you read them carefully.
As you indicated in the question, I will assume you are handling unencrypted (non-HDCP) streams. I'm fairly certain that the information in the NeTV project can be adapted to decrypt HDCP, but it would involve a non-trivial amount of additional work and be on questionable legal grounds depending on your jurisdiction.
It looks like you will be able to obtain the data you need from the outputs of the dvi_decoder block. The block outputs 24-bit color information using the wires red, green and blue, synced to the pixel clock pclk. The outputs hsync and vsync alert the user to the end of a line/screen respectively. In general, you should be able to do on the fly averaging using these outputs.
You will need some basic logic to translate hsync, vsync and the pixel clock into an (X,Y) location. Just instantiate two counters, one for X and one for Y. Increment X at every pixel clock. Reset X to zero at hsync. Increment Y at every hsync. Reset Y to zero at every vsync.
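The two counters described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the entity name pixel_position, the 12-bit counter width, and the active-high sync polarity are all my own choices, not something taken from the dvi_decoder source. The counters restart at the rising edge of each sync pulse, so X and Y include the blanking interval; you would need to subtract an offset (or use a data-enable signal, if your decoder exposes one) to get visible-pixel coordinates.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper: derives an (X, Y) position from the decoder's sync outputs.
-- Assumes hsync/vsync are active high and synchronous to pclk.
entity pixel_position is
    port (
        pclk  : in  std_logic;              -- pixel clock from dvi_decoder
        hsync : in  std_logic;
        vsync : in  std_logic;
        x     : out unsigned(11 downto 0);  -- assumed width, enough for 4096 pixels
        y     : out unsigned(11 downto 0)
    );
end entity pixel_position;

architecture rtl of pixel_position is
    signal x_cnt, y_cnt     : unsigned(11 downto 0) := (others => '0');
    signal hsync_d, vsync_d : std_logic := '0';
begin
    process (pclk)
    begin
        if rising_edge(pclk) then
            -- Delayed copies, used to detect the rising edge of each sync pulse
            hsync_d <= hsync;
            vsync_d <= vsync;

            -- X: increment on every pixel clock, reset at the start of hsync
            if hsync = '1' and hsync_d = '0' then
                x_cnt <= (others => '0');
            else
                x_cnt <= x_cnt + 1;
            end if;

            -- Y: increment at every hsync, reset at the start of vsync
            if vsync = '1' and vsync_d = '0' then
                y_cnt <= (others => '0');
            elsif hsync = '1' and hsync_d = '0' then
                y_cnt <= y_cnt + 1;
            end if;
        end if;
    end process;

    x <= x_cnt;
    y <= y_cnt;
end architecture rtl;
```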
Using red, green, blue, X and Y, you can do on the fly averaging. By comparing with X and Y, you can determine what box each individual pixel should contribute to, if any. Sum the color values into an accumulation register. To obtain the average value, you need to divide the value in the register by the number of pixels. If you are smart, you will make sure the number of pixels is a power of two. Then you can just wire the MSBs of the register to whatever you want to drive.
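As a rough sketch of the accumulate-then-take-the-MSBs idea, here is one box for one color component. Everything named here is hypothetical: the entity box_average, its generics, and the frame_end strobe (a one-pclk pulse per frame, for instance derived from a vsync edge) are assumptions for illustration only. The box is 2**LOG2_W by 2**LOG2_H pixels, so the division by the pixel count reduces to dropping the low LOG2_W + LOG2_H bits of the sum.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-box, single-component averager.
entity box_average is
    generic (
        X0     : natural := 0;  -- left edge of the box (assumed coordinates)
        Y0     : natural := 0;  -- top edge of the box
        LOG2_W : natural := 6;  -- box width  = 2**LOG2_W pixels
        LOG2_H : natural := 6   -- box height = 2**LOG2_H pixels
    );
    port (
        pclk      : in  std_logic;
        frame_end : in  std_logic;  -- assumed one-pclk pulse, once per frame
        x, y      : in  unsigned(11 downto 0);
        red       : in  std_logic_vector(7 downto 0);
        avg_red   : out std_logic_vector(7 downto 0)
    );
end entity box_average;

architecture rtl of box_average is
    constant SHIFT : natural := LOG2_W + LOG2_H;  -- log2 of the pixel count
    -- Wide enough for 2**SHIFT pixels of value 255 without overflow
    signal acc : unsigned(SHIFT + 7 downto 0) := (others => '0');
    signal avg : unsigned(7 downto 0) := (others => '0');
begin
    process (pclk)
    begin
        if rising_edge(pclk) then
            if frame_end = '1' then
                -- The top 8 bits are the sum divided by the pixel count
                avg <= acc(SHIFT + 7 downto SHIFT);
                acc <= (others => '0');
            elsif x >= X0 and x < X0 + 2**LOG2_W and
                  y >= Y0 and y < Y0 + 2**LOG2_H then
                acc <= acc + unsigned(red);
            end if;
        end if;
    end process;

    avg_red <= std_logic_vector(avg);
end architecture rtl;
```

The avg register holds the previous frame's result while acc collects the next frame, so the output stays stable during accumulation.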
Because we want to drive displays while doing the accumulation, we will need to do double buffering, so we will need two registers per box per component. If you are using a 25-LED string, this means you will need 25*3*2=150 registers. That's quite a bit, so you might want to use block RAM instead of registers. It all depends on your exact requirements, so experiment!
I assume you will be driving an LED string like the one used in the original Adafruit project kit. You should be able to figure out how to drive it from the values in the registers quite easily using SPI.
The dvi_decoder module is a fairly complex piece of kit. I suggest you study the app notes in detail.
As an aside, if you have not yet purchased an NeTV for use in this project, I recommend you also have a look at Digilent's Atlys board. With two HDMI inputs and two HDMI outputs, it appears to be tailor-made for projects of this kind.
You can create as many clocks as you want, and you can use PLLs or DCMs to create arbitrary clocks. The question is whether you need to, or if you should be doing it a different way.
I find that I end up running as much logic as possible at a common or "core" clock frequency, say the 54MHz that you are using, but I need to trigger certain processes to run periodically: a 100ms debounce, a 10kHz PWM update, a 1s timer tick for a wall clock, you get the idea. Instead of generating these clocks, I run everything at the core clock frequency and generate arbitrary clock enable signals.
You generally don't want to create divided clocks, for several reasons: logic-generated clocks are jittery; the tools may end up routing these "clock" signals along routing paths intended for logic (since they're generated from logic); and, as mentioned above and by others, PLLs and DCMs are much better options if you really need to generate a different clock.
Clock gating is what you want, in the form of clock enables. The device primitives have an additional clock enable signal which "gates" the clock signal, allowing it to propagate into the primitive or not. When the clock enable is deasserted, the FF doesn't see the clock and effectively holds its state as if the clock pulse never occurred. When the clock enable signal is asserted, the FF sees the clock normally and things proceed as expected. Clock enables are designed specifically to control an FF's access to its clock and as such don't have issues with generating runt clocks. They also don't take up any additional resources, so use them.
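For example, here is a clock generated in logic. This is bad; don't do this:

    gen_100ms_clk: process (clk, rst)
        -- 5,400,000 cycles of the 54MHz clock = 100ms
        constant CTR_MAX: integer := 5399999;
        variable ctr: integer range 0 to CTR_MAX;
    begin
        -- NOTE: "out" is a reserved word in VHDL; use another signal name in real code.
        if rst = '1' then
            ctr := 0;
            out <= '0';
        elsif rising_edge(clk) then
            if ctr = CTR_MAX then
                out <= not out;  -- toggle: a logic-generated "clock"
                ctr := 0;
            else
                ctr := ctr + 1;
            end if;
        end if;
    end process gen_100ms_clk;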
This code makes the out signal toggle every 100ms. This signal would be a poor choice to use as the clock signal of a new process, such as here:

    do_100ms: process (out, rst)
    begin
        if rising_edge(out) then
            ...
        end if;
    end process do_100ms;

This is bad because the FFs in the do_100ms() process are clocked by a signal created through the logic in the gen_100ms_clk() process.
Instead, use a clock enable, as shown here:

    gen_100ms_ce: process (clk, rst)
        -- 5,400,000 cycles of the 54MHz clock = 100ms
        constant CTR_MAX: integer := 5399999;
        variable ctr: integer range 0 to CTR_MAX;
    begin
        if rst = '1' then
            ctr := 0;
            out <= '0';
        elsif rising_edge(clk) then
            if ctr = CTR_MAX then
                out <= '1';  -- one-cycle enable pulse
                ctr := 0;
            else
                out <= '0';
                ctr := ctr + 1;
            end if;
        end if;
    end process gen_100ms_ce;

Now gen_100ms_ce() creates an out signal that is high for one clock period (1T) every 100ms. This is a great way to signal to your code that it's time to do something:

    do_100ms: process (clk, rst)
    begin
        if rising_edge(clk) then
            if out = '1' then  -- clock enable: act only on the 100ms tick
                ...
            end if;
        end if;
    end process do_100ms;

Now your do_100ms() process is running at the same 54MHz clock as everything else and it uses a proper clock enable to trigger whatever you want to happen every 100ms.
Take a look at the RTL output of your toolset; you'll see that the primitive used in your do_100ms() process will use its clock enable signal.
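For something you can drop straight into a simulator or synthesis tool, the two processes can be wrapped into one self-contained entity along these lines. This is a minimal sketch: the entity name ce_demo, the generics CLK_HZ and TICK_HZ, and the led output are all made up for illustration, and the enable is called ce because out is a reserved word in VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: toggles an LED every 100 ms using a clock enable,
-- with all logic running on the single core clock.
entity ce_demo is
    generic (
        CLK_HZ  : positive := 54_000_000;  -- core clock frequency from the text
        TICK_HZ : positive := 10           -- 10 Hz gives one enable every 100 ms
    );
    port (
        clk : in  std_logic;
        rst : in  std_logic;
        led : out std_logic
    );
end entity ce_demo;

architecture rtl of ce_demo is
    constant CTR_MAX : natural := CLK_HZ / TICK_HZ - 1;
    signal ctr   : natural range 0 to CTR_MAX := 0;
    signal ce    : std_logic := '0';
    signal led_i : std_logic := '0';
begin
    -- Generates a one-cycle-wide enable pulse every CTR_MAX + 1 clocks
    gen_ce : process (clk, rst)
    begin
        if rst = '1' then
            ctr <= 0;
            ce  <= '0';
        elsif rising_edge(clk) then
            if ctr = CTR_MAX then
                ce  <= '1';
                ctr <= 0;
            else
                ce  <= '0';
                ctr <= ctr + 1;
            end if;
        end if;
    end process gen_ce;

    -- Clocked by clk like everything else; only acts when ce is high
    use_ce : process (clk, rst)
    begin
        if rst = '1' then
            led_i <= '0';
        elsif rising_edge(clk) then
            if ce = '1' then
                led_i <= not led_i;
            end if;
        end if;
    end process use_ce;

    led <= led_i;
end architecture rtl;
```

For simulation, overriding the generics (say CLK_HZ => 100, TICK_HZ => 10) keeps the run short.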
This method also achieves power savings since there will be large swaths of logic that stay "static" for long stretches of time even though the global clock net is wiggling away at 54MHz in your case. Once every 100ms in my example above, all the flip-flops gated with the 100ms enable become active for 1T and then are static again for another 99.9999815ms. :-) CMOS consumes very little power when it's not changing state, so the only power consumption in the logic with the gated-off clock is in its leakage currents.
You can extend this into a full-out means of power management. You create clock enables for all the subsystems, and your power manager deasserts the clock enable for whichever subsystems you don't want powered.
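A very small sketch of that idea, assuming three subsystems and a shared periodic enable: each subsystem's clock enable is simply the tick ANDed with an "on" bit controlled by the power manager. The entity and port names (ce_power_manager, tick, subsys_on, subsys_ce) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical per-subsystem clock enable gating.
entity ce_power_manager is
    port (
        tick      : in  std_logic;                     -- shared periodic enable
        subsys_on : in  std_logic_vector(2 downto 0);  -- '1' = subsystem active
        subsys_ce : out std_logic_vector(2 downto 0)   -- per-subsystem clock enables
    );
end entity ce_power_manager;

architecture rtl of ce_power_manager is
begin
    -- A subsystem only sees the enable when the power manager has switched it on
    gen_ce : for i in subsys_on'range generate
        subsys_ce(i) <= tick and subsys_on(i);
    end generate gen_ce;
end architecture rtl;
```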
You can get better oscillators: reasonably cheap temperature-compensated crystal oscillators (TCXOs) can reach 1 ppm.
Some TCXOs can be steered with a control voltage (VCTCXOs). You can count the clock cycles of your TCXO with the FPGA and discipline it to a GPS 1PPS signal or PTP. You will have to figure out a good feedback law between the count minus the PTP value and the control voltage. The response time of the loop must be picked wisely.
There is an open-source design from CERN using that method. Check out page 2 of the schematics.
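To make the counting-and-steering loop concrete, here is a deliberately naive sketch of my own; it is not the CERN design. It counts VCTCXO cycles between 1PPS rising edges and nudges a DAC word by one LSB per second in the direction that reduces the error (a bang-bang integrator, which is a stand-in for the "good feedback law" you would still need to design). Assumptions: a 10MHz oscillator, a 16-bit signed DAC word on a hypothetical dac_val port, a pps input already synchronized to clk, and a control voltage that raises the frequency when it increases.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical, simplistic 1PPS disciplining loop.
entity pps_discipline is
    generic (
        NOMINAL : natural := 10_000_000  -- assumed VCTCXO frequency in Hz
    );
    port (
        clk     : in  std_logic;            -- VCTCXO clock
        pps     : in  std_logic;            -- 1PPS, assumed synchronized to clk
        dac_val : out signed(15 downto 0)   -- word for an assumed control-voltage DAC
    );
end entity pps_discipline;

architecture rtl of pps_discipline is
    signal pps_d    : std_logic := '0';
    signal seen_pps : std_logic := '0';  -- the first interval is incomplete, skip it
    signal count    : unsigned(27 downto 0) := (others => '0');
    signal dac      : signed(15 downto 0) := (others => '0');
begin
    process (clk)
        variable err : signed(28 downto 0);
    begin
        if rising_edge(clk) then
            pps_d <= pps;
            if pps = '1' and pps_d = '0' then
                -- count holds (cycles in the last second - 1) at this point
                err := signed(resize(count, 29)) + 1 - NOMINAL;
                if seen_pps = '1' then
                    if err > 0 and dac > -32768 then
                        dac <= dac - 1;  -- running fast: lower the control voltage
                    elsif err < 0 and dac < 32767 then
                        dac <= dac + 1;  -- running slow: raise the control voltage
                    end if;
                end if;
                seen_pps <= '1';
                count    <= (others => '0');
            else
                count <= count + 1;
            end if;
        end if;
    end process;

    dac_val <= dac;
end architecture rtl;
```

A one-LSB-per-second step makes for a very slow loop; a proportional-integral law with averaging over many seconds is the more usual choice, and the loop's response time still has to be tuned for your oscillator and reference.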