Well let's look at your VHDL code. Please keep in mind that I'm not trying to be critical or harsh-- I'm just trying to get you up to speed quickly. Your code is currently this:
process (C, ALOAD)
begin
if (ALOAD='1') then
tmp <= "10000001";
elsif (C'event and C='1') then
tmp <= tmp(6 downto 0) & '0';
SO <= tmp(7);
end if;
end process;
But should look like this:
process (C, ALOAD)
begin
if ALOAD='1' then
tmp <= "10000001";
elsif rising_edge(C) then
tmp <= tmp(tmp'high-1 downto tmp'low) & "0"; -- '0' and "0" are equivalent in this context
end if;
end process;
SO <= tmp(tmp'high);
The changes I made are minor, for sure, but important.
The first thing is my use of "rising_edge()". This is a new-ish thing to VHDL and isn't covered in some of the VHDL books. This is considered to be better than using 'event. Likewise, there is a falling_edge().
The next thing is my use of 'high and 'low instead of hard-coding the values. This really doesn't matter for this code, but when you start doing bigger things then these will help you a lot. For example, you could just change the definition of tmp to be bigger and the rest of the code will just automatically adjust (except for the initialization to "10000001").
I should also point out that an async reset or load in FPGAs is discouraged, but is fine in CPLDs.
Also note that I moved the assignment of SO to outside the process. This may or may not be what you intended. The way you have it there is an extra flip-flop going from tmp(7) to SO. Normally, with SPI, this isn't want you want because the SPI Clock could go away at the end of the transfer and you'll never get that last bit out. On the other hand, with the way that I did it you'll start getting that first bit out when ALOAD='1', not at the rising edge of C.
Unfortunately, this doesn't really answer your question on why you're getting bad data into the AVR. There just isn't enough information in your question. Here's the kinds of things I would be looking at, or be concerned about:
Your o-scope pic doesn't show the other spi signals, like CS. CS is critical to understanding where the bits are supposed to go. Also, knowing what the SPI mode is will help with this.
On the o-scope pic, the first clock cycle where SO='1' is NOT the first bit of the SPI transfer. You loaded the shift register 10-20 mS before that, and your clock period is about 20 uS. So you had at least 1 clock cycle before SO='1', and probably more. So there is some weird stuff going on here-- we don't have enough info to understand the behavior.
You're using ALOAD to load the shift register, but normally you'd use CS_N (active low). While CS_N='1' you do an async load of the shift register, while CS_N='0' you shift it out. Using ALOAD like you have here is OK, but probably not what you wanted and doesn't work with what appears to be some strange SPI clock stuff (from the previous point).
So, here's what you should do...
Clean up the VHDL a bit. Repost the updated version. Since your scope only has 2 channels, hook up one channel to CS_N and the other channel to CLK. Trigger on the falling edge of CLK. Capture a waveform showing 2 clocks before the falling edge of CLK and 5 clocks after. Without changing the settings on the scope, remove CLK and put the probe on SO. Capture another image. Do this again for the rising edge of CLK. So, 4 waveform images total.
Do that and we can re-evaluate what might be wrong.
Edit: Updated to reflect the updated question.
I see two issues: First, if the AVR is sampling on the rising clk edge, you should clock your shift register off of the falling edge. As supercat mentioned, this will give you +/- 0.5 clock periods of setup & hold going into your AVR.
And Second: As you mentioned, you're getting 10000010 instead of 10000001. I do not believe that your VHDL code is in error on this one, but it is obviously coming out of the CPLD wrong. If I had to guess, I would guess that the problem is with some signal integrity issues with your CLK. It's hard to tell with your scope, but it looks like you have over a volt of overshoot and undershoot on that signal (and with that would come a lot of ringing). That ringing, if bad enough, could cause the CPLD to "double clock"-- meaning run the shift register twice for a single clock edge. And if _really_bad_ it could cause the CPLD to latch up and literally explode (I've seen it happen).
Here's some experiments to try:
Instead of 10000001, use 10101010 or 01010101. This will help you see which bit is getting double clocked, and if it is always the same bit.
Zoom in on the clock edges with the scope. Make sure that your scope probes are at the CPLD, not the AVR, when you do this. Yes, it makes a HUGE difference.
Assuming that it is over/undershoot and ringing on the clock, the solution is to add proper signal termination on the line. I would start with a 50 ohm series resistor at the AVR. Note: this will slow down the clock edges, but since you are clocking the CPLD on the falling edge you have a lot of time available.
How is the clock ran from AVR to CPLD? A long wire between PCB's? That would be my guess.
I don't see a synchronizer on the rx data line.
All asynchronous inputs must be synchronized to the sampling clock. There are a couple of reasons for this: metastability and routing. These are different problems but are inter-related.
It takes time for signals to propagate through the FPGA fabric. The clock network inside the FPGA is designed to compensate for these "travel" delays so that all flip flops within the FPGA see the clock at the exact same moment. The normal routing network does not have this, and instead relies on the rule that all signals must be stable for a little bit of time before the clock changes and remain stable for a little bit of time after the clock changes. These little bits of time are known as the setup and hold times for a given flip flop. The place and route component of the toolchain has a very good understanding of the routing delays for the specific device and makes a basic assumption that a signal does not violate the setup and hold times of the flip flops in the FPGA. With that assumption and knowledge (and a timing constraints file) it can properly place the logic within the FPGA and ensure that all the logic that looks at a given signal sees the same value at every clock tick.
When you have signals that are not synchronized to the sampling clock you can end up in the situation where one flip flop sees the "old" value of a signal since the new value has not had time to propagate over. Now you're in the undesirable situation where logic looking at the same signal sees two different values. This can cause wrong operation, crashed state machines and all kinds of hard to diagnose havoc.
The other reason why you must synchronize all your input signals is something called metastability. There are volumes written on this subject but in a nutshell, digital logic circuitry is at its most basic level an analog circuit. When your clock line rises the state of the input line is captured and if that input is not a stable high or low level at that time, an unknown "in-between" value can be captured by the sampling flip flop.
As you know, FPGAs are digital beasts and do not react well to a signal that is neither high nor low. Worse, if that indeterminate value makes its way past the sampling flip flop and into the FPGA it can cause all kinds of weirdness as larger portions of the logic now see an indeterminate value and try to make sense of it.
The solution is to synchronize the signal. At its most basic level this means you use a chain of flip flops to capture the input. Any metastable level that might have been captured by the first flip flop and managed to make it out gets another chance to be resolved before it hits your complex logic. Two flip flops are usually more than sufficient to synchronize inputs.
A basic synchronizer looks like this:
entity sync_2ff is
port (
async_in : in std_logic;
clk : in std_logic;
rst : in std_logic;
sync_out : out std_logic
);
end;
architecture a of sync_2ff is
begin
signal ff1, ff2: std_logic;
-- It's nice to let the synthesizer know what you're doing. Altera's way of doing it as follows:
ATTRIBUTE altera_attribute : string;
ATTRIBUTE altera_attribute OF ff1 : signal is "-name SYNCHRONIZER_IDENTIFICATION ""FORCED IF ASYNCHRONOUS""";
ATTRIBUTE altera_attribute OF a : architecture is "-name SDC_STATEMENT ""set_false_path -to *|sync_2ff:*|ff1 """;
-- also set the 'preserve' attribute to ff1 and ff2 so the synthesis tool doesn't optimize them away
ATTRIBUTE preserve: boolean;
ATTRIBUTE preserve OF ff1: signal IS true;
ATTRIBUTE preserve OF ff2: signal IS true;
synchronizer: process(clk, rst)
begin
if rst = '1' then
ff1 <= '0';
ff2 <= '0';
else if rising_edge(clk) then
ff1 <= async_in;
ff2 <= ff1;
sync_out <= ff2;
end if;
end process synchronizer;
end sync_2ff;
Connect the physical pin for the N64 controller's rx data line to the async_in input of the synchronizer, and connect the sync_out signal to your UART's rxd input.
Unsynchronized signals can cause weird issues. Make sure any input connected to an FPGA element that isn't synchronized to the clock of the process reading the signal is synchronized. This includes pushbuttons, UART 'rx' and 'cts' signals... anything that is not synchronized to the clock that the FPGA is using to sample the signal.
(An aside: I wrote the page at www.mixdown.ca/n64dev many years ago. I just realized that I broke the link when I last updated the site and will fix it in the morning when I'm back at a computer. I had no idea so many people used that page!)
Best Answer
Not knowing all the details, in principle it is not a bug. This is a process that occurs whenever there is a falling edge on the associated clock. You can imagine that all signal reads happen right before the edge, and all the writes happen at the edge.
So having inside a clocked process:
The order of the above assignments does not matter, and can be represented by the following schematic:
simulate this circuit – Schematic created using CircuitLab
As you can see in the above schematic, each Flip-Flop has a 'queued' value
D
and it becomes the outputQ
at the clock's edge.a
hasb
queued, andc
hasa
queued. So in a single clock,b
can't propagate all the way toc
(it requires 2 clock cycles). So specifying the signal 'queuing' in code does not need any particular order.In your specific example, when
ADC_Counter
is13
,ADC_temp(7)
hasADC_Data
queued, andADC_Reg_1
hasADC_Temp
queued.ADC_Reg_1
is not getting corrupted with the new data bit, in the same way that ourc
is not getting corrupted withb
.Note: This would be different if talking about variables (you'd be using the
:=
operator instead of<=
), but your code only contains signals.