As a processor guy and designer of an open-source microprocessor, I can tell you that it will be difficult to find what you are looking for.
Multiplication, subtraction and addition are very doable in hardware but irrational numbers, arbitrary powers and division are difficult if not infeasible to do in digital hardware. It may be easier to do it in analogue electronics if your equations have a tolerance for errors. Otherwise, you will need to come up with some tricks to get what you need.
If performance is not critical, you would be better served by doing it in software.
From this and some of your other questions, I think that you may need to re-define your problem or come up with a better alternative that does not rely so much on complicated math.
It may not be possible to do what you need in hardware, and you may need to resort to software emulation instead.
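To illustrate the kind of trick I mean: division by a known constant can often be replaced by a multiply and a shift. The sketch below is only an example under my own assumptions (an 8-bit unsigned input, a fixed divisor of 10, and the made-up entity name div10). It is not a general-purpose divider and says nothing about arbitrary powers or irrational numbers.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: divide an 8-bit unsigned value by the constant 10.
entity div10 is
  port (
    x : in  unsigned(7 downto 0);
    q : out unsigned(7 downto 0)
  );
end entity div10;

architecture rtl of div10 is
  -- 205/2048 is slightly larger than 1/10. For inputs up to 255 the
  -- accumulated error stays below 0.1, so truncation still gives x/10.
  constant RECIP : unsigned(7 downto 0) := to_unsigned(205, 8);
  signal prod    : unsigned(15 downto 0);
begin
  prod <= x * RECIP;                   -- 8 x 8 bits gives a 16-bit product
  q    <= resize(prod(15 downto 11), 8);  -- shift right by 11, zero-extend
end architecture rtl;
```

The same idea does not carry over to a divisor that is only known at run time; that case still needs an iterative divider or software.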
Well, let's look at your VHDL code. Please keep in mind that I'm not trying to be critical or harsh; I'm just trying to get you up to speed quickly. Your code is currently this:
process (C, ALOAD)
begin
  if (ALOAD='1') then
    tmp <= "10000001";
  elsif (C'event and C='1') then
    tmp <= tmp(6 downto 0) & '0';
    SO <= tmp(7);
  end if;
end process;
But should look like this:
process (C, ALOAD)
begin
  if ALOAD='1' then
    tmp <= "10000001";
  elsif rising_edge(C) then
    tmp <= tmp(tmp'high-1 downto tmp'low) & "0"; -- '0' and "0" are equivalent in this context
  end if;
end process;
SO <= tmp(tmp'high);
The changes I made are minor, for sure, but important.
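The first thing is my use of "rising_edge()". This is a new-ish thing to VHDL and isn't covered in some of the VHDL books. It is considered better than using 'event because it also checks that the signal was '0' before the edge, so a transition such as 'U' or 'X' to '1' does not count as a clock edge in simulation. Likewise, there is a falling_edge().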
The next thing is my use of 'high and 'low instead of hard-coding the values. This really doesn't matter for this code, but when you start doing bigger things then these will help you a lot. For example, you could just change the definition of tmp to be bigger and the rest of the code will just automatically adjust (except for the initialization to "10000001").
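As a rough sketch of what that buys you, here is the same shift register with its width pulled out into a generic. The entity name shift_out_n, the generic WIDTH, and the constants ZEROS and INIT are all names I made up for illustration; I'm also assuming you still want a '1' in the top and bottom bit with zeros in between, and that WIDTH is at least 3.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shift_out_n is
  generic (
    WIDTH : positive := 16  -- hypothetical generic, assumed to be >= 3
  );
  port (
    C     : in  std_logic;  -- shift clock
    ALOAD : in  std_logic;  -- asynchronous load, active high
    SO    : out std_logic   -- serial output, MSB first
  );
end entity shift_out_n;

architecture rtl of shift_out_n is
  -- Build "100...001" for any width instead of hard-coding "10000001".
  constant ZEROS : std_logic_vector(WIDTH - 3 downto 0) := (others => '0');
  constant INIT  : std_logic_vector(WIDTH - 1 downto 0) := '1' & ZEROS & '1';
  signal tmp     : std_logic_vector(WIDTH - 1 downto 0);
begin
  process (C, ALOAD)
  begin
    if ALOAD = '1' then
      tmp <= INIT;
    elsif rising_edge(C) then
      -- 'high and 'low follow the declaration of tmp automatically.
      tmp <= tmp(tmp'high - 1 downto tmp'low) & '0';
    end if;
  end process;

  SO <= tmp(tmp'high);
end architecture rtl;
```

Changing WIDTH is then the only edit needed to resize the register.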
I should also point out that an async reset or load in FPGAs is discouraged, but is fine in CPLDs.
Also note that I moved the assignment of SO to outside the process. This may or may not be what you intended. The way you have it, there is an extra flip-flop going from tmp(7) to SO. Normally, with SPI, this isn't what you want because the SPI clock could go away at the end of the transfer and you'll never get that last bit out. On the other hand, with the way that I did it you'll start getting that first bit out when ALOAD='1', not at the rising edge of C.
Unfortunately, this doesn't really answer your question on why you're getting bad data into the AVR. There just isn't enough information in your question. Here are the kinds of things I would be looking at, or be concerned about:
Your o-scope pic doesn't show the other SPI signals, like CS. CS is critical to understanding where the bits are supposed to go. Also, knowing what the SPI mode is will help with this.
On the o-scope pic, the first clock cycle where SO='1' is NOT the first bit of the SPI transfer. You loaded the shift register 10-20 ms before that, and your clock period is about 20 µs. So you had at least 1 clock cycle before SO='1', and probably more. So there is some weird stuff going on here, and we don't have enough info to understand the behavior.
You're using ALOAD to load the shift register, but normally you'd use CS_N (active low). While CS_N='1' you do an async load of the shift register, while CS_N='0' you shift it out. Using ALOAD like you have here is OK, but probably not what you wanted and doesn't work with what appears to be some strange SPI clock stuff (from the previous point).
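The CS_N arrangement described in the last point could look roughly like the sketch below. The entity name spi_shift_out and the port names are placeholders of mine, and I'm keeping your "10000001" load value and the rising-edge shift from the code above; adjust the clock edge to whatever SPI mode you end up using.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity spi_shift_out is
  port (
    CLK  : in  std_logic;  -- SPI clock from the master
    CS_N : in  std_logic;  -- chip select, active low
    SO   : out std_logic   -- serial data out, MSB first
  );
end entity spi_shift_out;

architecture rtl of spi_shift_out is
  signal tmp : std_logic_vector(7 downto 0);
begin
  process (CLK, CS_N)
  begin
    if CS_N = '1' then
      -- Deselected: asynchronously (re)load the shift register.
      tmp <= "10000001";
    elsif rising_edge(CLK) then
      -- Selected: shift one bit out per clock.
      tmp <= tmp(tmp'high - 1 downto tmp'low) & '0';
    end if;
  end process;

  SO <= tmp(tmp'high);
end architecture rtl;
```

With this arrangement the first bit is already on SO when CS_N goes low, and every transfer starts from a freshly loaded register.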
So, here's what you should do...
Clean up the VHDL a bit. Repost the updated version. Since your scope only has 2 channels, hook up one channel to CS_N and the other channel to CLK. Trigger on the falling edge of CLK. Capture a waveform showing 2 clocks before the falling edge of CLK and 5 clocks after. Without changing the settings on the scope, remove CLK and put the probe on SO. Capture another image. Do this again for the rising edge of CLK. So, 4 waveform images total.
Do that and we can re-evaluate what might be wrong.
Edit: Updated to reflect the updated question.
I see two issues. First, if the AVR is sampling on the rising clk edge, you should clock your shift register off of the falling edge. As supercat mentioned, this will give you +/- 0.5 clock periods of setup & hold going into your AVR.
Second, as you mentioned, you're getting 10000010 instead of 10000001. I do not believe that your VHDL code is in error on this one, but it is obviously coming out of the CPLD wrong. If I had to guess, I would guess that the problem is a signal integrity issue with your CLK. It's hard to tell with your scope, but it looks like you have over a volt of overshoot and undershoot on that signal (and with that would come a lot of ringing). That ringing, if bad enough, could cause the CPLD to "double clock", meaning run the shift register twice for a single clock edge. And if _really_bad_ it could cause the CPLD to latch up and literally explode (I've seen it happen).
Here are some experiments to try:
Instead of 10000001, use 10101010 or 01010101. This will help you see which bit is getting double clocked, and if it is always the same bit.
Zoom in on the clock edges with the scope. Make sure that your scope probes are at the CPLD, not the AVR, when you do this. Yes, it makes a HUGE difference.
Assuming that it is over/undershoot and ringing on the clock, the solution is to add proper signal termination on the line. I would start with a 50 ohm series resistor at the AVR. Note: this will slow down the clock edges, but since you are clocking the CPLD on the falling edge you have a lot of time available.
How is the clock run from the AVR to the CPLD? A long wire between PCBs? That would be my guess.
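For reference, the falling-edge change combined with the alternating test pattern from the first experiment might look like the following. This is only a sketch: the entity name shift_out_neg and the constant TEST_PATTERN are names I invented, and I've kept your ALOAD signal as the load control.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shift_out_neg is
  port (
    C     : in  std_logic;  -- SPI clock; the AVR samples on the rising edge
    ALOAD : in  std_logic;  -- asynchronous load, active high
    SO    : out std_logic   -- serial output, MSB first
  );
end entity shift_out_neg;

architecture rtl of shift_out_neg is
  -- Alternating bits make a doubled shift easy to spot on the receiver.
  constant TEST_PATTERN : std_logic_vector(7 downto 0) := "10101010";
  signal tmp            : std_logic_vector(7 downto 0);
begin
  process (C, ALOAD)
  begin
    if ALOAD = '1' then
      tmp <= TEST_PATTERN;
    elsif falling_edge(C) then
      -- Shift on the opposite edge from the one the AVR samples on.
      tmp <= tmp(tmp'high - 1 downto tmp'low) & '0';
    end if;
  end process;

  SO <= tmp(tmp'high);
end architecture rtl;
```

If a double clock still occurs, two adjacent bits in the received byte will no longer alternate as expected, which tells you where in the transfer it happened.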
Best Answer
I'm a developer and maintainer at 'The PoC Library'. We try to provide such a library composed of packages (collections of new types and functions) and modules. It comes with common FIFOs, arithmetic, cross-clock components, low-speed I/O components and an Ethernet/IP/UDP stack (next release).
As @crgrace described, it's quite complicated to design modules, which:
Our library has an internal configuration mechanism (PoC.config) to distinguish vendors, devices and even device subfamilies to choose the right code or an optimized implementation. It also distinguishes between synthesis and simulation code at some points.
For example
PoC.fifo_cc_got
is a FIFO with a 'common clock' (cc) interface and put/got signals to control the FIFO. The FIFO is configurable in width, depth, fill-state bits and implementation type. It's possible to choose a LUT-based RAM or an on-chip RAM (ocram) implementation type. If this FIFO is synthesized with the ocram option for Altera, it uses altsyncram; if Xilinx is chosen, it uses a generic BlockRAM description and implements the pointer arithmetic by explicit carry-chain instantiation (Xilinx XST does not find the optimal solution, so it's done manually). There are 2 other FIFO types with 'dependent clock' (dc) and 'independent clock' (ic) interfaces. So if it's required to switch from a normal FIFO to a cross-clock FIFO (PoC.fifo_ic_got), change the entity name and add a clock and reset for the second clock domain; that's all.
I think this proves that it's possible to write common modules that work on multiple platforms and compile in different tools (Spartan -> Virtex, Cyclone -> Stratix; ISE, Vivado, Quartus).
Besides PoC, there are other open source libraries:
The "Discover Free and Open Source Silicon" (FOSSi) project on GitHub offers a browsable database of all GitHub projects that mainly use VHDL, Verilog, SystemVerilog, or any other important hardware description language (HDL).