Ouch, well I've never worked with FireWire at this level before, but here are some possibly helpful thoughts:
I read through the TI PHY datasheet that you linked; it looks like it's designed to work with the TI link layer controllers (LLCs). Have you considered just using such a part rather than trying to implement its functionality from scratch?
Also, the PHY <-> LLC link runs at full FireWire speed, which for 400 Mbps and an 8-bit link is ~50 MHz. The camera may let you get away with running the link much slower; maybe not. The point being, you can't just take the pins of that PHY, blue-wire them into the FPGA pins, and expect a remotely functional or stable connection. You'll very likely have major signal integrity issues.
It looks like the LLCs from TI come in various configurations, some with rather simple, generic 8/16-bit microcontroller-style interfaces which should be easy to implement. You can probably find Verilog blocks for such an interface freely available. These links would still need to be fast, ~50 MHz, so you'll still have the signal integrity issue. You could run it slower, I guess, but if the data feed overruns the FIFO in the LLC, you're SOL.
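For a flavor of what a generic microcontroller-style register interface looks like on the FPGA side, here's a minimal sketch. The signal names are mine and it assumes a simple synchronous bus; the real TI LLC interfaces use async CS/RD/WR strobes, so treat this as an illustration, not a drop-in:

```verilog
// Minimal sketch of a generic 8-bit register-style bus interface.
// Assumes the strobes are already synchronous to clk; a real LLC bus
// is asynchronous, so check the datasheet before copying this.
module reg_if (
    input  wire       clk,
    input  wire       wr_en,   // write strobe
    input  wire [1:0] addr,    // register select
    input  wire [7:0] wdata,   // write data from the host
    output reg  [7:0] rdata    // read data back to the host
);
    reg [7:0] regs [0:3];      // four byte-wide registers

    always @(posedge clk) begin
        if (wr_en)
            regs[addr] <= wdata;
        rdata <= regs[addr];   // registered read, one cycle of latency
    end
endmodule
```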
Once you're done with the physical interface, you still have an entire FireWire driver stack to implement. I guess you'd have to do this in the DSP? Or put a small soft core into the FPGA to handle this work?
What I would really do is tell them they are crazy for forcing a FireWire interface on something of this size and capabilities. It's going to take a significant amount of resources to build the FireWire interface for no gain, since you'll never use anywhere near its bandwidth.
If that fails, I'd try something like this, which is a single part with the FireWire PHY, LLC, and an ARM7 core in a single chip. It offers a parallel data bus to get the information into the FPGA. This way you write the FireWire driver for communicating with the camera, plunk it in the ARM7 core, and all that has to get transferred to the FPGA are the raw images; no protocol overhead work in the FPGA. You still need to carefully design a PCB for this; you're still dealing with a very high speed FireWire bus.
EDIT:
At 100 Mbit/s, the FireWire bus runs at ~100 MHz, so you have to deal with moving ~100 MHz differential signals from the PHY to the FireWire connector. On the PHY <-> LLC <-> FPGA side: I wouldn't personally try to breadboard even a ~13 MHz parallel data bus, though it may be possible if you're careful.
The critical issue for signal integrity is the rise/fall time of the signal, not its clock rate. A high clock rate usually means faster rise/fall times, but if you run a transceiver that's designed for high frequencies at a lower frequency, the rise/fall times don't actually get any slower.
If the wire carrying the signal is longer than Tr/(2*Td), where
Tr = the signal rise time at the source, and
Td = the propagation delay per unit length of the wire/cable you are using,
then you need to consider transmission line effects. You'll have to deal with reflections in the wire, which will cause all sorts of junk on the line.
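As a rough worked example (numbers assumed for illustration, not from the original post): with a 2 ns rise time at the driver and a typical FR-4 trace delay of about 150 ps/inch, the critical length is 2 ns / (2 * 0.15 ns/inch) ≈ 6.7 inches. Much beyond that, you need controlled routing and termination.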
You also need to be careful to make sure all the wires of a parallel bus are the same length, with the tolerance for variation depending on the clock frequency of the bus.
Is this thing really going to end up in a UAV/RC aircraft? If so, you've got to deal with vibration and G-forces as well.
If external signals are coming in at a higher frequency than the clock frequency (or if processing of a signal takes multiple clock cycles), what are the generic approaches to processing such signals?
One approach is to use a faster clock. Most FPGAs (including your Spartan 6) have dedicated logic for generating different clocks from a source clock. Xilinx has called these DCMs (Digital Clock Managers) in the past, although I think the Spartan 6 has an actual PLL plus DCM-like blocks (they may be called something else these days).
These are nice. You can put a 12 MHz crystal on your PCB to minimize emissions, and then inside the FPGA you can multiply the clock up to e.g. 96 MHz (a multiplication of 8x).
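As a sketch of what that looks like on a Spartan 6 (parameter and port names from memory; verify them against the Xilinx libraries guide before trusting this):

```verilog
// Hedged sketch: 12 MHz in, 96 MHz out, using a Spartan-6 DCM_SP.
// clk12 is assumed to be the 12 MHz crystal input of the design.
wire clk0, clkfb, clk96, locked;

DCM_SP #(
    .CLKFX_MULTIPLY (8),        // 12 MHz * 8 = 96 MHz
    .CLKFX_DIVIDE   (1),
    .CLKIN_PERIOD   (83.333)    // 12 MHz input period in ns
) dcm_12_to_96 (
    .CLKIN  (clk12),
    .CLKFB  (clkfb),            // feedback for phase alignment
    .RST    (1'b0),
    .CLK0   (clk0),
    .CLKFX  (clk96),            // multiplied clock
    .LOCKED (locked)            // high once the DCM has locked
);

BUFG fb_buf (.I(clk0), .O(clkfb));  // feedback via a global clock buffer
```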
You could also try clocking logic on the falling edge and the rising edge. This would give you the effective appearance of a 2x clock (also called DDR), but your logic gets complicated because you must split the design between falling-edge and rising-edge logic.
DCMs can also generate clocks that are 90, 180, or 270 degrees out-of-phase with the incoming clock. So if you really needed to sample a signal faster than the FPGA can handle, you could use the 90 degree clock and do the same DDR trick on it. This would give you four chances to sample an incoming signal with the fastest possible FPGA clock, but again this gets complicated because now you're negotiating logic between two sets of rising and falling edges.
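Here's a minimal sketch of the dual-edge trick in generic Verilog (names are mine); the 90-degree variant just repeats this with the phase-shifted clock:

```verilog
// Sample din on both edges of clk, then pair the two samples up in
// the rising-edge domain so downstream logic lives in one domain.
module ddr_sample (
    input  wire       clk,
    input  wire       din,
    output reg  [1:0] pair   // {rising-edge sample, falling-edge sample}
);
    reg rise_q, fall_q;

    always @(posedge clk) rise_q <= din;  // sample at the rising edge
    always @(negedge clk) fall_q <= din;  // sample at the falling edge

    // At each rising edge, rise_q holds the sample from one full cycle
    // ago and fall_q the sample from half a cycle ago, in that order.
    always @(posedge clk) pair <= {rise_q, fall_q};
endmodule
```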
.
Should the clock always be involved? For example, can I achieve the same thing without using a clock at all?
You could give pure combinatorial logic a shot, but in general it's a timing nightmare. FPGAs have a sea of flip-flops, and they're very important for achieving timing closure. It really is best practice to clock everything you can afford to clock. FPGA IOBs (the input/output buffers, which send or receive signals to the outside world) have integrated flip-flops for the express purpose of making timing closure easier to achieve.
I'm not even really sure how you'd do this without clocking.
.
Is this what is called "crossing a clock domain"?
Almost. Crossing a clock domain, as mentioned above, happens when the output and input do not share a common clock. The reason this is bad is that those clocks are typically asynchronous. As an engineer, you should always consider that asynchronous I/O will go out of its way to change at the worst possible time.
Button inputs are asynchronous, so they suffer from the same problems as crossing clock domains. The problem with an async signal is that flip-flops have parameters known as setup and hold times: setup time is how long the signal must be steady before the clock edge, and hold time is how long the signal must be steady after the clock edge. If you have a setup or hold violation, the flip-flop can go metastable, which means you have no idea what the output will be. At the next clock edge, it generally fixes itself. (Flip-flops can also go metastable if a voltage between Vil and Vih is applied during the clock edge, but that's not a problem inside the FPGA.) I believe I also once read somewhere that hold time in FPGAs is typically considered to be 0, and the primary reason you don't get timing closure is insufficient setup time.
To handle async inputs, we use synchronizers: essentially, we put multiple flip-flops between the async input and the logic it's connected to. If the first flip-flop goes metastable, it won't screw up the second flip-flop... hopefully. It still can, but the mean time between failures (MTBF) can be years. If that's not enough, you add even more flip-flops to push the MTBF to acceptable levels.
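In Verilog, a basic synchronizer is just a couple of chained flip-flops (a generic sketch, with names of my choosing):

```verilog
// Two-flop synchronizer: brings an async signal into clk's domain.
// Add a third stage if the computed MTBF isn't good enough.
module sync2 (
    input  wire clk,        // destination-domain clock
    input  wire async_in,   // button, or a signal from another domain
    output wire sync_out    // safe to use by logic clocked on clk
);
    reg q1, q2;

    always @(posedge clk) begin
        q1 <= async_in;     // this flop may go metastable
        q2 <= q1;           // this one gives it a full cycle to settle
    end

    assign sync_out = q2;
endmodule
```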
Now, in terms of "crossing clock domains", you only need the synchronizer if your clocks are async to each other. One of the benefits of the DCM is that it can generate multiple clock frequencies that are perfectly synchronized. So if you have a "big" calculation that takes a long time to complete, you can use a 12 MHz clock, while the 96 MHz clock samples the inputs. Since they both came from the DCM, every 12 MHz edge perfectly lines up with a 96 MHz edge, so there's no need to use a synchronizer.
EDIT: response to comment
So clocks cannot pass my flip-flop asynchronously, obviously because there can be only one signal at a time (pure physics). So the synchronization is needed to handle one clock "ticking in between" the other clock's cycles. If that is correct, what happens if my logic (combined gate delay) is greater than the input clock period? Do I get silently screwed, does the signal just "disappear" on its way through the gates, or does the input clock tick get lost?
I'm going to be a bit picky about the language here, because it's important to make distinctions sometimes. The clocks are async to each other, but every flip-flop is sync to its own clock. The async signal is the output of a flip-flop clocked in one domain and the input of a flip-flop clocked in the other domain.
Say you have two crystals on a board, both nominally 48 MHz. They are not both precisely 48 MHz. So if you clock two series flip-flops from the two crystals, with an inverter at the end to make them toggle every cycle, the first flip-flop will eventually output a signal that violates the setup time of the second flip-flop's input relative to the second flip-flop's clock; as far as the first flip-flop's clock is concerned, everything is going according to plan. The second flip-flop will glitch for one cycle: it may go high, it may go low, it may go somewhere in between, it might start going one direction and then abruptly change halfway through the clock cycle, etc. At the next tick of the second clock it will probably sample its input correctly and update its output... probably. Which is why, for really high reliability, you put more flip-flops in the synchronizer, because then you have probably*probably*probably = all but certain.
That's a somewhat contrived example, but it's easy to see how those clocks would have a periodic metastability incident that's a function of the difference in the crystal speeds. For things like buttons, it happens essentially at random. This is also why some protocols carry their own clock somehow (e.g. RS-232 and USB embed an implicit clock in the data; I2C and SPI have an explicit clock trace): to avoid crossing clock domains.
.
When your design is synthesized/mapped/placed/routed, the tools will tell you the maximum propagation delay for any logic that touches a flip-flop in your design, which sets the maximum clock frequency. If you clock it faster than that, you broke the rules and the FPGA isn't responsible for the result. In fact, you generally want to go a little slower. So what you do is tell the design tools (for Xilinx this uses a constraint file), "When you're doing place and route, try to keep everything at most 19 ns apart" (~52 MHz), and then you actually use a 48 MHz crystal (20.8 ns period), giving you 1.8 ns of slack (about 9%).
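In the ISE-era Xilinx tools, that constraint goes in a UCF file and looks roughly like this (a sketch from memory; check the constraints guide for exact syntax, and "clk"/"clk_grp" are just names I picked):

```
NET "clk" TNM_NET = "clk_grp";
TIMESPEC "TS_clk" = PERIOD "clk_grp" 19 ns HIGH 50%;
```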
Getting your design to route at the desired speed is called "achieving timing closure". For more complex designs, you have to consider where to put extra flip-flops to pipeline the process and keep the clock speed up. The Pentium 4 is an extreme example: pipeline stages 5 and 20 existed for the sole purpose of allowing signals to propagate across the chip, which let them push the clock speed very high. Since the FPGA has a DCM, if you need to clock a very large piece of logic in one stage (e.g. a very large decoder or mux), you could have the DCM generate a "slow logic" clock for that part and a "fast logic" clock for the rest. This also helps somewhat with power consumption in any stage running on the slow clock.
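As a tiny illustration of pipelining (a made-up multiply-accumulate, not from the question):

```verilog
// Splitting a multiply-accumulate into two registered stages so the
// critical path is one multiply OR one add, not both back to back.
module mac_pipe (
    input  wire        clk,
    input  wire [15:0] a, b,
    input  wire [31:0] c,
    output reg  [31:0] y
);
    reg [31:0] prod, c_d;

    always @(posedge clk) begin
        prod <= a * b;      // stage 1: multiply
        c_d  <= c;          // delay c so it stays aligned with prod
        y    <= prod + c_d; // stage 2: accumulate
    end
endmodule
```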
For pure combinatorial logic, the tools will tell you the maximum pad-to-pad delay; that is, how much time a signal takes to get from the input pad, through all the logic, and back out to the output pad. The problem with such logic is that it has no concept of the past: it cannot remember whether the light was on or off. You could probably design your own feedback elements (e.g. an SR latch), perhaps even using the set/resets of the integrated flip-flops, but at that point you're building sequential logic anyway, so you might as well take the plunge and get everything clocked reliably.
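For completeness, here's what such a hand-rolled feedback element looks like (cross-coupled NORs; most tools will warn about the combinational loop, which is rather the point):

```verilog
// An SR latch from pure combinatorial feedback. Synthesizable, but
// the combinational loop makes timing analysis miserable -- use a
// clocked flip-flop instead in any real design.
module sr_latch (
    input  wire s, r,    // set / reset (active high)
    output wire q, q_n
);
    assign q   = ~(r | q_n);
    assign q_n = ~(s | q);
endmodule
```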
Best Answer
Synthesis is highly dependent on the platform you're targeting and usually needs to be done with tools created by Altera, Xilinx, etc. Nothing open source exists (AFAIK) because this is so custom and requires a lot of effort to obtain optimal and correct results, so there's little incentive to do it open source. Also, because of the IP involved, these companies don't share information about their chip internals, which prevents others from targeting the parts without going through the manufacturers.
By the way, Altera and Xilinx (and perhaps others) provide free versions of their tools, with some features missing, that you can use (which is another reason no one seems to do anything open source). They're good enough for many projects.
So, to summarize: do you think anyone would spend time, for no money, to create something that is difficult, with little information, when the manufacturer already provides some of it for free? Look at the open source BIOS efforts for PCs; they haven't gone very far, for these same reasons.