A short answer would be: yes; a longer answer would be: it is not worth your time.
An FPGA itself can run a completely asynchronous design without trouble. The problem is the result you get, since timing through any FPGA is not very predictable. The bigger problem is that your timing, and therefore the behavior of the resulting design, will almost certainly vary from one place and route run to the next. You can put constraints on individual asynchronous paths to make sure that they do not take too long, but I'm not quite sure that you can specify a minimum delay.
In the end it means that your design will be unpredictable and can change completely with even a slight design change. You'd have to look through the entire timing report every time you change anything at all, just to make sure that it would still work. On the other hand, if the design is synchronous, you just look for a pass or fail at the end of place and route (assuming your constraints are set up properly, which doesn't take long at all).
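To make the "check every path after every run" burden concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the path names, the delay windows, and the two sets of delays are made-up numbers for illustration, not output from any real tool. It only shows the kind of per-path min/max check an asynchronous design would force you to repeat after each place and route run.

```python
def check_paths(delays_ns, windows_ns):
    """Return the sorted names of paths whose delay is outside its (min, max) window."""
    failures = []
    for name, delay in delays_ns.items():
        lo, hi = windows_ns[name]
        if not (lo <= delay <= hi):
            failures.append(name)
    return sorted(failures)


# Hypothetical delay windows (ns) that the asynchronous logic needs in order to work.
windows = {"req_to_ack": (2.0, 6.0), "data_to_latch": (1.0, 4.0)}

# Hypothetical delays from two place and route runs of nearly the same design.
run_a = {"req_to_ack": 3.1, "data_to_latch": 2.4}
run_b = {"req_to_ack": 1.7, "data_to_latch": 4.6}

print(check_paths(run_a, windows))  # no path out of its window
print(check_paths(run_b, windows))  # both paths moved out of their windows
```

In a synchronous design the tool does the equivalent check for you against the clock period, which is why a single pass/fail is enough.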
In practice people aim for completely synchronous designs, but if you simply need to buffer or invert a signal, you don't need to go through a flip-flop as long as you constrain it properly.
Hope this clears it up a bit.
There are so many things in this question that it is difficult to know where to start.
I am assuming that your FPGA logic is a SPI slave, not a master. If it is a master then you have a whole different set of issues which I'm going to avoid going into right now.
The simple, direct answer to your question is that you need to sample an async signal at a rate of at least twice the frequency of that signal. So if you have a 4 MHz SPI clock, then you need to sample it at 8 MHz or higher. Of course, nothing is simple or direct in this case.
You have things a little more difficult because you are not sampling one async signal, you are sampling three (CLK, CS, and MOSI). You also need to keep those three signals time-aligned with each other through the sampling process. And you have to spit out MISO in such a way as to not violate your setup/hold time at the master.
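As a rough illustration of the oversampling approach, here is a small behavioral model in Python (not HDL). It assumes SPI mode 0 with MSB-first data, and the function names are made up for this sketch. All three signals are captured on the same sample tick, which is what keeps them time-aligned, and a rising edge on the sampled SCK is what shifts MOSI in. The model is idealized: there is no metastability and no phase jitter between the two clocks, which is exactly the margin a real design has to add on top of the bare 2x figure.

```python
def spi_frame_waveform(byte, oversample):
    """Build (sck, cs_n, mosi) samples for one mode-0 byte, MSB first,
    as seen by a sampling clock `oversample` times faster than SCK."""
    half = oversample // 2
    samples = [(0, 1, 0)] * oversample                    # idle, CS deasserted
    for bit in range(7, -1, -1):
        mosi = (byte >> bit) & 1
        samples += [(0, 0, mosi)] * (oversample - half)   # SCK low phase
        samples += [(1, 0, mosi)] * half                  # SCK high phase
    samples += [(0, 0, 0)] * half + [(0, 1, 0)] * oversample
    return samples


def oversampled_receive(samples):
    """Recover bytes by detecting rising edges on the sampled SCK."""
    prev_sck, shift, count, received = 0, 0, 0, []
    for sck, cs_n, mosi in samples:
        if cs_n:                                  # CS high resets the bit counter
            shift, count = 0, 0
        elif sck and not prev_sck:                # rising edge seen in the samples
            shift = ((shift << 1) | mosi) & 0xFF
            count += 1
            if count == 8:
                received.append(shift)
                shift, count = 0, 0
        prev_sck = sck
    return received


print([hex(b) for b in oversampled_receive(spi_frame_waveform(0xA5, 8))])
```

MISO is left out of the model; driving it back in time for the master's setup/hold window is the part that eats most of the oversampling margin in practice.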
None of this is easy, but having a higher speed clock will make things much easier. How much higher depends on your code, and you didn't post your code. I think that I could write code to do it with an 8x clock, but that is just a guess. Honestly, however, I think this is the wrong approach.
SPI is a super simple interface, and it would be good if you kept it super simple. SPI has its own clock, and if you use it as a clock then everything becomes almost easy. Instead of changing clock domains on the serial SPI interface, change clock domains on the parallel data going in/out of your shift registers. If you look at those signals carefully you might even realize that you don't need to do anything special, or if you do then it's just a flip-flop per signal. Then you don't need to have your main clock be higher than your SPI clock. Your main clock could actually be slower!
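The idea can be sketched with a behavioral model in Python (again not HDL). The class names, the toggle flag used to signal "a new byte is ready", and the choice of mode 0 with MSB-first data are all assumptions of this sketch, not something from the question. The shift register runs only on SCK edges, and the main domain only ever looks at the parallel byte and a single flag.

```python
class SpiSlaveShifter:
    """Logic that would be clocked by SCK itself (mode 0, MSB first)."""

    def __init__(self):
        self.shift = 0
        self.count = 0
        self.rx_byte = 0     # parallel register, stable for a whole byte time
        self.rx_toggle = 0   # flips once per completed byte

    def sck_rising(self, mosi):
        self.shift = ((self.shift << 1) | (mosi & 1)) & 0xFF
        self.count += 1
        if self.count == 8:
            self.rx_byte = self.shift
            self.rx_toggle ^= 1
            self.shift, self.count = 0, 0


class MainDomain:
    """Logic in the main clock domain: only watches the toggle flag."""

    def __init__(self):
        self.toggle_seen = 0
        self.received = []

    def main_clock_tick(self, shifter):
        if shifter.rx_toggle != self.toggle_seen:
            self.toggle_seen = shifter.rx_toggle
            self.received.append(shifter.rx_byte)


def run(bytes_in, sck_edges_per_main_tick):
    """Shift bytes in on SCK while the main clock ticks once every N SCK edges."""
    shifter, main, edge = SpiSlaveShifter(), MainDomain(), 0
    for byte in bytes_in:
        for bit in range(7, -1, -1):
            shifter.sck_rising((byte >> bit) & 1)
            edge += 1
            if edge % sck_edges_per_main_tick == 0:
                main.main_clock_tick(shifter)
    main.main_clock_tick(shifter)   # one more main clock after the transfer ends
    return main.received


print([hex(b) for b in run([0x3C, 0xA5], 3)])  # main clock 3x slower than SCK
```

In this model the main clock only has to tick at least once per byte time, because the parallel byte holds still for eight SCK periods. That is why the main clock can be slower than the SPI clock.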
I do this on my SPI FPGA/CPLD interfaces and I have no problems running SPI at 30+ MHz, with or without a second clock domain.
Most FPGAs have a PLL clock synthesis block that generates the clock(s) you need from some kind of source. That source may be an external crystal plus amplifier circuitry in the chip, an external resonator, an on-chip resonator, something else, or a choice or combination of several of these.
The only way to know the real answer for a particular chip is to read the data sheet of that particular chip. Whenever you work with any electronic component, you should get hold of, and read, the data sheet for that component. This, btw, is also true for the ATmega328P on the Arduino boards -- if you haven't yet read the ATmega data sheet, then you're probably not yet ready to move on from AVR MCUs to FPGAs ;-)