There is only one Spartan 3E Starter Kit Board. The PDF is below.
http://www.xilinx.com/support/documentation/boards_and_kits/ug230.pdf
From the PDF:
"The SMA connector allows an external clock source to drive one of the FPGA’s global clock
inputs. Alternatively, the FPGA can provide a high-performance clock to another board via
the SMA connector. See Chapter 3, “Clock Sources,” for additional information."
NET "CLK_50MHZ" LOC = "C9" | IOSTANDARD = LVCMOS33 ;
NET "CLK_SMA" LOC = "A10" | IOSTANDARD = LVCMOS33 ;
Figure 3-2: UCF Location Constraints for Clock Sources
The UCF file is where you assign a signal to a pin. Is this the info you're missing?
Method 1: Create ROMs in your FPGA design
Because you have the same data in every board, one option is to use block RAMs in the FPGA, configured as ROM. To do this you instantiate a block RAM, but don't connect to the write pins. Use a synthesis directive in your HDL code or UCF file to specify the initial contents of the RAM. Read the Spartan-3 Generation User's Guide (Chapter 4) to see how to instantiate the RAM and how to access the data from the RAM. If you use Xilinx ISE, there is probably also a "wizard" to generate the RAM block and set up the initial contents for you.
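If your flow accepts a plain hex memory file (the format read by Verilog's `$readmemh`, which is an assumption about your toolchain and not something stated above), a small script can produce the initial contents from your raw data. A minimal Python sketch; the name `to_hex_lines` and the 16-bit word width are hypothetical choices:

```python
def to_hex_lines(data: bytes, width_bits: int = 16) -> list:
    """Split raw bytes into fixed-width big-endian words, one hex word per line."""
    if width_bits % 8 != 0:
        raise ValueError("width_bits must be a multiple of 8")
    step = width_bits // 8
    # Pad with zero bytes so the last word is complete.
    padded = data + bytes(-len(data) % step)
    return [padded[i:i + step].hex() for i in range(0, len(padded), step)]


if __name__ == "__main__":
    rom = bytes(range(8))  # stand-in for the real table
    print("\n".join(to_hex_lines(rom)))
```

The word width should match however you configure the block RAM's data port.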
Unfortunately, the Spartan-3E you are using has only 350 kbits of block RAM, not the 8 Mbits you require. For this to work, you would have to compress your data by more than 20:1 to fit in 350 kbits. How to do that depends on what kind of data you have. If your data is close to random, that much compression is probably not achievable.
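Before designing a compression scheme, it may be worth checking on a PC how compressible the data is at all. A rough Python sketch, using zlib only as a yardstick (a real design would need a decompressor simple enough to build in the FPGA, and zlib is probably not that); `BLOCK_RAM_BITS` and the function names are hypothetical:

```python
import os
import zlib

BLOCK_RAM_BITS = 350_000  # assumption: the 350 kbit figure, taken as decimal kilo


def compressed_bits(data: bytes) -> int:
    """Size in bits of the data after zlib compression at the highest level."""
    return 8 * len(zlib.compress(data, 9))


def fits_in_block_ram(data: bytes, budget_bits: int = BLOCK_RAM_BITS) -> bool:
    """True if the zlib-compressed data is within the block RAM budget."""
    return compressed_bits(data) <= budget_bits


if __name__ == "__main__":
    repetitive = bytes(1_000_000)       # 8 Mbit of zeros compresses very well
    random_like = os.urandom(100_000)   # random data barely compresses at all
    print(fits_in_block_ram(repetitive), fits_in_block_ram(random_like))
```

If a general-purpose compressor cannot get close to the budget, a simple scheme implemented in logic is unlikely to do better.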
Method 2: Store data in external memory
You say you have a 128 Mbit parallel flash and a 16 Mbit SPI flash. You will need to read the datasheets for these parts and understand how they work. Then write a state machine into your FPGA that can access these devices. But this is your job as the FPGA designer. Some random strangers on the internet are not going to design your FPGA for you.
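To give a flavor of what the state machine has to do for the SPI part, here is a Python model of the bytes it would shift out to start a read. It assumes the device follows the common SPI NOR convention of opcode 0x03 followed by a 24-bit address; that is an assumption, so confirm it against the datasheet for your part. `read_command` is a hypothetical name:

```python
READ_OPCODE = 0x03  # assumption: common SPI NOR "read data" opcode, check the datasheet
FLASH_BYTES = 16 * 1024 * 1024 // 8  # 16 Mbit device = 2 MiB


def read_command(address: int) -> bytes:
    """Bytes to shift out on MOSI to start a read at `address`."""
    if not 0 <= address < FLASH_BYTES:
        raise ValueError("address outside the 16 Mbit device")
    # Opcode first, then the 24-bit address, most significant byte first.
    return bytes([READ_OPCODE]) + address.to_bytes(3, "big")


if __name__ == "__main__":
    print(read_command(0x012345).hex())
```

In the FPGA this becomes a shift register loaded with these 32 bits, plus a counter that sequences chip select and the clock.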
To store the data onto the flash initially you have two choices. The first, if you are building these boards in volume, is to have your board assembly shop pre-program the flash devices before assembling them onto the boards. Typically you give them a data file in whatever format they request, and they charge a small extra fee to flash the data before assembly.
Second option: read the datasheet for the flash device, then write an FPGA design that accepts data from some other interface available on your board (Ethernet, USB, SPI, I2C, whatever) and writes it into the flash. At manufacturing time, you load this design temporarily into your FPGA and program your flash. Then you store a different "run-time" FPGA design in the on-board configuration PROM. The run-time design has no ability to modify the flash, so your users can't mess up the data.
You probably do want something like the circuit shown by clabacchio.
This is easily rendered in Verilog as
This is, as others mentioned, a linear feedback shift register, or LFSR, and it generates the maximal length pseudo-random bit sequence that can be produced with a 5-bit state machine. The state machine traverses 31 states (\$2^n-1\$, where n is the number of registers) before repeating itself.
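The 31-state claim is easy to check in software. The Python model below assumes the new bit is the XOR of register bits 4 and 2 (counting from 0), which is one maximal-length tap choice for 5 registers; the circuit referred to above is not reproduced here, so treat the taps and the function names as assumptions:

```python
def lfsr5_step(state: int) -> int:
    """One clock of a 5-bit shift register fed by bit 4 XOR bit 2."""
    new_bit = ((state >> 4) ^ (state >> 2)) & 1
    return ((state << 1) | new_bit) & 0b11111


def cycle_length(start: int) -> int:
    """Count clocks until the register returns to `start`."""
    state, count = lfsr5_step(start), 1
    while state != start:
        state = lfsr5_step(state)
        count += 1
    return count


if __name__ == "__main__":
    print(cycle_length(0b00001))  # any nonzero start gives the same cycle
    print(cycle_length(0b00000))  # the all-0's state maps to itself
```

Starting from any nonzero state the register visits all 31 nonzero states before repeating, while the all-0's state has a cycle length of 1.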
Of all the states that can be encoded by 5 registers, only one is not used, which is the all-0's state. The all-0's state is a lock-up state --- if the state machine gets into that state by an error, it will be stuck permanently in the all-0's state, as you can see because 0 ^ 0 = 0. This means you have to be sure (using a synthesis directive in the Verilog or constraints file) that the registers don't initialize to the all-0's state.
If you need the all-0's state not to lock up, you can use an XNOR in place of the XOR gate, and get a sequence that includes the all-0's state and locks up in the all-1's state.
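The XNOR variant can be checked the same way. This Python model again assumes taps on bits 4 and 2, which is an illustrative choice and not taken from the circuit above:

```python
def xnor_lfsr5_step(state: int) -> int:
    """One clock of a 5-bit shift register fed by bit 4 XNOR bit 2."""
    new_bit = (((state >> 4) ^ (state >> 2)) & 1) ^ 1
    return ((state << 1) | new_bit) & 0b11111


def states_from(start: int) -> list:
    """List the states visited from `start` until the sequence repeats."""
    seen, state = [], start
    while state not in seen:
        seen.append(state)
        state = xnor_lfsr5_step(state)
    return seen


if __name__ == "__main__":
    visited = states_from(0)
    print(len(visited), 0b11111 in visited)
```

Starting from all 0's the model visits 31 states and never reaches all 1's, and all 1's maps back to itself.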
Also be aware that the longest run of 1's produced by this state machine is 5 in a row, and the longest run of 0's is 4 in a row. This can be important if you are using the PRBS to test a system with AC coupling, because longer runs stress the system more.
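The run lengths can be confirmed by generating two full periods of the bit stream and measuring the longest runs. A Python sketch, assuming the same hypothetical taps (bits 4 and 2) and taking the bit shifted in as the output:

```python
from itertools import groupby


def lfsr5_bits(state: int, count: int):
    """Yield `count` output bits of a 5-bit XOR LFSR with taps on bits 4 and 2."""
    for _ in range(count):
        new_bit = ((state >> 4) ^ (state >> 2)) & 1
        state = ((state << 1) | new_bit) & 0b11111
        yield new_bit


def longest_runs(bits) -> tuple:
    """Return (longest run of 1's, longest run of 0's)."""
    longest = {0: 0, 1: 0}
    for value, group in groupby(bits):
        longest[value] = max(longest[value], len(list(group)))
    return longest[1], longest[0]


if __name__ == "__main__":
    # Two periods (62 bits) so a run that wraps around the period is not cut short.
    print(longest_runs(lfsr5_bits(0b00001, 62)))
```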
In communications testing, longer sequences are more common, mainly to exercise more of the low frequency behavior of the system:
PRBS7
PRBS23
PRBS31
Notice that it is not always the final two registers that are "tapped" to generate the incoming bit of the shift register.
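To illustrate, here is a Python model with commonly quoted tap pairs for a few lengths (register numbers counted from 1). The table and function names are hypothetical, and the tap pairs should be checked against XAPP052 or the test standard you are following before use:

```python
# Commonly quoted two-tap choices, as (register, register) counted from 1.
PRBS_TAPS = {5: (5, 3), 7: (7, 6), 23: (23, 18), 31: (31, 28)}


def prbs_step(state: int, n: int) -> int:
    """One clock of an n-bit XOR LFSR using the taps listed for n."""
    a, b = PRBS_TAPS[n]
    new_bit = ((state >> (a - 1)) ^ (state >> (b - 1))) & 1
    return ((state << 1) | new_bit) & ((1 << n) - 1)


def period(n: int, start: int = 1) -> int:
    """Count clocks until the register returns to `start` (slow for large n)."""
    state, count = prbs_step(start, n), 1
    while state != start:
        state = prbs_step(state, n)
        count += 1
    return count


if __name__ == "__main__":
    for n in (5, 7):
        print(n, period(n), 2**n - 1)
```

For 7 registers the taps are the final two, but for 5, 23, and 31 they are not. Running `period` for 23 or 31 registers in pure Python would take a long time, so the demo sticks to the short ones.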
Xilinx app note XAPP052 gives a handy table of connections to be used to generate any size PRBS from 3 to 168 registers.
App note XAPP211 shows how to implement them efficiently in Xilinx devices. Essentially, a single look-up table in a single logic block can be used to implement up to 32 registers worth of shift register (depending on architecture).
LFSRs can also be used to implement a counter efficiently if you don't care about the intervening states, just how long it takes to count down to some terminal value.
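The only design work for such a counter is finding the value to load so that the register hits the terminal state after the desired number of clocks. A Python sketch for the 5-bit case, assuming taps on bits 4 and 2 and a terminal value of `0b10000` (both arbitrary illustrative choices):

```python
PERIOD = 31  # 2**5 - 1 states in the cycle


def lfsr5_step(state: int) -> int:
    """One clock of a 5-bit shift register fed by bit 4 XOR bit 2."""
    new_bit = ((state >> 4) ^ (state >> 2)) & 1
    return ((state << 1) | new_bit) & 0b11111


def load_value(clocks: int, terminal: int = 0b10000) -> int:
    """State to load so the register reaches `terminal` after `clocks` steps."""
    if not 0 < clocks <= PERIOD:
        raise ValueError("clocks must be between 1 and 31")
    # Walking PERIOD - clocks steps forward from the terminal state lands
    # on the state that is `clocks` steps before it in the cycle.
    state = terminal
    for _ in range(PERIOD - clocks):
        state = lfsr5_step(state)
    return state


if __name__ == "__main__":
    print(format(load_value(10), "05b"))
```

In hardware the terminal detect is then a single 5-input compare, with no carry chain as in a binary counter.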