First, export it to C/C++ (instructions here) and profile the application on your computer. Neural networks can be rather compute-intensive, so you need to know how much processing power and what type of processing you need (integer, floating point, fixed point, SIMD, etc.).
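As a rough first estimate to set alongside the profiling results, you can work out the multiply-accumulate (MAC) count of one forward pass from the layer sizes. The sketch below assumes a plain fully connected network; the layer sizes and the inference rate are made-up illustration values, not taken from any real design.

```python
def macs_per_forward_pass(layer_sizes):
    """Count multiply-accumulates for one forward pass of a fully connected net.

    Each pair of adjacent layers with n_in and n_out neurons needs
    n_in * n_out MACs. Bias additions and activation functions are ignored.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: 64 inputs, two hidden layers, 10 outputs.
layer_sizes = [64, 128, 32, 10]
inferences_per_second = 100  # assumed throughput requirement

macs = macs_per_forward_pass(layer_sizes)
print(f"MACs per inference: {macs}")
print(f"MACs per second:    {macs * inferences_per_second}")
```

This only bounds the arithmetic load; memory bandwidth and the cost of the activation function on the target still have to come from profiling.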
If you need the network to be trainable while running on your embedded device, then you're stuck developing your own system, perhaps based on some external neural network library, because MATLAB's exported code doesn't include the training portions. I would still start by profiling what you've got to get an idea of what class of device you're looking at.
All that being said, the BeagleBoard is probably a safe bet unless what you're doing is very compute-intensive; it would also allow you to use most Linux libraries. I used the FANN library for a project maybe 5 years ago and it was good to work with. I don't know whether it will compile for ARM out of the box, or whether some other library has superseded it recently.
I have done something very similar in the past. My aim was to transfer an image (*.bmp) from a PC to an FPGA (internal BRAM) and send it back to the PC after the watermarking process. As previously mentioned, UART is your best bet. Implement a UART in the FPGA or use an existing design. For Xilinx, look at this design provided with the PicoBlaze.
It is well documented and can be used standalone in your design. You can also find older versions of this design for older Xilinx FPGAs. I think you can easily find similar designs for Altera (or vendor-independent ones).
On the MATLAB side, you can read and write data to a serial object using these functions. I have had problems with baud rates above the standard 115200, so if you need real-time performance, UART might not be sufficient. Otherwise, start with the lowest baud rate, test it for errors, and then increase it step by step to find the maximum rate that still works reliably.
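To judge whether UART is fast enough, it helps to work out the transfer time for one image. This is a minimal sketch that assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte) and a hypothetical 640x480 8-bit grayscale image; neither value comes from the original design.

```python
def uart_transfer_seconds(num_bytes, baud_rate, bits_per_byte=10):
    """Time to move num_bytes over a UART, one direction only.

    bits_per_byte=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return num_bytes * bits_per_byte / baud_rate


image_bytes = 640 * 480  # hypothetical image size, 1 byte per pixel

for baud in (9600, 115200):
    seconds = uart_transfer_seconds(image_bytes, baud)
    print(f"{baud:>6} baud: {seconds:.1f} s per direction")
```

Under these assumptions even 115200 baud needs well over twenty seconds per direction, which is fine for a functional test but rules out anything close to real time.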
MATLAB will be fast and easy for this application, but since it is not free and not everyone has access to it, another option is Python. In my case, I wrote a Python script to communicate with the FPGA. Python has a nice and simple library called pyserial for serial communication. You can use PIL (Python Imaging Library) for image processing and numpy for the computation.
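A baud-rate error test with pyserial could look roughly like the sketch below. It is not my original script. It assumes the FPGA runs a simple loopback design that echoes every byte straight back; the port name, baud rate, and chunk size are placeholders. The helper accepts any object with `write` and `read` methods, so it can be tried without hardware.

```python
def loopback_check(port, payload, chunk_size=256):
    """Send payload in chunks, read the echo back, and count byte errors.

    `port` is any object with write(bytes) and read(n) -> bytes,
    for example a serial.Serial instance from pyserial.
    """
    received = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        port.write(chunk)
        received += port.read(len(chunk))  # may return fewer bytes on timeout
    errors = sum(a != b for a, b in zip(payload, received))
    errors += len(payload) - len(received)  # missing bytes count as errors
    return bytes(received), errors


def run_on_hardware(port_name="/dev/ttyUSB0", baud_rate=9600):
    """Run the check on a real port. Port name and baud rate are placeholders."""
    import serial  # pyserial; imported here so loopback_check has no dependency

    with serial.Serial(port_name, baudrate=baud_rate, timeout=2) as port:
        data = bytes(range(256)) * 4
        _, errors = loopback_check(port, data)
        print(f"{baud_rate} baud: {errors} byte errors out of {len(data)}")
```

Calling `run_on_hardware` repeatedly with increasing baud rates shows where errors start to appear. If the design buffers the whole image before replying rather than echoing, write everything first and read the full length afterwards.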
However, if you are only interested in testing the design for functionality, you can just read the pixel data from a file and process it in simulation. Output the image data to a text file using MATLAB and then read it into a memory array defined in your testbench file. You can then simulate your design as if it's running on the hardware (assuming the design is synthesizable) and test it. You can write the processed data to a file at any stage of the process and read it back into MATLAB for comparison. Once you are sure the design works correctly, you can start implementing the communication interface on actual hardware.
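The same file-based flow can also be sketched in Python instead of MATLAB. The example below assumes 8-bit pixels written as one two-digit hex value per line, a layout that Verilog's `$readmemh` can load if your testbench is in Verilog (a VHDL testbench would need its own file-reading code). The function names, file names, and pixel values are illustrative only.

```python
import os
import tempfile


def write_hex_file(path, pixels):
    """Write 8-bit pixel values as one two-digit hex value per line."""
    with open(path, "w") as f:
        for p in pixels:
            f.write(f"{p:02x}\n")


def read_hex_file(path):
    """Read back a file of hex values, one per line, skipping blank lines."""
    with open(path) as f:
        return [int(line, 16) for line in f if line.strip()]


def count_mismatches(expected, actual):
    """Count differing values; a length difference counts as mismatches too."""
    diff = sum(e != a for e, a in zip(expected, actual))
    return diff + abs(len(expected) - len(actual))


# Round trip with made-up pixel data standing in for a real image.
pixels = [0, 17, 128, 255]
with tempfile.TemporaryDirectory() as tmp:
    stimulus = os.path.join(tmp, "image_in.hex")  # file the testbench would read
    write_hex_file(stimulus, pixels)
    # In a real flow, the simulator writes its own output file here.
    simulated = read_hex_file(stimulus)

print("mismatches:", count_mismatches(pixels, simulated))
```

In practice `simulated` would come from the file your testbench writes, and `pixels` from a software reference model of the same processing step.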
Best Answer
So, I've solved this problem: ClearScada offers a convenient .NET API. Just load ClearScada.dll (found in the ClearScada installation directory) as a normal .NET assembly.