If you are about equally at home with an FPGA or a microcontroller for development purposes, a microcontroller will cost less in small quantities.
The lowest cost FPGA listed at Digikey is $US2.85 each in 1,000 quantity.
Digikey have a microcontroller part which would do this task for about $US0.30. This is an "outlier" and you would usually expect to pay $US0.50 - $1.00 each in 1,000 quantity. For $US0.68 each in 1,000 quantity you get this ST offering: the STM8S103xx with UART, SPI, IIC, 5 x 10 bit ADC, multiple timers and 28 I/O lines.
When it comes to doing "bits and pieces" around your core design, a microcontroller is going to be substantially more flexible and take less time, unless you are an FPGA guru.
Assembly-wise, an FPGA and a microcontroller are similar. In most cases you will need external driver ICs or discrete components to drive higher current loads, which probably includes your 7 segment display if it is LED (and not if it is LCD).
It does not sound like your application is really all that compute intensive. A dsPIC, for example, can execute 400 k instructions for each one of your iterations. That's a lot. It will be useful, however, to have good low level I/O capability, PWM generators, timers, and the like.
Sine and cosine are really not that hard to do on an integer machine like a dsPIC. I have done it a few times myself. The trick is to pick the right representation for angles. Radians may be nice from a theoretical point of view, but are inconvenient computationally. Degrees are artificial and just silly. Use the full range of whatever your machine-sized integer is to represent one full rotation. For example, on a dsPIC, which is a 16 bit processor, one full rotation is 65536 counts, which is way more accuracy and resolution than you need to control a robot or that you can measure anyway.
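To make the representation concrete, here is a minimal sketch in C, assuming a 16 bit angle where 65536 counts is one full rotation. The names angle_t, angle_from_degrees, angle_add and angle_diff are made up for illustration; they are not from any dsPIC library.

```c
#include <stdint.h>

/* Hypothetical type: one full rotation = 65536 counts. */
typedef uint16_t angle_t;

/* Convert whole degrees to counts. Only useful for human-readable
 * setup values; the control loop itself would stay in counts. */
static angle_t angle_from_degrees(uint32_t deg)
{
    return (angle_t)(((deg % 360u) * 65536u) / 360u);
}

/* The sum is truncated to 16 bits, which is the same as wrapping
 * modulo one full turn. No range checks are needed. */
static angle_t angle_add(angle_t a, angle_t b)
{
    return (angle_t)(a + b);
}

/* Signed shortest difference a - b in counts (-32768..32767).
 * The cast to int16_t assumes a two's complement machine, which
 * holds for essentially every current compiler and target. */
static int16_t angle_diff(angle_t a, angle_t b)
{
    return (int16_t)(angle_t)(a - b);
}
```

For example, adding 180 degrees (32768 counts) to 270 degrees (49152 counts) gives 16384 counts, which is 90 degrees, without any explicit wrap test.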
One advantage of this representation is that all the wrapping happens automatically just due to how unsigned integer adds and subtracts work. Another significant advantage is that this representation lends itself particularly well to using lookup tables for sine and cosine. You only need to store 1/4 cycle. The top two bits of the angle tell you which quadrant you are in, which tells you whether to index into the table forwards or backwards, and whether to negate the result or not. The next N lower bits are used to index into the table, with the table having 2^N segments (2^N + 1 points). Note that indexing into the table backwards is then just complementing the table index bits.
You can give the table enough points so that picking the nearest answer is good enough. For example, if the table has 1024 segments, then sine and cosine will be computed to the nearest 1/4096 of a circle. That's going to be plenty for controlling a robot. If you want more accuracy, you can either make the table bigger or use the remaining lower bits of the angle to linearly interpolate between adjacent table entries.
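The quarter-cycle table scheme can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a tuned dsPIC implementation: the table has only 16 segments (N = 4) so it fits here, where a real one would more likely have 1024 and be generated offline; results are scaled so that 32767 represents 1.0; and all names (sin_table, fold, sin_nearest, sin_interp, cos_interp) are made up. It also mirrors odd quadrants by subtracting from a quarter turn instead of complementing bits. That differs from the complement trick by one count and is the reason the table needs the extra end point.

```c
#include <stdint.h>

#define TABLE_BITS 4                   /* N: table has 2^N segments      */
#define FRAC_BITS  (14 - TABLE_BITS)   /* angle bits below the index     */
#define QUARTER    0x4000u             /* counts in one quarter turn     */

/* sin(k * 90deg / 16) scaled to 32767, k = 0..16 (2^N + 1 points). */
static const int16_t sin_table[(1 << TABLE_BITS) + 1] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787, 23170,
    25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* Fold the angle into the first quadrant. Returns a position in
 * 0..QUARTER and reports whether the result must be negated. */
static uint16_t fold(uint16_t angle, int *negate)
{
    uint16_t quadrant = angle >> 14;           /* top two bits          */
    uint16_t pos = angle & (QUARTER - 1u);     /* offset in quadrant    */
    if (quadrant & 1u)                         /* quadrants 1 and 3:    */
        pos = (uint16_t)(QUARTER - pos);       /* read table backwards  */
    *negate = (quadrant & 2u) != 0;            /* quadrants 2 and 3     */
    return pos;
}

/* Sine using the nearest table entry. */
int16_t sin_nearest(uint16_t angle)
{
    int negate;
    uint16_t pos = fold(angle, &negate);
    int16_t v = sin_table[(pos + (1u << (FRAC_BITS - 1))) >> FRAC_BITS];
    return negate ? (int16_t)-v : v;
}

/* Sine using linear interpolation between adjacent entries. */
int16_t sin_interp(uint16_t angle)
{
    int negate;
    uint16_t pos  = fold(angle, &negate);
    uint16_t idx  = pos >> FRAC_BITS;
    uint16_t frac = pos & ((1u << FRAC_BITS) - 1u);
    int32_t v = sin_table[idx];
    if (frac != 0)   /* frac != 0 implies idx < 16, so idx + 1 is valid */
        v += ((int32_t)(sin_table[idx + 1] - sin_table[idx]) * frac)
             >> FRAC_BITS;
    return (int16_t)(negate ? -v : v);
}

/* Cosine is sine advanced by a quarter turn; the add wraps by itself. */
int16_t cos_interp(uint16_t angle)
{
    return sin_interp((uint16_t)(angle + QUARTER));
}
```

With this table, sin_interp(0x2000) (45 degrees) returns 23170, about 0.7071 in this scaling, and cos_interp(0) returns 32767.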
Anyway, the point is it seems your requirements for this processor don't match up with the stated problem. I'd probably use a dsPIC33F. It is certainly small, lightweight, and much more power efficient than a full blown general purpose computing processor like an x86 on a single board computer.
Best Answer
I do both hardware and firmware, and I think this project is a much better fit for a microcontroller than an FPGA, unless you're way more comfortable with logic design than with coding in C. As you said, running under Linux you can use multiple threads.
I believe the BeagleBone Black is probably the best platform for this project. It has way more I/O pins available than the Raspberry Pi. Forget the Parallella, since they've stopped taking pre-orders.
Although you didn't mention it in your post, I see you added a tag for Arduino. Don't even think of trying to use one of those for this project. The cats would win.