To be honest, the line between the two is almost gone nowadays, and there are processors that can be classified as both (the Analog Devices Blackfin, for instance).
Generally speaking:
Microcontrollers are integer math processors with an interrupt subsystem. Some have hardware multiplication units, some don't. The point is that they are designed for simple math, and mostly to control other devices.
DSPs are processors optimized for streaming signal processing. They often have special instructions that speed up common tasks, such as a single-instruction multiply-accumulate. They also often have other vector or SIMD instructions. Historically they weren't interrupt-based systems, and they operated with non-standard memory systems optimized for their purpose, which made them more difficult to program. They were usually designed to operate in one big loop processing a data stream. DSPs can be designed as integer, fixed-point or floating-point processors.
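As a rough illustration of the multiply-accumulate pattern mentioned above, here is a minimal C sketch that computes one output sample of an FIR filter in Q15 fixed point. The function name `fir_q15`, the Q15 format and the 64-bit accumulator are assumptions I picked for illustration, not the API of any particular DSP. On a real DSP each loop iteration would typically map to a single MAC instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output sample of an FIR filter in Q15 fixed point (hypothetical helper).
 * Each loop iteration is one multiply-accumulate (MAC). */
int16_t fir_q15(const int16_t *coeffs, const int16_t *samples, size_t ntaps)
{
    assert(coeffs != NULL && samples != NULL && ntaps > 0);

    int64_t acc = 0; /* wide accumulator, playing the role of DSP guard bits */
    for (size_t i = 0; i < ntaps; i++) {
        acc += (int32_t)coeffs[i] * (int32_t)samples[i]; /* Q15 * Q15 = Q30 */
    }

    /* Q30 -> Q15. Assumes >> on a negative value is an arithmetic shift,
     * which holds on the usual compilers but is implementation-defined in C. */
    acc >>= 15;

    /* Saturate instead of wrapping, as DSP hardware commonly does. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

For example, with two taps of 0.5 (16384 in Q15) and samples 1000 and 3000, the function returns 2000, the average of the two samples.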
Historically, if you wanted to process audio streams or video streams, do fast motor control, or anything else that required processing a stream of data at high speed, you would look to a DSP.
If you wanted to control some buttons, measure a temperature, run a character LCD, or control other ICs which are doing the processing, you'd use a microcontroller.
Today, you mostly find general-purpose microcontroller-type processors with either built-in DSP-like instructions or on-chip co-processors to deal with streaming data or other DSP operations. You don't see pure DSPs used much anymore except in specific industries.
The processor market is much broader and blurrier than it used to be. For instance, I would hardly consider an ARM Cortex-A8 SoC a microcontroller, but it probably fits the standard definition, especially in a PoP (package-on-package) package.
EDIT: Figured I'd add a bit to explain when/where I've used DSPs even in the days of application processors.
A recent product I designed did audio processing with X channels of input and X channels of output per 'zone'. Its intended use meant that it would often sit there doing its thing, processing the audio channels for years without anyone touching it. The audio processing consisted of various acoustical filters and functions. The system was also hot-pluggable, with the ability to add some number of independent 'zones' all in one box. It comprised 3 PCB designs (a mainboard, a backplane and a plug-in module), and the backplane supported 4 plug-in modules. Quite a fun project: as I was doing it solo, I got to do the system design, schematic, PCB layout and firmware.
Now, I could have done the entire thing with a single bulky ARM core; I only needed about 50 MIPS of DSP work on 24-bit fixed-point numbers per zone. But because I knew this system would operate for an extremely long time, and that it was critical that it never click or pop or anything like that, I chose to implement it with a low-power DSP per zone and a single PIC microcontroller in the system management role. This way, even if one of the uC functions crashed, say from a DDoS attack on its Ethernet port, the DSP would happily keep chugging away and it's likely no one would ever know.
So the microcontroller ran the 2-line character LCD, some buttons, temperature monitoring and fan control (there were also some fairly high-power audio amplifiers on each board), and even served an AJAX-style web page via Ethernet. It also managed the DSPs via a serial connection.
So that's a situation where, even in the days when I could have used a single ARM core to do everything, the design dictated a dedicated signal processing IC.
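To give a feel for what "24-bit fixed-point" DSP work looks like, here is a minimal C sketch of a saturating multiply in Q1.23 format (values in [-1, 1) stored in the low 24 bits of an `int32_t`). This is not the firmware from the product described above. The name `q23_mul`, the Q1.23 layout and the round-to-nearest step are assumptions chosen for illustration. Saturating rather than wrapping on overflow is one common way to avoid the kind of click or pop mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define Q23_MAX  8388607    /*  2^23 - 1, just under +1.0 */
#define Q23_MIN (-8388608)  /* -2^23, exactly -1.0 */

/* Saturating Q1.23 multiply (hypothetical helper, not from the original design). */
int32_t q23_mul(int32_t a, int32_t b)
{
    assert(a >= Q23_MIN && a <= Q23_MAX);
    assert(b >= Q23_MIN && b <= Q23_MAX);

    int64_t p = (int64_t)a * (int64_t)b; /* Q23 * Q23 = Q46 product */

    /* Round to nearest and scale back to Q23. Assumes >> on a negative
     * value is an arithmetic shift (implementation-defined in C). */
    p = (p + (1 << 22)) >> 23;

    /* Clamp to the 24-bit range instead of wrapping around. */
    if (p > Q23_MAX) p = Q23_MAX;
    if (p < Q23_MIN) p = Q23_MIN;
    return (int32_t)p;
}
```

For example, 0.5 times 0.5 (4194304 in Q1.23) gives 2097152, which is 0.25, while -1.0 times -1.0 would overflow and is clamped to `Q23_MAX`.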
Other areas where I've run into DSPs:
*High-end audio - Very high-end receivers and concert-quality mixing and processing gear
*Radar processing - I've also used ARM cores for this in low-end applications.
*Sonar Processing
*Real time computer vision
For the most part, the low and mid ends of the audio/video/similar space have been taken over by application processors, which combine a general-purpose CPU with co-processor offload engines for various applications.
In principle this is a good candidate for an FPGA-based design. Regarding your requirements:
ad 1. The FPGA will most likely be more expensive; by how much depends on the device you choose. At first glance the smallest Spartan 3 from Xilinx (XC3S50AN) will be more than enough for this task (~£10 from Farnell). I think you can assume this is the upper bound for the cost (it has 56kB RAM inside, so it is more than you need). You may find a cheaper device either in the Xilinx range or from their competitors Altera and Lattice.
ad 2. The package is the tough issue; I have not seen an FPGA with a smaller footprint either. However, maybe you can use a CPLD (for the sake of argument, CPLDs are small FPGAs), which may come in a smaller package (PLCC or QFN). On the plus side they will be cheaper (down to single dollars); on the negative side they will most likely not have RAM inside. With a CPLD you would probably need an external SRAM chip.
ad 3. FPGA and CPLD current consumption is highly dependent on the programmed design. However, there is a good chance that an FPGA design, and especially a CPLD design, would consume less than your current solution.
ad 4. FPGAs do have that kind of memory inside; CPLDs almost certainly do not. This may be solved by an external SRAM chip (or two). For example:
|SRAM 1| <--> |CPLD| <--> |uC|
|SRAM 2| <-->
In such an arrangement, while the uC is writing to SRAM 1, the CPLD will be displaying data from SRAM 2, and the two SRAMs swap roles once a frame is complete. The CPLD should be able to handle both tasks simultaneously.
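The two-SRAM arrangement above is a ping-pong (double) buffer. A minimal software sketch of the bookkeeping, in C, might look like the following. The type `pingpong_t`, the function names and `FRAME_BYTES` are all hypothetical, and in the real design the two banks would be external SRAM chips with the switching done inside the CPLD rather than arrays in one memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 64 /* hypothetical frame size, for illustration only */

typedef struct {
    uint8_t bank[2][FRAME_BYTES]; /* stand-ins for SRAM 1 and SRAM 2 */
    int write_bank;               /* index of the bank the uC is filling */
} pingpong_t;

/* Clear both banks and start with the uC writing to bank 0. */
void pingpong_init(pingpong_t *pp)
{
    assert(pp != NULL);
    memset(pp, 0, sizeof *pp);
}

/* Bank the uC may write to. */
uint8_t *pingpong_write_buf(pingpong_t *pp)
{
    return pp->bank[pp->write_bank];
}

/* Bank the display side reads from: always the other one. */
const uint8_t *pingpong_read_buf(const pingpong_t *pp)
{
    return pp->bank[1 - pp->write_bank];
}

/* Call once the uC has finished a frame: the banks exchange roles. */
void pingpong_swap(pingpong_t *pp)
{
    pp->write_bank = 1 - pp->write_bank;
}
```

The writer fills `pingpong_write_buf()`, calls `pingpong_swap()` when the frame is complete, and the display side then sees the new frame through `pingpong_read_buf()` while the writer starts on the other bank. In hardware the swap would normally be synchronized to the end of a display refresh so that a frame is never shown half-updated.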
Of course you can solve this in other ways too:
1) use a faster microcontroller (an ARM, for example)
2) use a device with some programmable fabric and a uC inside (for example FPSLIC from Atmel; however, I have never used such devices and know very little about them)
Standard disclaimer -> as designs are open problems with many constraints and possible solutions, whatever I wrote above may not be true for your case. I believe it is worth checking those options, though.
Best Answer
Generally "DSP..." means 'more relevant horsepower and/or more relevant hardware at the time the product was introduced.'
Generalised processors tend to catch up with older specialist devices.
The dsPIC is probably 10+ years old - Olin will know.
[Items in brackets relate to some dsPIC examples - not exhaustive].
In DSP products expect some mix of:
barrel shifters,
wide fast pipelines and fast single cycle execution times,
wide single cycle instructions,
DMA [6 or 8 channels, dual port RAM buffers],
large linear memory addressing ranges [4 Mword program, 64 kB data],
specialist arithmetic oriented features.
Maybe:
specialist peripherals such as motor control,
hardware for several different comms standards [CAN, IIC, UART, IIS, AC97, ...],
deeper than usual comms buffers [4 bytes],
faster and/or wider than usual ADCs [2 Msps, 10 or 12 bit].
You'll find most of these in the dsPIC family - and increasingly so in general-purpose processor families.
In extreme cases you get user microcoding and more.