I don't know which Xilinx device you're targeting, but here's an overview page of the Xilinx DSP blocks:
There's a small section about division in the Virtex 5 XtremeDSP user guide (p.74), for example:
However, your best bet is to create a divider using 'coregen'. Launch 'coregen', create a new project and go to 'Math Functions -> Dividers' (the options appear depending on the target device). Then go through the wizard choosing your preferred setup.
Here's a bit more information on how to use 'coregen': 'coregen' will create a '.v' or '.vhd' instantiation file depending on the language you've chosen. It will run XST to generate a '.ngc' "blackbox" netlist so the implementation process can include it when you run 'ngdbuild'.
Now, 'coregen' will also generate a '.xco' and a '.cgp' file, which are the only files you actually need (in ISE version 12.x) in order to regenerate the core. On the command line, run
coregen -p <core>.cgp -b <core>.xco
and you'll get the HDL instantiation and the netlist (and a bunch of other things in the process). Note that 'coregen' will generate its output where the input files are, not where it is invoked from, and there's no switch to indicate an output path!
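Since the output lands next to the input files, one workaround is to copy the '.xco' and '.cgp' into a scratch directory first and run 'coregen' on the copies. Here's a minimal Python sketch of that idea. The helper name stage_core and the file name divider are made up for illustration, and the only 'coregen' switches used are the two from the command above.

```python
import shutil
import tempfile
from pathlib import Path


def stage_core(xco, cgp, out_dir):
    """Copy the two CoreGen input files into out_dir and return the
    command line that regenerates the core from those copies.

    stage_core is a hypothetical helper, not part of the Xilinx tools.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    xco_copy = Path(shutil.copy2(xco, out))
    cgp_copy = Path(shutil.copy2(cgp, out))
    # Same switches as the command shown above, pointed at the copies,
    # so the generated files end up in out_dir.
    return ["coregen", "-p", str(cgp_copy), "-b", str(xco_copy)]


# Demo with empty placeholder files (hypothetical names).
src = Path(tempfile.mkdtemp())
(src / "divider.xco").touch()
(src / "divider.cgp").touch()
build = src / "build"
cmd = stage_core(src / "divider.xco", src / "divider.cgp", build)
print(" ".join(cmd))

# To actually regenerate the core (requires ISE's coregen on the PATH):
#   import subprocess
#   subprocess.run(cmd, cwd=build, check=True)
```

This keeps the source tree clean: only the '.xco' and '.cgp' need to be under version control, and everything 'coregen' produces stays in the build directory.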
I'll preface this with the caveat that I'm not that up to date on the inner workings of recent FPGA architectures, so this answer may not be apropos, depending on whether the FPGA tools support the design flow I will discuss.
It's probably true that most of the raw gates shipped into the market are in latch-based designs, because microprocessors account for such a large share of all shipping transistors. So yes, it's an artificial measure. Relatively few people design this way, but most processors use a scheme of:
Logic cloud -> latch (+'ve clock) -> logic cloud -> latch (-'ve clock) -> repeat semi ad-infinitum.
Which if you look at it is the canonical format for a master slave FF, but with more logic inserted between the master and the slave.
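To make the master/slave point concrete, here is a small behavioural sketch in Python (not HDL, and not tied to any FPGA tool). It assumes the master latch is transparent while the clock is low and the slave while it is high; the class and function names are made up for illustration. With no logic in between, the pair behaves like a rising-edge flip-flop. Passing a function as logic models the "logic cloud" sitting between the two latches.

```python
class Latch:
    """Level-sensitive latch: transparent while enable is true, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


def master_slave(d_stream, clk_stream, logic=lambda x: x):
    """Step two back-to-back latches on opposite clock phases.

    Assumption for this sketch: master is open while clk is low,
    slave is open while clk is high. `logic` is the combinational
    block placed between master and slave (identity by default).
    """
    master, slave = Latch(), Latch()
    out = []
    for d, clk in zip(d_stream, clk_stream):
        m = master.update(d, enable=not clk)
        out.append(slave.update(logic(m), enable=bool(clk)))
    return out


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 1, 1, 1]

# Plain master-slave: the output only changes when clk goes high and
# ignores changes on d while clk stays high.
print(master_slave(d, clk))

# Same latches with an inverter as the "logic cloud" in between.
print(master_slave(d, clk, logic=lambda x: 1 - x))
```

In the first run the output follows the value d had just before each rising clock edge, which is exactly what an edge-triggered flip-flop does.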
The vast majority of designs (counted by number of designs) use single-clock-domain, edge-triggered timing. To quote Dally and Poulton (Digital Systems Engineering): "Edge-triggered timing, however, is rarely used in high-end microprocessors and system designs largely because it results in a minimum cycle time dependent upon clock skew". Using latches driven by two-phase non-overlapping clocks results in very robust timing that is largely insensitive to skew. It does add complexity to the design, since signals from one clock domain cannot be intermixed with those from the other.
The other drawback is that it is rarely taught in schools.
If this were a question on high-end digital system design, that would be your answer. Whether it applies to FPGAs, I don't know for sure, but I suggest this COULD be the reason.
BTW - I'd suggest that book to anyone who is serious about advanced digital VLSI design.
Dally, William J., and John W. Poulton. Digital Systems Engineering. Cambridge University Press.
See also FPGA's vs Microcontrollers
High-speed image or video processing is a good example. Or processing 'images' that aren't straightforward optical images, such as radar or laser-based systems.
The key thing to consider is throughput and latency requirements. A microcontroller can service an interrupt (very roughly) once per microsecond. It can only service one interrupt at once. If you need to process it in an elaborate way, that limits how many you can service in a particular time.
With an FPGA, you can generally respond to an input event immediately (well, on the next clock cycle). You can have lots of processing units in parallel. If you know that your filter takes 20 cycles, that's entirely independent of anything else going on.
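As a rough back-of-the-envelope comparison, the numbers above can be turned into a few lines of Python. The 1 µs interrupt time and the 20-cycle filter come from the text; the 100 MHz clock, the 5 µs handler and the function names are assumptions picked only for illustration.

```python
def mcu_max_event_rate(service_time_us):
    """Events per second a handler can sustain if it services one
    interrupt at a time and each takes service_time_us microseconds."""
    return 1e6 / service_time_us


def fpga_latency_us(cycles, clock_mhz):
    """Fixed latency of a block that takes `cycles` clock cycles."""
    return cycles / clock_mhz


# Microcontroller: roughly one interrupt per microsecond at best.
print(mcu_max_event_rate(1.0))   # upper bound with a trivial handler
print(mcu_max_event_rate(5.0))   # a more elaborate 5 us handler (assumed)

# FPGA: a 20-cycle filter at an assumed 100 MHz clock. The latency is
# the same no matter how many other blocks run in parallel.
print(fpga_latency_us(20, 100.0))
```

The point of the sketch is that the microcontroller's event rate drops as the handler grows, while the FPGA block's latency is set only by its own cycle count and clock.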
Highly parallel, integer-intensive computation works best on FPGAs, especially if there are complex data dependencies. However, they don't have a lot of onboard memory; you can add some DRAM to the side, but at the cost of latency.
You may also want one for the peripherals, or to speak some high-speed digital bus. You can't bit-bang HDMI into or out of a microcontroller. You can't build a PCI card around one.