However, with all of these new systems, it can seem as if GPUs are better than CPUs in every way.
This is a fundamental misunderstanding. Present GPU cores are still limited compared to current top-line CPUs. Take NVIDIA's Fermi architecture, which I believe is the most powerful GPU currently available: it has only 32-bit registers for integer arithmetic, and less capability for branch prediction and speculative execution than a current commodity Intel processor. Intel i7 chips provide three levels of caching, while Fermi cores have only two, and each cache on the Fermi is smaller than the corresponding cache on the i7. Interprocess communication between the GPU cores is fairly limited, and your calculations have to be structured to accommodate that limitation: the cores are ganged into blocks, and communication between cores in a block is relatively fast, but communication between blocks is slow.
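To make that block structure concrete, here is a hedged kernel sketch (the kernel name and sizes are mine, and this is a fragment rather than a complete program): threads within one block cooperate through fast on-chip shared memory, while any result that crosses a block boundary has to travel through slow global memory.

```
#define BLOCK 256

__global__ void blockSum(const float *in, float *blockSums) {
    __shared__ float tile[BLOCK];          /* fast, per-block on-chip memory */
    int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * BLOCK + tid];
    __syncthreads();                       /* cheap barrier, but block-local only */

    /* tree reduction carried out entirely within the block's shared memory */
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        blockSums[blockIdx.x] = tile[0];   /* cross-block traffic must go via slow global memory */
}
```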
A significant limitation of current GPUs is that the cores all have to be running the same code. Unlike the cores in your CPU, you can't tell one GPU core to run your email client, and another core to run your web server. You give the GPU the function to invert a matrix, and all the cores run that function on different bits of data.
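A minimal CUDA sketch of this model (the function and variable names are mine, and cudaMallocManaged is a convenience added to CUDA after Fermi, used here only to keep the example short): one kernel is launched across a million threads, and every thread executes the same function body on its own element.

```
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* this thread's element */
    if (i < n)
        data[i] *= factor;                          /* same code, different data */
}

int main(void) {
    const int n = 1 << 20;
    float *d;
    cudaMallocManaged(&d, n * sizeof(float));  /* memory visible to both CPU and GPU */
    for (int i = 0; i < n; i++) d[i] = (float)i;

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);  /* 4096 blocks x 256 threads */
    cudaDeviceSynchronize();

    printf("d[3] = %f\n", d[3]);  /* 6.0 */
    cudaFree(d);
    return 0;
}
```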
The processors on the GPU live in an isolated world. They can control the display, but they have no access to the disk, the network, or the keyboard.
Access to the GPU system has substantial overhead costs. The GPU has its own memory, so your calculations will be limited to the amount of memory on the GPU card. Transferring data between GPU memory and main memory is relatively expensive. Pragmatically, this means there is no benefit in handing a handful of short calculations from the CPU to the GPU, because the setup and teardown costs will swamp the time required to do the calculation itself.
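As a sketch of that setup and teardown (CUDA runtime API; the trivial kernel stands in for real work), every numbered step below costs time whether the computation is large or tiny:

```
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void doubleAll(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) host[i] = (float)i;

    float *dev;
    cudaMalloc(&dev, bytes);                              /* 1: allocate GPU memory */
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice); /* 2: CPU -> GPU over the bus, relatively slow */
    doubleAll<<<(n + 255) / 256, 256>>>(dev, n);          /* 3: the actual computation */
    cudaDeviceSynchronize();
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost); /* 4: GPU -> CPU, slow again */
    cudaFree(dev);                                        /* 5: teardown */

    printf("host[2] = %f\n", host[2]);  /* 4.0 */
    free(host);
    return 0;
}
```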
The bottom line is that GPUs are useful when you have many (as in hundreds or thousands of) copies of a long calculation that can be computed in parallel. Typical tasks where this is the case are scientific computing, video encoding, and image rendering. For an application like a text editor, about the only function where a GPU might be useful is rendering the type on the screen.
The comments by @ratchet, @Sjoerd and @Stephane answer your question.
Your assertion in the question, "All I currently know is that doubles are probably the most expensive because they are larger", shows that the rules about optimization are true: "Only for experts", followed by "You're not an expert yet".
Unless you know the minutest details of the underlying hardware AND how the compiler utilizes those, you cannot make any assertions about floating point numbers. You can't even be certain that a floating point operation takes more time than an integer operation.
As a rule, there are enough problems with programmers writing floating point code correctly that it should be avoided unless needed. And if it is needed, performance is of little concern.
As a rule of thumb - use ints if possible - integer operations are rarely slower than FP operations, often faster, and more predictable. If you must use FP operations, use doubles. Floats do not have a large enough mantissa for anything but the roughest of calculations (unless extreme care is taken) and are prone to insidious rounding errors.
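A small demonstration of the mantissa problem (plain C, values chosen by me): float's 24-bit significand cannot represent 16777217 = 2^24 + 1, while double's 53 bits can, so adding 1 to a large float can silently do nothing.

```
#include <stdio.h>

int main(void) {
    float  f = 16777216.0f;   /* 2^24, the edge of float's exact integer range */
    double d = 16777216.0;

    printf("float:  %.1f + 1 = %.1f\n", f, f + 1.0f);  /* 16777216.0: the 1 is lost */
    printf("double: %.1f + 1 = %.1f\n", d, d + 1.0);   /* 16777217.0 */
    return 0;
}
```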
However, like all rules of thumb, they need to be applied appropriately. If it matters - measure it.
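If measurement is called for, a minimal timing harness might look like the following sketch (plain C; the volatile sinks keep the compiler from deleting the loops, and a serious measurement would still need warm-up runs and a look at the generated code before drawing any conclusion):

```
#include <stdio.h>
#include <time.h>

#define N 100000000L

int main(void) {
    volatile double dsink = 0.0;  /* volatile prevents the loop from being optimized away */
    volatile long   isink = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 1; i <= N; i++)
        dsink += 1.0 / (double)i;  /* floating-point divide + add */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("double loop: %.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 1; i <= N; i++)
        isink += 1000000 / i;      /* integer divide + add */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("int loop:    %.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}
```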
Best Answer
Many times, operations like floating point and memory management are encoded in a way that they can be "trapped". This means that the system can be configured to either use hardware or automatically branch to a software implementation. In the case of software, the implementation can be anything, although most manufacturers supply libraries that follow accepted standards (IEEE-754 in the case of floating point). In many systems, when a floating-point unit or other chip is installed, the instruction execution is automatically deferred to the new chip, so no software reconfiguration is necessary.
As I understand it, the ARM architecture does something very similar to the x86, with floating-point instructions that trap to software emulation if no FPU hardware is found.
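To make "software implementation" concrete: an emulation library performs floating-point arithmetic using ordinary integer instructions on the sign, exponent, and mantissa fields of the IEEE-754 bit pattern. This sketch (plain C, illustrative only) just unpacks those fields; a real library would go on to align, add, and round them.

```
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = -6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret the 32-bit pattern */

    uint32_t sign     = bits >> 31;          /* 1 bit */
    uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
    uint32_t mantissa = bits & 0x7FFFFF;     /* 23 bits, implicit leading 1 */

    printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
           sign, exponent, (int)exponent - 127, mantissa);
    return 0;
}
```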