To amplify stevenvh's answer, any type of logic, when given an input signal, will take some time to produce an output signal; memory is often very slow compared with other logic. Often, there will be a guarantee that the output signal will become valid within a certain amount of time, but that's it. In particular, it's possible that the signal might change several times within that interval, and there will be no indication, prior to the end of that interval, that the signal has achieved its final "correct" value.
When a typical microcontroller or microprocessor reads a byte (or word, or whatever unit) of memory, it generates an address and, some time later, looks at the value output by the memory and acts upon it. Between the time the controller generates the address and the time it looks at the value from memory, it doesn't care when or whether the output signals from the memory change. On the other hand, if the signal from memory hasn't stabilized to its final value by the time the controller looks at it, the controller will misread the memory as having held whatever value was being output at the moment it looked. Normally the controller would look at the value from memory as soon as it was ready to do something with it, but if the memory's value wouldn't be ready then, that might not work. Consequently, many controllers have an option to wait a bit longer after they're ready to process data from memory, to ensure that the output from memory is actually valid. Note that adding such delay will slow things down (the controller would have been happy to act on the data from memory sooner), but will not affect correctness of operation (unless things are slowed down so much that other timing obligations cannot be met).
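The extra delay described here is commonly configured as "wait states." A toy timing model can make the tradeoff concrete; all timings, values, and function names below are made up for illustration and don't model any particular part:

```python
# Toy model of a memory read: the data bus is only guaranteed valid
# `access_time_ns` after the address is applied. Sampling earlier may
# catch a transient, wrong value. All numbers are illustrative.

def memory_output(t_ns, access_time_ns=70, final_value=0xA5):
    """Value on the memory's data bus t_ns after the address is applied."""
    if t_ns < access_time_ns:
        return 0xFF  # bus still settling; could be any transient value
    return final_value

def controller_read(sample_time_ns, wait_states=0, cycle_ns=25):
    """Sample the bus, optionally delayed by some number of wait states."""
    t = sample_time_ns + wait_states * cycle_ns
    return memory_output(t)

# A controller ready to sample at 50 ns catches a transient value...
print(hex(controller_read(50)))                 # prints 0xff (wrong)
# ...but one extra 25 ns wait state lets the output settle first.
print(hex(controller_read(50, wait_states=1)))  # prints 0xa5 (correct)
```

Note that the wait state costs 25 ns on every access whether the memory needed it or not, which is exactly the "slows things down but doesn't affect correctness" tradeoff described above.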
Welcome to the world of consumer electronics and manufacturing in volume! Nobody ever said it made sense!
The difference in price has nothing to do with anything technical. It is purely the economics of the market. The SPI Flash is being sold in relatively low quantities and somewhat high profit margins. The SD card is being sold in huge quantities and a very low profit margin.
While on the surface it might seem that the SPI Flash, with its smaller capacity and fewer "middlemen", should be the cheaper part, that obviously isn't the case.
Another complication is that you could buy one make/model of SD card today, then buy the same make/model 3 months later, and you would not be guaranteed to get exactly the same thing. In those 3 months the internal design of the SD card could change. For most consumers this would not matter, but for some embedded uses it could kill your application. Also, the SD card maker is not going to tell you about these changes. The same is not true of the SPI Flash, where you will most likely get the same part for years.
You can get SD cards from manufacturers that will guarantee that they sell the same part for years, but it will be much more expensive.
These things are true of many products, not just SPI Flash and SD cards. Memory (Flash and RAM) is the most obvious one. Another one is the iPad. In many cases it would be cheaper to buy iPads in bulk than to try to manufacture your own-- even in 100,000 unit quantities. You should never underestimate the purchasing power of a large company building millions of units at a time.
There are other factors that I didn't cover. Differences in part types, packages, purchasing channels, etc. But the problem you raise is more complicated than any one single factor can account for. My market/economic explanation is the biggest factor, but not the only one.
Best Answer
In addition to the fact that some flash devices are capable of writing more bits in parallel, another factor affecting speed is the way in which garbage collection is performed. One of the biggest sources of slowdown on flash drives stems from the fact that most flash devices do not allow 512-byte pages to be erased and rewritten in place; instead, they require that erase operations operate on much larger areas (e.g. 32KB or more). If a device is asked to rewrite block 23, it will find an empty page, write "I am block 23" along with the new data, then find the old block 23 and mark it as invalid. If the number of empty pages gets too low, the device will check whether there are any erasable blocks which don't hold any valid pages. If not, it will find one which has very few valid pages, and move each page to a blank page in some other block (invalidating the old ones as it goes along). Once a block has been found which doesn't have any valid pages, that block can be erased, and all its pages added back to the pool of blank ones.
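The remap-on-write scheme described above can be sketched in a few lines. The sizes and data structures here are shrunk to toy dimensions for illustration; real controllers use far larger blocks and much more careful bookkeeping:

```python
# Sketch of remap-on-write: a rewrite of a logical sector goes to a
# fresh page, and the old copy is merely marked invalid. A block can
# only be erased once it holds no valid pages. Toy sizes throughout.

PAGES_PER_BLOCK = 4   # real devices use far more (e.g. 64+)
NUM_BLOCKS = 3

# Each page is None (blank), "INVALID", or a (logical_addr, data) pair.
blocks = [[None] * PAGES_PER_BLOCK for _ in range(NUM_BLOCKS)]
mapping = {}  # logical block address -> (block, page) of the valid copy

def free_pages():
    return [(b, p) for b in range(NUM_BLOCKS)
            for p in range(PAGES_PER_BLOCK) if blocks[b][p] is None]

def write(lba, data):
    """Write to any blank page; the previous copy becomes invalid."""
    b, p = free_pages()[0]   # assumes a blank page exists; real
                             # firmware would trigger GC here if not
    blocks[b][p] = (lba, data)
    if lba in mapping:       # invalidate the old copy of this sector
        ob, op = mapping[lba]
        blocks[ob][op] = "INVALID"
    mapping[lba] = (b, p)

def erase_if_garbage(b):
    """A block with no valid pages can be erased wholesale."""
    if all(page in (None, "INVALID") for page in blocks[b]):
        blocks[b] = [None] * PAGES_PER_BLOCK
        return True
    return False

write(23, "version 1")
write(23, "version 2")  # old copy in block 0, page 0 is now INVALID
```

After enough rewrites of block 23, the first erase block fills up with invalid pages; once its last valid page has been relocated by a later write, `erase_if_garbage` can reclaim all four pages at once.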
Many schemes can be used to keep track of how pages are mapped and determine which blocks should be recycled when. It's possible to design fairly simple schemes that can be implemented on a small micro with limited RAM, but performance may not be great (e.g. it may have to repeatedly read through the flash to identify blocks for garbage collection, and may place data blocks without regard for whether they're likely to become "obsolete" soon). Conversely, if the controller has a generous amount of RAM available, it may be able to do a better job of identifying which blocks should be garbage-collected when, and may also be able to store blocks of data alongside other blocks that will have similar useful lifetimes.
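One simple policy consistent with the description above is greedy victim selection: recycle the block with the fewest valid pages, since it requires copying the least data before the erase. A minimal sketch, where `valid_counts` stands in for a per-block table the controller would keep in RAM (a RAM-starved controller would instead have to rescan the flash to rebuild it each time):

```python
# Greedy garbage-collection victim selection: the block with the
# fewest valid pages is the cheapest to recycle, because only its
# valid pages must be relocated before it can be erased.

def pick_victim(valid_counts):
    """Return the index of the block cheapest to garbage-collect."""
    return min(range(len(valid_counts)), key=lambda b: valid_counts[b])

# Blocks holding 30, 2, and 17 valid pages: collecting block 1 means
# relocating only 2 pages before the erase.
print(pick_victim([30, 2, 17]))  # prints 1
```

This is only the selection step; grouping data with similar lifetimes, as mentioned above, requires additional bookkeeping about write patterns that this sketch omits.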
Incidentally, I consider it unfortunate that solid state drives have not standardized on some sort of file system at the controller level (meaning that rather than asking for block #1951331825, software would ask for blocks 4-8 of file #1934129). A flash drive which knew how information was stored in files could make much better decisions about which data should be placed together than one which simply has to deal with seemingly-independent writes to various sectors, and could also do a more effective job of ensuring data integrity under adverse conditions.