The optical mouse has a chip that has a lo-res camera in it, with a DSP. The DSP picks off certain high-contrast bits of the image in one frame, and looks at how they've tracked across the field of view in the next image. From this it infers X and Y motion, and reports the position change to the computer through a USB interface. The LED provides illumination to increase contrast on the surface, so there will be more bright spots the DSP can pick off.
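To make the frame-to-frame comparison concrete, here is a minimal sketch in Python. It is not the algorithm of any particular sensor (those are proprietary); it assumes tiny grayscale frames stored as lists of lists, and the name `estimate_shift` and the `max_shift` search range are made up for illustration. It tries every small shift and keeps the one where the two frames disagree least.

```python
def estimate_shift(prev, curr, max_shift=2):
    """Return (dx, dy): the shift that best maps `prev` onto `curr`.

    Brute-force search over small shifts, scoring each one by the mean
    absolute pixel difference over the overlapping area (illustrative only).
    """
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0, 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        err += abs(prev[y][x] - curr[yy][xx])
                        n += 1
            score = err / n
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]


def frame_with_spot(x, y, size=6):
    """Hypothetical test frame: dark background with one bright pixel."""
    frame = [[0] * size for _ in range(size)]
    frame[y][x] = 9
    return frame


# A bright feature moves one pixel to the right between two frames.
print(estimate_shift(frame_with_spot(2, 2), frame_with_spot(3, 2)))
```

With a completely uniform frame every shift scores the same, which is why the sensor needs contrast on the surface.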
There's not just one IR transmitter/receiver pair; there are two of them. Between the transmitters and receivers there's a slotted wheel which, when rotating, causes a pulse train in each receiver. (The light from the transmitter is blocked, passes, is blocked again, and so on.)
The trick is how the two receivers are placed, namely in quadrature.
This means that the pulses of one receiver lead the pulses of the other by a fraction of a cycle (ideally 90°). If the wheel turns the other way, the same pulses now lag the others.
Notice that on a rising edge of channel A the B channel is at a high level when turning one way, and low when turning the other way.
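The rising-edge rule can be sketched in a few lines of Python. This assumes the two channels are already sampled as 0/1 levels; the function name `count_steps` is made up, and which direction counts as positive is an arbitrary choice here.

```python
def count_steps(samples):
    """Net step count from a sequence of (a, b) logic levels.

    On every rising edge of channel A, the level of channel B gives the
    direction: B low counts +1, B high counts -1 (sign chosen arbitrarily).
    """
    position = 0
    prev_a = None
    for a, b in samples:
        if prev_a == 0 and a == 1:      # rising edge on A
            position += 1 if b == 0 else -1
        prev_a = a
    return position


# Two full quadrature cycles with A leading B, then the same played backwards.
forward = [(0, 0), (1, 0), (1, 1), (0, 1)] * 2
backward = list(reversed(forward))

print(count_steps(forward))   # A rises while B is low
print(count_steps(backward))  # A rises while B is high
```

Counting only rising edges of A gives one count per slot; counting every edge on both channels would give four times the resolution from the same wheel.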
edit (about absolute encoders)
I wasn't completely satisfied with my reply to JGord's comment (some inaccuracies), hence this reprise
The system described above is known as an incremental encoder because it detects relative changes, from one position to the next. Over a full rotation the codes are repeated a number of times, so you can't know your absolute position just by looking at the code.
To overcome this there exist absolute encoders. Instead of two channels in quadrature they have many more channels, creating a unique pattern for each rotation position. A 10-channel encoder can tell \$2^{10}\$ or 1024 different positions apart. Shaft encoders in robots often use even more channels for finer resolution.
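As a quick illustration of what the channel count buys you, the helper below (the name `angular_resolution_deg` is made up) computes the number of positions and the angle covered by one code.

```python
def angular_resolution_deg(channels):
    """Degrees of shaft rotation per code for an absolute encoder."""
    positions = 2 ** channels
    return 360 / positions


print(2 ** 10)                      # distinct positions for 10 channels
print(angular_resolution_deg(10))   # degrees per position
```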
The specific pattern is typical of Gray coding.
about Gray coding
Ordinary binary has the disadvantage that code transitions may create erroneous codes during the transition. Take for instance the change from 0111 (7) to 1000 (8). If the leftmost bit is a bit faster than the others you will see for a moment 1111 (15), which is totally off.
Gray code overcomes this by rearranging the codes so that there's only 1 bit changing at a time.
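A short Python sketch of the usual binary-reflected Gray code shows the one-bit-at-a-time property. The function names are made up; the conversion itself is the standard XOR-with-shift construction.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Inverse of to_gray: recover the ordinary binary value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# The troublesome 7 -> 8 step: four bits flip in plain binary, one in Gray.
print(format(to_gray(7), "04b"), format(to_gray(8), "04b"))
print(bits_changed(7, 8), bits_changed(to_gray(7), to_gray(8)))
```

Because only one bit changes per step, a slightly early or late bit can at worst make the reading flip between two neighbouring positions.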
Absolute encoders won't help you to find the absolute mouse position, however, because the wheel rotates several times while you're moving the mouse. The "unique" pattern will repeat every few mm and isn't so unique after all. Besides, it's always possible to move the mouse when the computer is off, or you can lift the mouse and put it down again a bit further. Both actions will go undetected.
Further reading
"Control Shaft Encoders" \$-\$ Circuit Cellar issue 250, May 2011, p.28 ff
Best Answer
Yes, what you suggest is possible. (Congratulations on your cogent thought process).
The mouse navigation compares "features" from one image frame to the next, and calculates how much X-movement and Y-movement has transpired. If no image features can be found, then no movement can be calculated. An out-of-focus image contains no features: a mouse's lens is extremely near-sighted.
A lens of a different focal length can focus further away. For example, a fixed focal-length camera usually extends focus to infinity. Doing this converts a mouse to a camera (See my profile photo; it was done with this modified mouse):
Having a sharp focus very far away likely allows features to be found, and XY movements to be calculated. Still, pointing it at a featureless sky would confuse the internal calculator, and yield no motion-detection.
However, the mouse is now useless as a mouse on a mouse-pad because features very close are out-of-focus.
Notice that the mouse's LED light source has been removed. Daylight is sufficient to create an image. The internal chip can accommodate a fairly wide range of light levels (but not as wide as your eye).
For a mouse on a pad, the LED is needed for illumination, since the mouse itself shields room light from the small space between mouse and pad.
The plastic lens included with this mouse could not be moved close enough to the internal chip to allow infinity-focus. It was replaced with a 4.5mm focal length plano-convex glass lens. The lens quality need not be high since pixel-count is so small.
The lens is mounted in a machined-brass housing that press-fits over the ADNS-2610 chip's cover. The tiny cover hole (which lets light inside) was drilled out to about 3mm dia. The cover has a 5.6mm diameter step over which the brass housing is press-fit. The cover can be pried off the chip with care.
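The interface to this 8-pin chip is two-wire serial (a clock line plus a bidirectional data line; I2C-like, though as far as I know not standard I2C). This is an older chip that requires +5V Vdd. The Avago data sheet is very well done.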
I have tested its XY scanning ability with infinity focus, but have not used it as a mouse this way: you may find that a large gesture movement yields only a small cursor motion on your screen.
More modern chips provide more pixels, and their pixel download rates are fast enough to allow video frame rates when used as a camera. Getting data sheets for them, and getting small quantities for experimenting, seems to be a problem.