Electronic – How do the Infineon REAL3™ depth-sensing imager chips measure time of flight for each pixel

image-sensor, range-detector

tl;dr: How do the REAL3™ depth-sensing imager chips measure time of flight of light for each pixel? Is it quadrature, time-to-amplitude, or something else?


The Infineon REAL3™ depth-sensing imagers are used to make the Time-of-Flight (ToF) camera in the upcoming Asus Zenphone, serving as the "depth sensing 3rd rear camera" for its Tango implementation. (additional background info)

From the drawing and discussion in the 2 page PDF I believe it uses RF-modulated infrared light for illumination. The modulation signal is also passed directly to the sensor chip serving as a time and/or phase reference.

I'd like to read about how the measurement is actually implemented within each pixel. This is what I've thought so far:

  1. I'm assuming these are CMOS photodiodes rather than CCD pixels, so one possibility would be to do some kind of quadrature detection and simply report phase. The problem here would be the ambiguity if the round-trip time is greater than one period of the modulation. The sensors have a dynamically adjustable modulation frequency, which I suppose could be used for "auto-ranging" (a sketch of that idea follows this list). Also, the quadrature method would require at least approximately sinusoidal intensity variation of the LED.

  2. Edge detection (time-to-amplitude conversion) could be used instead of quadrature. That would work with rectangular-wave modulation.

  3. To get rid of super-period time ambiguities, a more complex digital code could be used instead of a square wave.
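As a concrete illustration of the "auto-ranging" idea in item 1 (this is my own sketch, not anything from Infineon's documentation): with two modulation frequencies you can keep only the candidate distance whose wrapped phases are consistent at both frequencies. The function names and the brute-force search over wrap counts are purely illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def unambiguous_range(f_mod):
    """One-way distance at which the round-trip phase wraps past 2*pi."""
    return C / (2.0 * f_mod)

def unwrap_two_freq(phi1, phi2, f1, f2, d_max):
    """Pick the distance <= d_max most consistent with both wrapped phases.

    phi1, phi2 are measured phases (radians) at modulation frequencies f1, f2.
    Brute-force search over the unknown wrap count at the first frequency.
    """
    best_d, best_err = None, float("inf")
    n_max = int(d_max / unambiguous_range(f1)) + 1
    for n in range(n_max + 1):
        d = C * (phi1 + 2.0 * math.pi * n) / (4.0 * math.pi * f1)
        # Wrapped phase this candidate distance would produce at f2
        phi2_pred = (4.0 * math.pi * f2 * d / C) % (2.0 * math.pi)
        # Smallest angular difference between predicted and measured phase
        err = abs(math.atan2(math.sin(phi2_pred - phi2),
                             math.cos(phi2_pred - phi2)))
        if err < best_err:
            best_d, best_err = d, err
    return best_d

# A 3.2 m target wraps once at 100 MHz but not at 30 MHz:
d_true = 3.2
phi_100 = (4 * math.pi * 100e6 * d_true / C) % (2 * math.pi)
phi_30 = (4 * math.pi * 30e6 * d_true / C) % (2 * math.pi)
print(unwrap_two_freq(phi_100, phi_30, 100e6, 30e6, d_max=4.0))  # ~3.2
```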

Considering they've added a microlens array to the imager to improve light collection, the application is probably S/N limited. In a phone, you're lighting up a scene as far as 4 meters away with a flashlight running off of your cellphone. To get sub-nanosecond precision with a fast update rate, the battle is against noise. It's possible the method with the best inherent S/N performance would be the best choice.

In all of the above, I'm assuming the infrared light is from a single uniform source flood-illuminating the scene, without any grid pattern or structured light.

The highest available modulation frequency is currently 100 MHz. With a period of 10 nanoseconds and the speed of light of about 30 cm/ns, that means at the highest frequency objects at a distance of 150 cm would be at the ambiguity limit. But dropping to 30 MHz would work for the rated maximum distance of 4 meters.
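Just to make that arithmetic explicit (a trivial check, nothing sensor-specific): the unambiguous one-way range is c divided by twice the modulation frequency.

```python
C = 299_792_458.0  # speed of light, m/s

for f_mod in (100e6, 30e6):
    print(f"{f_mod / 1e6:.0f} MHz -> unambiguous range {C / (2 * f_mod):.2f} m")
# 100 MHz -> unambiguous range 1.50 m
# 30 MHz -> unambiguous range 5.00 m
```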

So far all I've found is the two page PDF linked above, but no data sheet or in-depth discussion.

Can someone explain: how do the Infineon REAL3™ depth-sensing imager chips measure time of flight for each pixel? I'm looking for the definitive answer, not idle speculation. However, educated, contemplative speculation is certainly welcome if no answer is forthcoming.

[screenshot from the two-page PDF linked above]

Best Answer

This paper, Robust 3D Measurement with PMD Sensors, describes in detail the Infineon/PMD sensor technology and how the sensors work.

Figure 3 shows the individual "phase pixels" (they call them smart pixels) that measure the phase with two photogates. As the photogates are switched on and off, they collect the incoming light and integrate it. There is a better diagram on Wikipedia which also shows the circuit and the simplicity of the sensor pixel: it's just a photodiode and some charge buckets.
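As a toy illustration of that two-bucket idea (my own sketch, not Infineon's circuit): the reference clock steers the photo-generated charge into one of two buckets, so the way the charge splits between them encodes the delay of the returned light within one modulation period.

```python
def bucket_split(delay_ns, period_ns=33.3, steps=10_000):
    """Integrate a delayed 50%-duty return signal into two gated buckets."""
    a = b = 0.0
    for i in range(steps):
        t = (i / steps) * period_ns
        # Returned light: square wave delayed by the round-trip time
        light = 1.0 if ((t - delay_ns) % period_ns) < period_ns / 2 else 0.0
        # Reference clock: gate A open in the first half-period, gate B in the second
        if t < period_ns / 2:
            a += light
        else:
            b += light
    return a / steps, b / steps

print(bucket_split(delay_ns=0.0))  # no delay: all charge lands in bucket A
print(bucket_split(delay_ns=8.3))  # ~quarter-period delay: charge splits about evenly
```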

At first it is hard to see how the sensors could measure with any accuracy, because sampling the phase directly would be a challenge. Herein lies the "secret sauce": what you can't sample directly (the ToF information) you estimate with statistical methods. The paper goes into detail on this and gives the equation they use for the estimate:

$$ d = \frac{c\,\varphi}{4\pi f_{\mathrm{mod}}} $$

where

$$\varphi = \arctan\frac{A_1-A_3}{A_2-A_4}$$

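Putting those two equations into code (a minimal sketch; the mapping between the bucket labels A1..A4 and the reference phase offsets follows the paper's notation, which I haven't verified against the actual silicon):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def distance_from_buckets(a1, a2, a3, a4, f_mod):
    """Per-pixel distance from the four phase buckets, per the equations above."""
    phi = math.atan2(a1 - a3, a2 - a4) % (2 * math.pi)  # wrapped phase
    return C * phi / (4 * math.pi * f_mod)              # d = c*phi / (4*pi*f_mod)

# E.g. buckets in arbitrary charge units, 30 MHz modulation:
print(distance_from_buckets(1.48, 1.15, 0.52, 0.85, f_mod=30e6))  # ~1.0 m
```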

There is another good presentation here about a different camera that works on the same technology.

Now for the speculation: I don't think they are doing much beyond four-phase sampling in their sensors. They may be comparing the depth and phase information between pixels (more than just a 2-D smoothing filter), which would give you a little more resolution.