3D sensors such as LiDARs, radars and Time of Flight (ToF) cameras have gained immense popularity in the last decade and are fast becoming household names, and rightly so. This technology, which essentially builds three-dimensional models of the real world, has found its way into consumer electronics (even our most beloved handhelds) and is almost single-handedly ‘driving’ the self-driving revolution, while quickly becoming a very real option for industrial applications across security, retail, smart cities, transportation, and logistics.

Why are these sensors gaining so much popularity? To answer this, we need to back up a little and look at what sensing architectures looked like before the advent of LiDARs and radars. Born out of a desire to automate or augment human tasks, e.g. monitoring CCTV, production lines or driving, these systems relied solely on cameras. This was an obvious choice: cameras were widely available, high resolution, cheap, easy to use, and well understood by humans, because they work in much the same frame of reference as the human eye (visible light spectrum, based on luminance and colour, 2D). While this easy interpretability was useful in many use cases, the very similarity to human vision became a limitation, especially when the end user is a machine and not a human.

By breaking this link to human-based perception, we open up new possibilities for machine-based Computer Vision and Perception systems. Since a camera can only produce a 2D image, the depth, and with it the precise location, of any detected object must be estimated from context and from precise knowledge of the geometry of the ground within the field of view. In most practical applications this yields only imprecise location estimates, especially in crowded environments where many objects are only partially visible due to occlusions. It is also susceptible to misclassification: people or objects in posters on a wall, on a billboard, or even on the side of a bus appear real to a 2D sensor. If more precise and reliable tracking is needed, multiple images from different perspectives (e.g. stereo vision) can be combined; we have all seen such technology track players and balls in sports broadcasts. But these solutions need very precise knowledge and control of individual sensor locations, and tight time synchronisation between them. This is often impractical to achieve and can more than offset the potential cost advantage of using cameras in the first place.

With 3D sensors such as radar and LiDAR, depth can be measured directly, providing much stronger context for perception applications. This direct measurement of distance at high resolution is especially useful for the precise detection and tracking of humans, cars and other objects in complex or changing environments. Such precise tracking can often be achieved with a smaller number of sensors covering greater distances, and with an inherent robustness to changing lighting (including full darkness) and weather conditions that might render a camera-based solution inoperable.
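The "direct measurement" here is conceptually simple: a pulse is emitted, its round trip is timed, and range falls out of the speed of light, with no dependence on scene geometry or lighting. A minimal sketch of that principle (the timing value is illustrative):

```python
# Sketch: how a time-of-flight sensor converts a pulse's round-trip
# time into range. Range is half the round-trip distance of the pulse.
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_range_m(round_trip_s: float) -> float:
    """Return range in metres for a measured round-trip time in seconds."""
    return C * round_trip_s / 2.0

# A pulse returning after ~333 ns corresponds to roughly 50 m:
print(round(tof_range_m(333e-9), 1))
```

The nanosecond-scale timing this requires is also why 3D sensor resolution is set by electronics rather than by ambient light, which underpins the robustness to darkness noted above.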

Another strong benefit of moving to other sensor modalities is a reduction in Personally Identifiable Information (PII). This allows, for example, LiDAR sensors to be used in situations where a camera could not be placed due to security or privacy concerns. The image from a typical LiDAR cannot uniquely identify faces or read sensitive documents, both of which limit the application of cameras. This is because an object tracked by a LiDAR commonly returns no more than a handful of points, whereas a camera needs much higher resolution to reliably detect an object of interest. Thanks to the added distance information in point clouds, these few points are still more than enough to precisely detect, classify and track objects of different types.
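To illustrate why a handful of 3D points suffices: even a few returns pin down an object's position and physical size, the cues a tracker actually needs, while carrying no identifiable appearance detail. The points below are made-up returns from a person-sized object, purely for illustration.

```python
# Sketch: locating and sizing an object from a handful of LiDAR
# returns (x, y, z in metres). Illustrative, hand-made points.
points = [
    (2.1, 0.4, 0.1), (2.2, 0.5, 0.9),
    (2.1, 0.3, 1.6), (2.2, 0.4, 1.2),
]

def centroid(pts):
    """Mean of the points: a simple position estimate for tracking."""
    n = len(pts)
    return tuple(sum(axis) / n for axis in zip(*pts))

def extent(pts):
    """Axis-aligned bounding-box dimensions: a simple size/class cue."""
    return tuple(max(axis) - min(axis) for axis in zip(*pts))

print(centroid(points))  # object position in metres
print(extent(points))    # a ~1.5 m vertical extent is consistent with a person
```

Four points already yield a usable position and a size signature, yet no face, text or texture is recoverable, which is the privacy argument in miniature.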

The rapid evolution of 3D sensors, in particular increases in both resolution and frame rate, enables increasingly advanced perception algorithms to be run. Not only is it possible to detect and classify people and objects, but also to determine key characteristics such as orientation, pose, gait and even, by inference, intent. Separating discrete objects in a crowded or partially obscured environment also becomes easier as sensor performance improves. This opens up both new application areas and previously difficult operating environments, greatly expanding the number of use cases that can be supported.

This is not to say that cameras are being displaced as the primary solution everywhere. However, many applications and use cases have started moving to distance-measuring sensor modalities such as LiDAR, which can either greatly enhance the capabilities of a solution or replace cameras outright as, all aspects considered, a cheaper, easier-to-integrate and more reliable option.