Autonomous robots depend on their perception systems to understand the world around them. These machines often combine a host of sensors, including cameras, lidars, radars, and ultrasonic sensors, to build this environmental understanding. Stereo cameras play a major role in providing depth perception to robotic systems. Depth can be estimated with classical computer vision techniques, such as semi-global matching (SGM), or with deep neural networks (DNNs). Each individual algorithm may struggle under a particular set of operating conditions, but when multiple depth estimation algorithms are run simultaneously, more robust depth information can be calculated. In this talk, we'll cover work at NVIDIA to train the ESS DNN model for estimating stereo disparity using both synthetic and real-world data, so that it performs well where SGM may not. We'll also introduce the Bi3D model, which is trained on the simplified question "is X closer than M meters?" rather than "how far away is X?", yielding improvements in both accuracy and speed. As every approach has deficiencies on its own, we'll touch on how ensembling the responses of ESS and Bi3D, DNNs developed specifically for robotic perception, with SGM can lead to robust obstacle detection. Finally, we'll discuss how we've tuned the performance of these models to run on embedded compute for the responsive stopping behavior required in autonomous mobile robots (AMRs).
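To make the ideas above concrete, here is a minimal sketch of how depth estimates from several stereo algorithms might be fused for obstacle detection. It is illustrative only: the function names, the median fusion rule, and the stop-distance threshold are assumptions for this sketch, not NVIDIA's actual ESS/Bi3D/SGM APIs.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) via Z = f * B / d.

    Pixels with zero or negative disparity are marked as infinitely far,
    a common convention for invalid matches.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def fuse_depth(depth_maps):
    """Ensemble step: per-pixel median across algorithms.

    A median is robust to a single algorithm producing an outlier
    estimate at a given pixel (e.g. SGM failing on textureless regions).
    """
    return np.median(np.stack(depth_maps, axis=0), axis=0)

def obstacle_mask(depth, stop_distance_m):
    """Bi3D-style binary question: 'is this pixel closer than M meters?'"""
    return depth < stop_distance_m

# Example: fuse three (toy, 1x1) depth estimates; one is a gross outlier.
sgm_depth = np.array([[4.0]])
ess_depth = disparity_to_depth(np.array([[10.0]]), focal_px=500.0,
                               baseline_m=0.1)   # 500 * 0.1 / 10 = 5.0 m
bad_depth = np.array([[100.0]])                  # outlier estimate

fused = fuse_depth([sgm_depth, ess_depth, bad_depth])  # median -> 5.0 m
stop = obstacle_mask(fused, stop_distance_m=6.0)       # True: closer than 6 m
```

In a real AMR pipeline the fusion rule would also weigh per-algorithm confidence and run on embedded hardware, but the sketch shows the core idea: disagreeing estimators vote, and a binary closer-than-M decision drives the stopping behavior.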