Pose estimation based on images
Research at Captain AI has found a way for computers to determine the orientation of vessels using cameras. Captains routinely make this kind of estimate by eye while sailing. To mimic this human perception, we have developed a novel solution that can be used by our autonomous ships. In the maritime industry, the orientation of a vessel is referred to as its heading. Heading information is a critical component of autonomous navigation, and knowing the heading of nearby vessels will let our autonomous ships improve their evasion capabilities.
This approach to estimating the pose of vehicles has shown promising results in the automotive industry. Combined with our ship detector, it should bring similar benefits to the maritime sector. The approach uses deep learning to estimate the viewpoint, from which the heading can be derived. This makes it possible to determine the pose of other vessels, which is critical for object avoidance in autonomous ships.
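As a rough illustration of how a heading could be derived from an estimated viewpoint, the sketch below combines the viewpoint azimuth with the compass bearing from our own ship to the target. The function name and the angle convention are assumptions made for this example, not a description of the production system: the viewpoint azimuth is taken as the angle, measured clockwise from the target's bow, at which the camera sees the target (0 means bow-on, 180 means stern-on).

```python
def target_heading_deg(bearing_to_target_deg: float, viewpoint_azimuth_deg: float) -> float:
    """Estimate the true heading of an observed vessel, in degrees.

    Assumed convention (hypothetical, for illustration only):
      - bearing_to_target_deg: compass bearing from our ship to the target.
      - viewpoint_azimuth_deg: direction of the camera as seen from the target,
        measured clockwise from the target's bow (0 = we see the bow head-on).
    """
    # The bearing from the target back to our ship is the reciprocal bearing.
    bearing_to_observer = bearing_to_target_deg + 180.0
    # The viewpoint azimuth is that bearing expressed relative to the target's
    # bow, so subtracting it recovers the target's heading.
    return (bearing_to_observer - viewpoint_azimuth_deg) % 360.0


# Target due north of us (bearing 0) and seen bow-on: it is heading south.
print(target_heading_deg(0.0, 0.0))
```

Under this convention, a target seen stern-on at the same bearing would be heading north, away from our ship. A different azimuth convention would change the sign or offset in the formula.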
Viewpoint estimation for ships can be challenging because ships vary in size and shape. Unlike cars, which have a standard form where the front and back are distinct, ships have no visual feature that consistently gives away the viewpoint. Some ships look so similar from both ends that the bow and stern are hard to tell apart.
Dataset generation
We generated a synthetic dataset of 73k images of various boat types using a ship simulator. The simulator lets us pose ships under various conditions, including rain, fog, dusk, dawn, and various sea states. Images were captured procedurally, with randomized weather conditions changing the environment around the boats. We use this synthetic dataset to train our viewpoint estimation model, which helps it generalize and estimate angles at a finer resolution.
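The procedural sampling described above might look something like the following sketch. Everything here is hypothetical: the SceneConfig fields, the "clear" and "day" options, the sea state range, and the idea of storing the sampled azimuth as the training label are assumptions for illustration, and the call into the simulator itself is left out.

```python
import random
from dataclasses import dataclass

# Conditions named in the text, plus assumed "clear" and "day" baselines.
WEATHER = ("clear", "rain", "fog")
TIME_OF_DAY = ("day", "dusk", "dawn")


@dataclass(frozen=True)
class SceneConfig:
    boat_type: str
    weather: str
    time_of_day: str
    sea_state: int        # assumed integer scale, 0 (calm) to 6 (rough)
    azimuth_deg: float    # viewpoint angle, usable as the ground-truth label


def sample_scene(rng: random.Random, boat_types: list[str]) -> SceneConfig:
    """Draw one random scene configuration to hand to a simulator."""
    return SceneConfig(
        boat_type=rng.choice(boat_types),
        weather=rng.choice(WEATHER),
        time_of_day=rng.choice(TIME_OF_DAY),
        sea_state=rng.randint(0, 6),
        azimuth_deg=rng.random() * 360.0,
    )


rng = random.Random(42)  # fixed seed so the dataset can be regenerated
scenes = [sample_scene(rng, ["tanker", "sailboat", "tug"]) for _ in range(5)]
for scene in scenes:
    print(scene)
```

Because the simulator controls the pose of each ship, the sampled viewpoint can double as an exact label, which is one reason synthetic data is attractive for this kind of task.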
Viewpoint inference system
Running the viewpoint estimation model on video requires more than the model itself. To predict the azimuth of each boat and smooth that angle across frames, the boats must also be tracked on screen. We therefore built a viewpoint inference system in which an object detector, a bounding box tracker, and a smoothing filter work alongside the viewpoint estimation model.
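To make the smoothing step concrete, here is a minimal sketch of one way to smooth azimuth predictions per tracked boat. It uses an exponential moving average on the unit circle, which is a stand-in chosen for this example, since the text does not say which filter is used. The class names, the alpha value, and the integer track IDs are all assumptions. Averaging the sine and cosine rather than the raw angle avoids a jump when predictions cross the 0/360 boundary.

```python
import math


class CircularEMA:
    """Exponential moving average for angles, computed on the unit circle."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest measurement
        self._x = None      # smoothed cosine component
        self._y = None      # smoothed sine component

    def update(self, angle_deg: float) -> float:
        rad = math.radians(angle_deg)
        c, s = math.cos(rad), math.sin(rad)
        if self._x is None:
            # First measurement: nothing to blend with yet.
            self._x, self._y = c, s
        else:
            self._x = (1.0 - self.alpha) * self._x + self.alpha * c
            self._y = (1.0 - self.alpha) * self._y + self.alpha * s
        # Convert the smoothed vector back to an angle, wrapped to 0-360.
        return math.degrees(math.atan2(self._y, self._x)) % 360.0


class TrackSmoother:
    """Keeps one filter per track ID supplied by the bounding box tracker."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self._filters: dict[int, CircularEMA] = {}

    def update(self, track_id: int, azimuth_deg: float) -> float:
        if track_id not in self._filters:
            self._filters[track_id] = CircularEMA(self.alpha)
        return self._filters[track_id].update(azimuth_deg)


smoother = TrackSmoother(alpha=0.5)
# Per frame: the detector finds boats, the tracker assigns IDs, the model
# predicts an azimuth, and the smoother stabilizes it for each ID.
for frame_azimuth in (350.0, 10.0, 5.0):
    print(smoother.update(track_id=1, azimuth_deg=frame_azimuth))
```

A naive average of 350 and 10 degrees would give 180, pointing the boat in the opposite direction, whereas the circular version stays near 0. Keying the filters by track ID is what makes the tracker necessary: without stable IDs, predictions from different boats would be blended together.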