Published online by Cambridge University Press: 16 November 2021
Most conventional simultaneous localization and mapping (SLAM) approaches assume that the working environment is static. In a highly dynamic environment, this assumption exposes the limitations of SLAM algorithms that lack modules dedicated to handling dynamic objects, even when they include optimization techniques. This work addresses such environments and reduces the effect of dynamic objects on a SLAM algorithm by separating features belonging to dynamic objects from those belonging to the static background using a generated binary mask image. Features in the static region are used for performing SLAM, while features in non-static segments are reused rather than eliminated. The approach employs a deep neural network (DNN)-based object detection module to obtain bounding boxes and then generates a lower-resolution binary mask image by running a depth-first search algorithm over the detected semantics, which segments the foreground from the static background. In addition, features belonging to dynamic objects are tracked across consecutive frames to improve masking consistency. The proposed approach is tested on both a publicly available dataset and a self-collected dataset that includes indoor and outdoor environments. The experimental results show that removing features belonging to dynamic objects from a SLAM algorithm can significantly improve the overall output in a dynamic scene.
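The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state what criterion the depth-first search uses, so the sketch assumes a depth image is available and grows the mask from the centre of each bounding box over grid cells whose mean depth is close to the seed cell. The function names `low_res_mask` and `split_features`, the cell size `cell`, and the tolerance `tol` are all hypothetical.

```python
import numpy as np

def low_res_mask(depth, boxes, cell=8, tol=0.3):
    """Return a (H//cell, W//cell) boolean mask; True marks a presumed dynamic cell.

    Assumption: `depth` is a 2-D array and `boxes` holds detector outputs as
    (x0, y0, x1, y1) in pixels with x1/y1 exclusive.
    """
    h, w = depth.shape
    gh, gw = h // cell, w // cell
    # Mean depth per cell gives the low-resolution grid the search runs on.
    grid = depth[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell).mean(axis=(1, 3))
    mask = np.zeros((gh, gw), dtype=bool)
    for x0, y0, x1, y1 in boxes:
        c0, r0 = x0 // cell, y0 // cell
        c1, r1 = min(gw, -(-x1 // cell)), min(gh, -(-y1 // cell))  # ceil division
        if r0 >= r1 or c0 >= c1:
            continue  # box falls outside the grid
        seed = ((r0 + r1) // 2, (c0 + c1) // 2)
        seed_depth = grid[seed]
        visited = set()
        stack = [seed]  # explicit stack: iterative depth-first search
        while stack:
            r, c = stack.pop()
            if (r, c) in visited or not (r0 <= r < r1 and c0 <= c < c1):
                continue
            visited.add((r, c))
            # Assumed criterion: stay on cells at roughly the seed's depth.
            if abs(grid[r, c] - seed_depth) > tol:
                continue
            mask[r, c] = True
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return mask

def split_features(keypoints, mask, cell=8):
    """Partition (x, y) keypoints into static and dynamic lists using the mask."""
    gh, gw = mask.shape
    static, dynamic = [], []
    for x, y in keypoints:
        r, c = min(int(y) // cell, gh - 1), min(int(x) // cell, gw - 1)
        (dynamic if mask[r, c] else static).append((x, y))
    return static, dynamic
```

In this sketch the static list would feed pose estimation, while the dynamic list is kept rather than discarded so that it can be tracked into the next frame, in line with the reuse described above.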