In SLAM (simultaneous localization and mapping), the topological paradigm provides a more natural and compact solution that scales better with the size of the environment. Computer vision has long been regarded as an ideal sensor technology for topological feature extraction and description, and several methods have been proposed in the literature. However, these methods are time-consuming, require many different sensors, or are very sensitive to perceptual aliasing, all of which limit their scope of application.
This paper presents a fast-to-compute collection of features extracted from monocular images, together with an adaptive matching procedure, inspired by the natural language processing field, for location identification in structured indoor environments. Although only dominant vertical lines, color histograms, and a reduced number of keypoints are employed here, the matching framework allows for the incorporation of almost any other type of feature. The results of experiments carried out in a home and an office environment suggest that the proposed method could be used for real-time topological scene recognition even if the environment changes moderately over time. Because complementary features are combined, high precision can be achieved within reasonable computation time using weaker but faster descriptors.