This paper addresses the need to use a sequence of past environment states as the controller's inputs in a vision-based robot navigation task. In this task, a robot must follow a given trajectory over uneven terrain without falling into pits or losing its balance, with the raw image captured by a camera as its only sensory input. The robot must distinguish large pits from small holes in order to decide whether to avoid an obstacle or pass over it. In non-Markov processes such as this task, decisions must draw on past sensory data to ensure admissible performance. Using images as sensory inputs naturally incurs the curse of dimensionality, and using sequences of past images intensifies it. To cope with this challenge, a new framework called recurrent deep learning (RDL), which combines deep learning (DL) with a recurrent neural network, is proposed. First, suitable features are extracted from the raw image using DL. These learned features, together with some expert-defined features, are then used as the inputs of a fully connected recurrent network (the target network), which generates the robot's control commands. To evaluate the proposed RDL framework, experiments are conducted on a WEBOTS and MATLAB co-simulation platform. The simulation results demonstrate that the proposed framework outperforms a conventional DL-based controller on the navigation task in uneven terrain.
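
The data flow described above (image features, concatenated with expert-defined features, driving a fully connected recurrent network) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-layer tanh encoder stands in for the unspecified DL feature extractor, the Elman-style recurrence is one possible reading of "fully connected recurrent network", and the class name, layer sizes, untrained random weights, and two-dimensional command are all hypothetical.

```python
import numpy as np

class RecurrentController:
    """Illustrative sketch only; all names, sizes, and weights are assumptions."""

    def __init__(self, image_size, n_learned, n_expert, n_hidden, n_commands, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for the DL feature extractor: one linear layer plus tanh.
        self.W_enc = rng.normal(0.0, 0.01, (n_learned, image_size))
        # Fully connected recurrent (Elman-style) layer and linear readout.
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_learned + n_expert))
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_commands, n_hidden))
        self.h = np.zeros(n_hidden)

    def reset(self):
        # Clear the memory of past frames (e.g. at the start of an episode).
        self.h = np.zeros_like(self.h)

    def step(self, image, expert_features):
        # 1) Compress the raw image into a low-dimensional feature vector.
        learned = np.tanh(self.W_enc @ image.ravel())
        # 2) Append the expert-defined features.
        x = np.concatenate([learned, expert_features])
        # 3) Update the hidden state, which summarizes past inputs.
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        # 4) Read out the control command.
        return self.W_out @ self.h

# Usage: the same frame yields different commands depending on history.
ctrl = RecurrentController(image_size=32 * 32, n_learned=16, n_expert=3,
                           n_hidden=8, n_commands=2)
frame = np.ones((32, 32))
expert = np.array([0.1, -0.2, 0.3])
u1 = ctrl.step(frame, expert)
u2 = ctrl.step(frame, expert)
```

Because the hidden state carries information from earlier frames, `u1` and `u2` differ even though the current image is identical. A purely feedforward controller cannot behave this way, which is why the recurrence matters for the non-Markov setting described in the abstract.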