Traditional bulky and complex control devices, such as remote controllers and ground stations, cannot meet the requirements for fast and flexible control of unmanned aerial vehicles (UAVs) in complex environments. Therefore, this paper designs a data glove based on multi-sensor fusion. To enable gesture control of UAVs, the proposed method accurately recognizes various gestures and converts them into corresponding UAV control commands. First, the wireless data glove fuses flexible fiber-optic sensors and inertial sensors to construct a gesture dataset. Then, the trained neural network models are deployed to the STM32 microcontroller-based data glove for real-time gesture recognition, in which a convolutional neural network with an attention mechanism (CNN-Attention) is used for static gesture recognition and a convolutional neural network combined with a bidirectional long short-term memory network (CNN-Bi-LSTM) is used for dynamic gesture recognition. Finally, the recognized gestures are converted into control commands and sent to the vehicle terminal to control the UAV. In simulation tests on a UAV simulation platform, the average recognition accuracy reaches 99.7% for 32 static gestures and 99.9% for 13 dynamic gestures, indicating that the system achieves excellent gesture recognition performance. Task tests in a scene constructed in a real environment show that the UAV responds to gestures quickly, and that the proposed method achieves real-time, stable control of the UAV on the terminal side.
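
To make the two recognition pipelines concrete, the following is a minimal sketch of what a CNN-Attention static-gesture classifier and a CNN-Bi-LSTM dynamic-gesture classifier could look like for windows of fused glove sensor data. The framework (PyTorch), the number of sensor channels, the window length, and all layer sizes are illustrative assumptions and are not specified in the abstract; the authors' actual architectures and their STM32 deployment toolchain may differ.

```python
# Minimal sketch (not the authors' code): plausible CNN-Attention and CNN-Bi-LSTM
# classifiers for fused glove sensor windows. Channel counts, window length, and
# layer sizes are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn


class CNNAttention(nn.Module):
    """Static-gesture classifier: 1-D CNN features reweighted by soft attention."""

    def __init__(self, in_channels=11, num_classes=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.attn = nn.Linear(64, 1)           # scores each time step
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        feats = self.cnn(x).transpose(1, 2)    # (batch, time, 64)
        weights = torch.softmax(self.attn(feats), dim=1)
        context = (weights * feats).sum(dim=1) # attention-weighted pooling
        return self.fc(context)


class CNNBiLSTM(nn.Module):
    """Dynamic-gesture classifier: CNN front end followed by a Bi-LSTM."""

    def __init__(self, in_channels=11, num_classes=13):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, num_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        feats = self.cnn(x).transpose(1, 2)    # (batch, time', 32)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])             # last hidden state -> class logits


if __name__ == "__main__":
    window = torch.randn(8, 11, 50)            # 8 windows, 11 sensor channels, 50 samples
    print(CNNAttention()(window).shape)        # torch.Size([8, 32])
    print(CNNBiLSTM()(window).shape)           # torch.Size([8, 13])
```

In practice, models of this kind would be trained offline and then converted and quantized before running on an STM32-class microcontroller; the abstract does not specify which conversion toolchain is used.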