Emotional (pleasant or unpleasant) and neutral scenes were presented foveally (at fixation) or peripherally (5.2° away from fixation) as primes for 150 ms. The prime was followed by a mask and a centrally presented probe scene for recognition. The probe was either identical in specific content (i.e., same people and objects) to the prime, or it was related to the prime in general content and affective valence. The probe always differed from the prime in color, size, and spatial orientation. Results showed an interaction between prime location and emotional valence for the recognition hit rate, as well as for the false alarm rate and correct rejection times. There were no differences as a function of emotional valence in the foveal display condition. In contrast, in the peripheral display condition, both hit and false alarm rates were higher and correct rejection times were longer for emotional than for neutral scenes. It is concluded that emotional gist, or a coarse affective impression, is extracted from emotional scenes in peripheral vision, which then leads observers to confuse these scenes with others of related affective valence. The underlying neurophysiological mechanisms are discussed. An alternative explanation based on the physical characteristics of the scene images was ruled out.