Listeners can use how people speak (prosody) or what people say (semantics) to infer vocal emotions. It has been speculated that bilinguals and musicians rely more on the former than the latter compared with monolinguals and non-musicians; however, the literature to date offers mixed evidence for this prosodic bias. Bilinguals and musicians are also argued to be better at ignoring distractors, and may therefore outperform monolinguals and non-musicians when prosodic and semantic cues conflict. In two online experiments, 1041 young adults listened to sentences in which semantic and prosodic cues to emotion either matched or mismatched. 526 participants were asked to identify the emotion from the prosody and 515 from the semantics. In both experiments, performance suffered when cues conflicted, and under such conflict, musicians outperformed non-musicians among bilinguals but not among monolinguals. This finding supports an enhanced ability of bilingual musicians to inhibit irrelevant information in speech.