Large language models (LLMs) such as ChatGPT, Gemini, and Claude are increasingly being used in aid of or in place of human judgment and decision making. Indeed, academic researchers increasingly employ LLMs as a research tool. In this paper, we examine whether LLMs, like academic researchers, fall prey to a particularly common human error in interpreting statistical results, namely ‘dichotomania’, which arises from dichotomizing statistical results into the categories ‘statistically significant’ and ‘statistically nonsignificant’. We find that ChatGPT, Gemini, and Claude fall prey to dichotomania at the 0.05 and 0.10 thresholds commonly used to declare ‘statistical significance’. In addition, prompt engineering with principles taken from an American Statistical Association Statement on Statistical Significance and P-values, principles intended as a corrective to such human errors, does not mitigate this problem and arguably exacerbates it. Further, more recent and larger versions of these models do not necessarily perform better. Finally, these models sometimes provide interpretations that are not only incorrect but also highly erratic.