This paper analyzes the higher-order approximation of instrumental variable (IV) estimators in a linear homoskedastic IV regression model when a large set of potentially invalid instruments is present. We establish theoretical results on the higher-order mean-squared error (MSE) approximation of the two-stage least-squares (2SLS), the limited information maximum likelihood (LIML), the Fuller (FULL), and the bias-adjusted 2SLS estimators, as well as jackknife versions of the LIML and FULL estimators, allowing for local violations of the instrument exogeneity conditions. Based on the approximation to the higher-order MSE, we consider instrument selection criteria that can be used to choose among the set of available instruments. We demonstrate the asymptotic optimality of the instrument selection procedure proposed by Donald and Newey (2001, Econometrica 69, 1161–1191) in the presence of locally (faster than $N^{-1/2}$) invalid instruments, in the sense that the dominant term in the MSE with the chosen instruments is asymptotically equivalent to the infeasible optimum. Furthermore, we propose instrument selection procedures that choose among the sets of conservative (known valid) instruments and potentially locally ($N^{-1/2}$) invalid instruments based on the higher-order MSE of the IV estimators, taking the bias-variance trade-off into account.