The robustness of in-line raw milk analysis with near-infrared spectroscopy (NIRS) was tested with respect to the prediction of the raw milk constituents fat, protein and lactose. Near-infrared (NIR) spectra of raw milk (n = 3119) were acquired on three different farms during 354 milkings over a period of six months. Calibration models were calculated for: a random data set of each farm (fully random internal calibration); the first two thirds of the visits per farm (internal calibration); the whole data sets of two of the three farms (external calibration); and combinations of external and internal data sets. Validation was done either on the remaining data set per farm (internal validation) or on data of the remaining farms (external validation). Excellent calibration results were obtained when fully randomised internal calibration sets were used for milk analysis. In this case, RPD values of around ten, five and three were achieved for the prediction of fat, protein and lactose content, respectively. Farm-internal calibrations achieved much poorer prediction results, especially for protein and lactose, with RPD values of around two and one, respectively. Prediction accuracy improved with external calibration, where validation was done on spectra of an external farm, mainly due to the higher sample variation in external calibration sets in terms of feeding diets and individual cow effects. Further improvements were achieved when additional farm information was added to the calibration set. One of the main requirements for a robust calibration model is the ability to predict milk constituents in unknown future milk samples. The robustness and quality of prediction increase with increasing variation of, e.g., feeding and cow-individual milk composition in the calibration model.
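The RPD values quoted above can be made concrete with a small calculation. The abstract does not state which definition of RPD was used; the sketch below assumes the common one, the standard deviation of the reference values divided by the bias-corrected standard error of prediction (SEP). The function name `rpd` and the fat values are hypothetical and serve only as an illustration, not as data from the study.

```python
import statistics


def rpd(reference, predicted):
    """RPD under the assumed definition: SD of reference values / SEP."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if len(reference) < 3:
        raise ValueError("need at least three samples")
    # Prediction residuals (NIR prediction minus reference value).
    residuals = [p - r for r, p in zip(reference, predicted)]
    # The sample standard deviation of the residuals is the
    # bias-corrected standard error of prediction (SEP).
    sep = statistics.stdev(residuals)
    if sep == 0:
        raise ValueError("SEP is zero; RPD is undefined")
    return statistics.stdev(reference) / sep


# Hypothetical fat contents in percent (illustrative values only).
reference = [3.2, 4.1, 4.8, 3.9, 5.0]
predicted = [3.3, 4.0, 4.9, 3.8, 5.1]
print(f"RPD = {rpd(reference, predicted):.2f}")
```

Under this definition, an RPD of around one means that the prediction error is about as large as the natural spread of the reference values, so the model adds little beyond predicting the mean. An RPD of around ten means the error is roughly a tenth of that spread.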