Maritime engineering relies on model forecasts for many different processes, including meteorological and oceanographic forcings, structural responses, and energy demands. Understanding how such forecasting models perform, and how that performance is evaluated, is crucial to ensuring the reliability of maritime operations. Evaluation metrics that assess the point accuracy of a forecast (such as root-mean-squared error) are commonplace, but with the increased uptake of probabilistic forecasting methods, such metrics may not account for the full forecast distribution. The statistical theory of proper scoring rules provides a framework in which to score and compare competing probabilistic forecasts, but it is seldom used in applications. This translational paper presents the underlying theory and principles of proper scoring rules, develops a simple panel of rules that may be used to robustly evaluate the performance of competing probabilistic forecasts, and demonstrates this panel with an application to forecasting surface winds at an asset on Australia’s North–West Shelf. Where appropriate, we relate the statistical theory to common requirements of the maritime engineering industry. The case study is drawn from a body of work undertaken to quantify the value of an operational forecasting product, and it clearly demonstrates the downstream impact that statistical and data science methods can have on maritime engineering operations.