Published online by Cambridge University Press: 23 August 2016
Occupancy statistics in ecology and paleontology are biased upward because we generally lack solid data on species that exist but are not found. The magnitude of this bias increases as the average occupancy probability decreases and as the number of sites sampled decreases. A maximum-likelihood method is developed to estimate the underlying distribution of occupancy probabilities of all species based only on the sample of observed species with nonzero occupancy. The method is based on determining the probability that the number of occupied sites will take on any specific value for a given occupancy probability, integrated over the entire distribution of occupancy probabilities. If the shape of the underlying distribution is well modeled, the resulting occupancy estimates circumvent both the bias inherent in failing to observe some species and the dependence of this bias on the number of sites. For occupancy data on marine animal genera drawn from the Paleobiology Database, the underlying distribution is reasonably approximated as a right-truncated log-normal, but the methods developed can be extended to any distribution. Examples are presented to illustrate some observations that are robust and others that need to be revised in light of this bias correction. The method is compared to a recently developed, distribution-free approach to the same problem.
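The likelihood described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that each species occupies each of n sites independently with probability p (so the number of occupied sites is binomial), that ln p follows a normal distribution truncated on the right at p = 1, and that the integral over p can be approximated on a simple midpoint grid. The function names, the grid settings, and the example counts are all hypothetical.

```python
import math

def truncated_lognormal_weights(mu, sigma, n_grid=2000, lower=-40.0):
    """Midpoint grid over x = ln(p) on (lower, 0), i.e. a log-normal
    truncated on the right at p = 1. Returns (p values, normalized weights).
    Assumption: mass below `lower` is negligible for the chosen mu, sigma."""
    step = (0.0 - lower) / n_grid
    xs = [lower + (i + 0.5) * step for i in range(n_grid)]
    dens = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    total = sum(dens)  # normalizing on the grid enforces the truncation
    return [math.exp(x) for x in xs], [d / total for d in dens]

def prob_k_sites(k, n_sites, ps, ws):
    """P(K = k): binomial probability averaged over the occupancy distribution."""
    c = math.comb(n_sites, k)
    return sum(w * c * p ** k * (1.0 - p) ** (n_sites - k)
               for p, w in zip(ps, ws))

def log_likelihood(counts, n_sites, mu, sigma):
    """Log-likelihood of observed site counts (all >= 1), conditioned on
    K >= 1 because species with zero occupied sites are never observed."""
    ps, ws = truncated_lognormal_weights(mu, sigma)
    log_p_seen = math.log(1.0 - prob_k_sites(0, n_sites, ps, ws))
    return sum(math.log(prob_k_sites(k, n_sites, ps, ws)) - log_p_seen
               for k in counts)

# Hypothetical data: occupied-site counts for eight observed taxa, ten sites.
counts = [1, 1, 2, 1, 3, 5, 1, 2]
n_sites = 10

# Coarse grid search standing in for a proper numerical optimizer.
mus = [-4.0, -3.0, -2.0, -1.0, 0.0]
sigmas = [0.5, 1.0, 1.5, 2.0]
best_ll, best_mu, best_sigma = max(
    (log_likelihood(counts, n_sites, m, s), m, s) for m in mus for s in sigmas
)
print(best_mu, best_sigma, best_ll)
```

Dividing by the probability of at least one occupied site is what lets the fit use only observed species while still describing the full distribution, unobserved species included. A real analysis would replace the grid search with a numerical optimizer and check that the integration grid is fine enough for the fitted parameters.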