Search

Variable ranking and selection with random forest for unbalanced data
Ute Bradter, John D. Altringham, William E. Kunin, Tim J. Thom, Jerome O’Connell, Tim G. Benton
Journal:

Environmental Data Science / Volume 1 / 2022

Published online by Cambridge University Press:

20 December 2022, e30
- Article
- - You have access
  - Open access
- PDF
- HTML
- Export citation
When one or several classes are much less prevalent than another class (unbalanced data), class error rates and variable importances of the machine learning algorithm random forest can be biased, particularly when sample sizes are smaller, imbalance levels higher, and effect sizes of important variables smaller. Using simulated data varying in size, imbalance level, number of true variables, their effect sizes, and the strength of multicollinearity between covariates, we evaluated how eight versions of random forest ranked and selected true variables out of a large number of covariates despite class imbalance. The version that calculated variable importance based on the area under the curve (AUC) was least adversely affected by class imbalance. For the same number of true variables, effect sizes, and multicollinearity between covariates, the AUC variable importance ranked true variables still highly at the lower sample sizes and higher imbalance levels at which the other seven versions no longer achieved high ranks for true variables. Conversely, using the Hellinger distance to split trees or downsampling the majority class already ranked true variables lower and more variably at the larger sample sizes and lower imbalance levels at which the other algorithms still ranked true variables highly. In variable selection, a higher proportion of true variables were identified when covariates were ranked by AUC importances and the proportion increased further when the AUC was used as the criterion in forward variable selection. In three case studies, known species–habitat relationships and their spatial scales were identified despite unbalanced data.

Multi-scale gas transport modelling for the EC FORGE project
A. E. Bond, K. E. Thatcher, S. Norris
Journal:

Mineralogical Magazine / Volume 79 / Issue 6 / November 2015

Published online by Cambridge University Press:

02 January 2018, pp. 1251-1263
- Article
- - You have access
  - Open access
- PDF
- Export citation
The generation and migration of gas within and around proposed radioactive waste disposal facilities is potentially a safety critical process. A safety case for a facility that generates significant quantities of gas (e.g. through metal corrosion or radiolysis) will require demonstration that gas migration around and away from the waste is sufficiently understood and will not breach the safety case for the facility. Models can be used to understand the likely hydraulic evolution of such a disposal facility, but the models need to consider processes over a range of scales. A whole repository may extend over kilometres, with individual disposal cells at the scale of tens of metres and features which provide pathways for gas migration on a centimetre scale. All of these features may be significant from a safety perspective and capturing the impact of all of these features in a single model is a significant challenge.
This paper presents an approach to tackling this multi-scale problem, which allows the whole repository to be modelled in a computationally efficient manner. The approach involves identifying areas within the modelled domain that show very similar behaviour, and representing these areas with sub-models, so that small-scale features are retained, but computational overhead is decreased by using the results in more than one location in the model domain. The approach allowed a model of a whole repository to be run on a single processor core, whilst maintaining the small-scale features of the system. The model results were compared against more conventional upscaling techniques and show the advantage of a more detailed representation of small-scale features. The model results reflect the conceptual understanding of how gas would migrate in a repository.

Multi-scale community organization of the human structural connectome and its relationship with resting-state functional connectivity
RICHARD F. BETZEL, ALESSANDRA GRIFFA, ANDREA AVENA-KOENIGSBERGER, JOAQUÍN GOÑI, JEAN-PHILIPPE THIRAN, PATRIC HAGMANN, OLAF SPORNS
Journal:

Network Science / Volume 1 / Issue 3 / December 2013

Published online by Cambridge University Press:

03 January 2014, pp. 353-373
- Article
- - You have access
  - Open access
- PDF
- Export citation
The human connectome has been widely studied over the past decade. A principal finding is that it can be decomposed into communities of densely interconnected brain regions. Past studies have often used single-scale modularity measures in order to infer the connectome's community structure, possibly overlooking interesting structure at other organizational scales. In this report, we used the partition stability framework, which defines communities in terms of a Markov process (random walk), to infer the connectome's multi-scale community structure. Comparing the community structure to observed resting-state functional connectivity revealed communities across a broad range of scales that were closely related to functional connectivity. This result suggests a mapping between communities in structural networks, models of influence-spreading and diffusion, and brain function. It further suggests that the spread of influence among brain regions may not be limited to a single characteristic scale.

Multiphysic Two-Phase Flow Lattice Boltzmann: Droplets with Realistic Representation of the Interface
Pablo M. Dupuy, María Fernandino, Hugo A. Jakobsen, Hallvard F. Svendsen
Journal:

Communications in Computational Physics / Volume 9 / Issue 5 / May 2011

Published online by Cambridge University Press:

20 August 2015, pp. 1414-1430

Print publication:

May 2011
- Article
- - Get access
    
    Check if you have access via personal or institutional login
    
    Log in Register
- Export citation
Free energy lattice Boltzmann methods are well suited for the simulation of two phase flow problems. The model for the interface is based on well understood physical grounds. In most cases a numerical interface is used instead of the physical one because of lattice resolution limitations. In this paper we present a framework where we can both follow the droplet behavior in a coarse scale and solve the interface in a fine scale simultaneously. We apply the method for the simulation of a droplet using an interface to diameter size ratio of 1 to 280. In a second simulation, a small droplet coalesces with a 42 times larger droplet producing on it only a small capillary wave that propagates and dissipates.

Diagnosing the causes of territory abandonment by the Endangered Egyptian vulture Neophron percnopterus: the importance of traditional pastoralism and regional conservation
Patricia Mateo-Tomás, Pedro P. Olea
Journal:

Oryx / Volume 44 / Issue 3 / July 2010

Published online by Cambridge University Press:

05 May 2010, pp. 424-433

Print publication:

July 2010
- Article
- - You have access
- PDF
- HTML
- Export citation
Identifying threats to declining species and prescribing ways of preventing their extinction are basic challenges for biodiversity conservation. We analysed the causes underlying the loss of territories of the Endangered Egyptian vulture Neophron percnopterus in a key population at the north-western edge of its distribution in Europe by developing multi-scale models that combined factors from nest site to landscape. We used generalized linear models and an information-theoretic approach to identify the optimal combination of scales and resolutions that could explain territorial abandonment. Those models combining nest-site and landscape scales considerably improved prediction ability compared to those considering only one scale. The best combined model had a high predictive ability (96.9% of correctly classified cases). Small cliffs at high altitudes in rugged areas with declining livestock (especially of sheep and goats) increased the likelihood of territory abandonment. Our findings highlight the importance of developing region-specific multi-scale models to determine reliably the factors driving territory loss and of designing effective conservation strategies accordingly. Conservation measures for the studied population should be developed at two spatial scales. At the smaller scale it is necessary to closely control nest sites to avoid direct disturbances. At a larger scale it is essential to implement policies that can support traditional pastoralism.

Search Results

Refine search

Refine search

Actions for selected content:

5 results

Variable ranking and selection with random forest for unbalanced data

Multi-scale gas transport modelling for the EC FORGE project

Multi-scale community organization of the human structural connectome and its relationship with resting-state functional connectivity

Multiphysic Two-Phase Flow Lattice Boltzmann: Droplets with Realistic Representation of the Interface

Diagnosing the causes of territory abandonment by the Endangered Egyptian vulture Neophron percnopterus: the importance of traditional pastoralism and regional conservation

Search Results

Refine search

Refine search

Actions for selected content:

Save Search

5 results

Variable ranking and selection with random forest for unbalanced data

Multi-scale gas transport modelling for the EC FORGE project

Multi-scale community organization of the human structural connectome and its relationship with resting-state functional connectivity

Multiphysic Two-Phase Flow Lattice Boltzmann: Droplets with Realistic Representation of the Interface

Diagnosing the causes of territory abandonment by the Endangered Egyptian vulture Neophron percnopterus: the importance of traditional pastoralism and regional conservation