1. Introduction
Radio sky surveys offer a view of the high-energy sky, probing synchrotron, cyclotron, and thermal processes across a range of distances, from planets and exoplanets to high-redshift radio galaxies. At lower frequencies, the fields of view of radio telescopes are larger, enabling large-scale surveys of the radio sky, such as the National Radio Astronomy Observatory (NRAO) Very Large Array (VLA) Sky Survey (NVSS; Condon et al. 1998) at 1.4 GHz, the Sydney University Molonglo Sky Survey (SUMSS; Bock, Large, & Sadler 1999; Mauch et al. 2003) at 843 MHz, and the VLA Low-frequency Sky Survey Redux at 74 MHz (VLSSr; Lane et al. 2014). Spurred by the development of the Square Kilometre Array (SKA), new radio telescopes are exploring the radio sky across wider areas and frequency ranges than were previously accessible (Figure 1).
The Murchison Widefield Array (MWA; Tingay et al. 2013), operational since 2013, is a precursor to the low-frequency component of the SKA, which will be the world’s most powerful radio telescope. The GaLactic and Extragalactic All-sky MWA (GLEAM; Wayth et al. 2015) survey observed the whole sky south of declination (Dec) $+30^\circ$ from 2013 to 2015 between 72 and 231 MHz. GLEAM has been processed in a multitude of ways: continuum data releases cover most of the extragalactic sky (GLEAM ExGal; Hurley-Walker et al. 2017), the Magellanic Clouds (For et al. 2018), the Galactic Plane (GLEAM GP; Hurley-Walker et al. 2019b), and a deep region over the South Galactic Pole (GLEAM SGP; Franzen et al. 2021a); polarisation products include all-sky circular (Lenc et al. 2018) and linear polarisation surveys (Polarised GLEAM Survey, POGS; Riseley et al. 2018, 2020). Cross-identifications have been provided in the mid-infrared for the 1863 brightest radio sources (the G4Jy Sample; White et al. 2020a, 2020b), and for 1590 galaxies in the 6dF Galaxy Survey (Franzen et al. 2021b).
While GLEAM had lower sensitivity and resolution than other surveys of the time (e.g. the First Alternative Data Release of the Tata Institute for Fundamental Research Giant Metrewave Radio Telescope Sky Survey, TGSS-ADR1; Intema et al. 2017), its major advance was in leveraging its low frequency and very large fractional bandwidth. Extremely steep spectral indices ($\alpha<-2$, for $S\propto\nu^\alpha$) indicate old emission, such as that found in the remnant stage of radio galaxy life cycles (Hurley-Walker et al. 2015; Duchesne & Johnston-Hollitt 2019) or ‘fossil’ emission in galaxy clusters (Giacintucci et al. 2020); rising spectral indices point toward thermal emission such as that found in planetary nebulae (Hurley-Walker et al. 2019b). In this frequency range, absorption effects become important for many sources, allowing measurements to probe synchrotron self-absorption and free-free absorption in extragalactic radio sources (Callingham et al. 2017) and in Galactic H II regions (Su et al. 2017). Additionally, GLEAM’s very high sensitivity to large angular scales, which are often resolved out by interferometric surveys, enabled exploration of diffuse emission such as Galactic supernova remnants (e.g. Hurley-Walker et al. 2019c) and emission in clusters of galaxies (e.g. Zheng et al. 2018).
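Because the science cases above hinge on the sign and steepness of $\alpha$, it may help to see how a two-point spectral index follows from the convention $S\propto\nu^\alpha$. The sketch below is illustrative only: the function names, the category labels, and the example source are hypothetical, and the thresholds simply mirror the values quoted in the text ($\alpha<-2$ for extremely steep, $\alpha>0$ for rising).

```python
import math

def spectral_index(s1_jy, nu1_hz, s2_jy, nu2_hz):
    """Two-point spectral index alpha, using the convention S ∝ nu**alpha."""
    return math.log(s2_jy / s1_jy) / math.log(nu2_hz / nu1_hz)

def describe(alpha):
    """Hypothetical labels following the thresholds quoted in the text."""
    if alpha < -2.0:
        return "extremely steep"   # e.g. remnant or fossil emission
    if alpha > 0.0:
        return "rising"            # e.g. thermal or absorbed emission
    return "typical"

# Hypothetical source: 10 Jy at 76 MHz, falling with alpha = -2.5 to 227 MHz.
s_227 = 10.0 * (227e6 / 76e6) ** -2.5
alpha = spectral_index(10.0, 76e6, s_227, 227e6)
print(round(alpha, 3), describe(alpha))
```

A real spectral fit over GLEAM or GLEAM-X sub-bands would use many channels and uncertainties rather than two points; the two-point form is used here only to make the sign convention concrete.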
In 2017 the MWA underwent an upgrade to ‘Phase II’, in which an additional 128 tiles were added to the observatory (Wayth et al. 2018). This enabled observing with two different 128-tile configurations: ‘compact’, comprising many redundant baselines to improve calibration for the statistical detection of the Epoch of Reionisation (Joseph, Trott, & Wayth 2018), and ‘extended’, an array optimised for imaging (within the constraints of the observatory) with maximum baselines of 5.5 km, approximately doubling the resolution of the telescope. The extended layout considerably reduces the sidelobes of the synthesised beam, allowing a more ‘natural’ weighting of the visibility data, which improves the sensitivity of the instrument; sidelobe confusion is also reduced. The smaller main lobe of the synthesised beam reduces the classical confusion limit from ${\sim}2$ to ${\sim}0.3\,\mathrm{mJy}$ at 200 MHz (Franzen et al. 2019). These improvements make it feasible to integrate for longer and thereby reach lower noise levels without quickly approaching a confusion floor.
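The ‘approximately doubling’ claim can be checked with the rough diffraction relation $\theta\approx\lambda/B_\mathrm{max}$. The sketch below assumes a Phase I maximum baseline of 2.5 km (the 128T value quoted in Section 3.1) and the 5.5 km extended-array value above; it ignores the image weighting, which in practice sets the actual synthesised beam (Section 3.2), so treat the numbers as indicative only.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def resolution_arcsec(freq_hz, max_baseline_m):
    """Rough diffraction-limited resolution, theta ≈ lambda / B_max, in arcsec."""
    wavelength_m = C / freq_hz
    return math.degrees(wavelength_m / max_baseline_m) * 3600.0

phase1 = resolution_arcsec(200e6, 2.5e3)  # Phase I 128T longest baseline (~2.5 km)
phase2 = resolution_arcsec(200e6, 5.5e3)  # Phase II extended longest baseline
print(f"{phase1:.0f} arcsec -> {phase2:.0f} arcsec (x{phase1 / phase2:.1f})")
```

At 200 MHz this gives roughly 2′ for Phase I (consistent with the GLEAM resolution) and roughly 1′ for the extended array, a ratio of 2.2 set purely by the baseline lengths.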
While GLEAM enabled a huge range of science outcomes, improved foreground modelling for Epoch of Reionisation searches, and flux density scale calibration of the low-frequency southern sky, it is fundamentally limited by its low (${\sim}2'$) resolution and the sensitivity limits of the original configuration of the MWA. We therefore undertook a wide-area survey with the Phase II extended array to create GLEAM-X, a deeper, higher-resolution successor to GLEAM with the same sky and frequency coverage, observed over 2018–2020. During that time, the Long Baseline Epoch of Reionisation Survey (LoBES; Lynch et al. 2021) demonstrated the survey capability of Phase II by measuring the spectral behaviour of 80824 sources over 100–230 MHz in $3069\,\mathrm{deg}^2$, down to a noise limit of ${\sim}2\,\mathrm{mJy\,beam}^{-1}$, showing the utility of wide-area surveys with the extended array. New southern-sky radio surveys across 800–1400 MHz using the Australian SKA Pathfinder (ASKAP; Hotan et al. 2021), such as the Rapid ASKAP Continuum Survey (RACS; McConnell et al. 2020; Hale et al. 2021), have also been undertaken, offering improved morphological information for millions of radio sources.
Figure 1 shows that the sensitivity of GLEAM-X to ordinary radio galaxies ($-0.8 \lesssim \alpha \lesssim -0.5$) is competitive with other ongoing wide-area surveys such as RACS and the Very Large Array Sky Survey at 3 GHz (VLASS; Lacy et al. 2020). Its sensitivity to steep-spectrum sources ($\alpha=-2.5$) also matches that of the upcoming Evolutionary Map of the Universe (EMU; Norris et al. 2011, 2021), which will approach the confusion brightness limit at its frequency. Covering the northern sky at 6–60′′ resolution, the LOw-Frequency ARray (LOFAR; van Haarlem et al. 2013) is conducting several surveys: the LOFAR Two-metre Sky Survey (LoTSS; Shimwell et al. 2017), the LOFAR Low-Band Array Sky Survey (LoLSS; de Gasperin et al. 2021), and the LOFAR Decametre Sky Survey (LoDeSS; van Weeren et al., in preparation).
To reach noise levels that significantly improve on GLEAM while still covering a wide area, we must accumulate a large (${\sim}2\,\mathrm{PB}$) volume of visibility data. Releasing processed data products in stages will be of more use to the community than waiting for a single future data release, so this paper is the first in a series of data releases. We release here a pilot survey area that indicates the qualities that can eventually be expected over the full survey, covering $1447\,\mathrm{deg}^2$ over $4\,\mathrm{h}\leq \mathrm{RA}\leq 13\,\mathrm{h}$, $-32.7^\circ \leq \mathrm{Dec} \leq -20.7^\circ$. Polarisation processing and an associated early data release will be described in a companion paper (Zhang et al., in preparation). Herein we describe the GLEAM-X observations (Section 2), the processing pipeline used to produce images and mosaics (Section 3), and source-finding to generate catalogues (Section 4), and we motivate several extensions to the pipeline (Section 5). Section 6 concludes with an outlook on the scientific advances enabled by the survey and plans for further data releases.
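As a sanity check on the quoted pilot area, the solid angle of a region bounded by lines of constant RA and Dec is $\Delta\alpha\,(\sin\delta_\mathrm{max}-\sin\delta_\mathrm{min})$ steradians. The sketch below assumes the pilot region is exactly such an RA–Dec rectangle; the function name is hypothetical.

```python
import math

def rect_area_deg2(ra_min_h, ra_max_h, dec_min_deg, dec_max_deg):
    """Solid angle of an RA-Dec 'rectangle' on the sphere, in square degrees."""
    d_ra_rad = math.radians((ra_max_h - ra_min_h) * 15.0)  # 1 h of RA = 15 deg
    d_sin = math.sin(math.radians(dec_max_deg)) - math.sin(math.radians(dec_min_deg))
    return d_ra_rad * d_sin * (180.0 / math.pi) ** 2

area = rect_area_deg2(4, 13, -32.7, -20.7)
print(f"{area:.0f} deg^2")
```

This gives about $1445\,\mathrm{deg}^2$; shifting either Dec boundary by the $0.05^\circ$ implied by the quoted rounding changes the area by several square degrees, so the result is consistent with the $1447\,\mathrm{deg}^2$ stated above.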
All positions given in this paper are in J2000 equatorial coordinates.
2. Observations
GLEAM used a drift scan survey strategy to quickly and efficiently observe the entire sky south of Dec $+30^\circ$ using the Phase I ‘128T’ configuration of the MWA (Wayth et al. 2015). In the first year (2013 August–2014 June), observations were made along the meridian ($\mathrm{HA}=0\,\mathrm{h}$), using seven pointings at Declinations centred on $-72^\circ$ to $+18.6^\circ$. In the second year, further observations were taken at $\mathrm{HA}=\pm1\,\mathrm{h}$. By combining the GLEAM data in the image plane over the full range of HA for a region around the South Galactic Pole, Franzen et al. (2021a) reached a noise level of $5\,\mathrm{mJy}\,\mathrm{beam}^{-1}$ at 215 MHz, about half that of the extragalactic data release of Hurley-Walker et al. (2017), showing that such a strategy was effective.
GLEAM-X therefore adopted a similar strategy, iterating through the same Declinations and HAs as GLEAM, but doubling the number of $\mathrm{HA}=0\,\mathrm{h}$ observations and using the extended configuration of the Phase II MWA. Observations were performed in month-long blocks so that similar ranges in RA were observed across the different Declinations and HAs; this makes it easier to combine many drift scans into large mosaics in simple sky projections, and improves the uniformity of sensitivity across the sky.
To cover 72–231 MHz using the 30.72-MHz instantaneous bandwidth of the MWA, we cycled sequentially through five frequency ranges (72–103, 103–134, 139–170, 170–200, and 200–231 MHz), changing every two minutes. Gain calibrators were visited hourly to provide a back-up in case in-field calibration was unsuccessful (Section 3.1).
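The observing cadence can be pictured as a simple round-robin over the five bands. This sketch is only an illustration of that cycling: the names are hypothetical, and it deliberately leaves out the hourly calibrator visits and any scheduling overheads, which the real observing schedule would interleave.

```python
from itertools import cycle, islice

# Frequency ranges quoted in the text (MHz); 134-139 MHz is not covered.
BANDS_MHZ = [(72, 103), (103, 134), (139, 170), (170, 200), (200, 231)]
DWELL_S = 120  # each observation lasts two minutes

def band_schedule(n_obs, start_s=0):
    """Yield (start time in s, band) for n_obs consecutive observations.

    Hypothetical helper: calibrator visits are not modelled.
    """
    for i, band in enumerate(islice(cycle(BANDS_MHZ), n_obs)):
        yield start_s + i * DWELL_S, band

for t, (lo, hi) in band_schedule(6):
    print(t, lo, hi)
```

With a two-minute dwell, each band is revisited every ten minutes, so a given band samples the drifting sky at a regular cadence over the night.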
After the first observing run in the 2018-A observing semester, the data were triaged to search for poor ionospheric conditions that would hinder high-quality imaging. We determined calibration solutions for the gain calibrator observations at a 30-s cadence, and examined the temporal variability between the first and last time-steps of each observation. Seventeen nights were identified as having unacceptably variable gains, with an average phase change of more than $12^\circ$ between the first and last time-steps of at least one calibrator, a level at which the imaging quality became very poor. These nights were re-observed in the 2019-A semester. In 2020, the COVID-19 pandemic reduced the observing time available in the extended configuration in the 2020-A and 2020-B semesters, so at the time of writing no further observations have been possible to replace other ionospherically disturbed nights, although further observations have been proposed for 2022. Table A.1 summarises the observations taken over 2018–2020, including those nights that were re-observed.
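The night-level triage criterion can be expressed compactly. The sketch below assumes the calibrator gains are available as a complex NumPy array shaped (time-steps, tiles, channels), with flagged solutions set to NaN, and that ‘average phase change’ means the mean absolute wrapped phase difference over all tiles and channels; both the array layout and that averaging choice are assumptions, and the function names are hypothetical.

```python
import numpy as np

def mean_phase_drift_deg(gains):
    """Mean |phase change| between the first and last solution time-steps.

    gains: complex array (n_times, n_tiles, n_chans) for one calibrator
    observation, e.g. four 30-s solutions across a 2-min observation.
    Flagged solutions should be NaN so that nanmean ignores them.
    """
    dphi = np.angle(gains[-1] * np.conj(gains[0]))  # wrapped to (-pi, pi]
    return float(np.degrees(np.nanmean(np.abs(dphi))))

def night_is_disturbed(calibrator_gains, limit_deg=12.0):
    """True if any calibrator observation drifts by more than limit_deg on average."""
    return any(mean_phase_drift_deg(g) > limit_deg for g in calibrator_gains)
```

Multiplying by the complex conjugate before taking the angle keeps the phase difference wrapped, so a drift across the $\pm180^\circ$ boundary is not mistaken for a near-$360^\circ$ jump.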
3. Continuum pipeline
The GLEAM-X pipeline is available on GitHub in a containerised version that can be run on any platform with Singularity installed (Kurtzer, Sochat, & Bauer 2017).
Some common software packages are used throughout the data reduction. Unless otherwise specified:
to convert radio interferometric visibilities into images, we use the widefield imager WSClean (Offringa et al. 2014) version 2.9, which correctly handles the non-trivial w-terms of MWA snapshot images; versions 2 onward include useful features such as automatically thresholded cleaning and multi-scale clean (Offringa & Smirnov 2017);
the primary beam is as defined by Sokolowski et al. (2017); however, for speed, all primary beams are precalculated and then interpolated as required, using code which is available on GitHub and archived on Zenodo (Morgan & Galvin 2021);
to mosaic the resulting images together, we use the mosaicking software swarp (Bertin et al. 2002); to minimise flux density loss from resampling, images are oversampled by a factor of four when being regridded, before being downsampled back to their original resolution;
to perform source-finding, we use Aegean v2.2.5 (Hancock et al. 2012; Hancock, Trott, & Hurley-Walker 2018) and its companion tools such as the Background and Noise Estimator (BANE); this package has been optimised for the wide-field images of the MWA, and includes the ‘priorised’ fitting technique, which is necessary to obtain flux density measurements for sources over a wide bandwidth. Fitting errors calculated by Aegean take into account the correlated image noise and are derived from the fit covariance matrix, which quantifies the quality of the fit; if the fit is poor and the residuals are large, the fitting errors on position, shape, flux density, etc. all increase appropriately, yielding useful error estimates for further use.
We now discuss the typical steps undertaken by the pipeline to produce a set of continuum images and catalogues.
3.1. Calibration
Calibration is performed separately on each observation in a direction-independent manner. The sky model is mainly derived from GLEAM, with additional measurements from the literature for the brighter and more complex sources (e.g. Virgo A in this release); it is described in a companion paper (Hurley-Walker et al., in preparation). MitchCal (Offringa et al. 2016) is used to generate a calibration solution for each observation, using the full two-minute time range. These calibration solutions consist of a complex gain for all four polarisation products (i.e. a Jones matrix) per tile, per 40-kHz spectral channel. Since the sky model is limited by the resolution of GLEAM, we exclude baselines longer than the maximum baseline of the 128T configuration, i.e. 2.5 km ($1667\lambda$ at 200 MHz); to avoid contamination from diffuse Galactic emission, we also exclude baselines shorter than 112 m ($75\lambda$ at 200 MHz). Calibration solutions are inspected for each night, and tiles or receivers are flagged if they show instrumental issues (e.g. phases that appear random with respect to frequency). This typically affects between 1 and 8 of the 128 available tiles per night. We also examine whether the solutions are stable within an observation: rapidly changing gains indicate that ionospheric conditions will dramatically reduce imaging quality (as in Section 2). Observations in this category are triaged and do not proceed to imaging (Section 3.7). Similarly, the stability of the gains over the night is inspected; in good conditions, the phases of the solutions change only slowly, by of order $10^\circ$ over timescales of hours. If more than 20% of the solutions for a given observation are flagged, we transfer solutions from the well-calibrated observation at the same frequency that is closest in time.
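The baseline cuts and the solution-transfer rule translate into a few lines of array logic. This is a hedged sketch rather than the pipeline's code: it assumes the cuts are applied to the projected (u,v) length in metres, that solution flags are a boolean array, and all function names are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def metres_to_wavelengths(length_m, freq_hz):
    """Convert a baseline length in metres to wavelengths at freq_hz."""
    return length_m * freq_hz / C

def calibration_baseline_mask(uvw_m, min_m=112.0, max_m=2500.0):
    """True for baselines kept for calibration: 112 m <= |uv| <= 2.5 km."""
    uv_len = np.hypot(uvw_m[:, 0], uvw_m[:, 1])  # projected uv length
    return (uv_len >= min_m) & (uv_len <= max_m)

def needs_solution_transfer(solution_flags, max_flagged_frac=0.2):
    """True if more than 20% of an observation's solutions are flagged."""
    return float(np.mean(solution_flags)) > max_flagged_frac

print(metres_to_wavelengths(2500, 200e6), metres_to_wavelengths(112, 200e6))
```

At 200 MHz the two limits come out at about 1668$\lambda$ and 74.7$\lambda$, matching the rounded values in the text.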
3.1.1. Removing contamination from sidelobe sources
The very brightest radio sources in the sky, the so-called ‘A-team’ sources (Table 2 in Hurley-Walker et al. 2017), can cause significant image artefacts if they lie just outside the field of view or in a sidelobe of the primary beam. Additionally, if they lie inside the field of view, the standard deconvolution process (Section 3.2) is not always optimal. To remove these sources from the affected observations, we use a (u,v)-plane subtraction method. The visibilities are phase-rotated to the location of the source, and a $20'\times20'$ image of the region is formed using the following WSClean settings:
imaging the XX and YY instrumental polarisation products;
each polarisation product is imaged across 64 channels, each 480 kHz wide, which are jointly cleaned using the -join-channels option; this also produces a 30.72-MHz-wide multi-frequency synthesis (MFS) image for each polarisation;
a fourth-order polynomial, set via the -fit-spectral-pol argument, to constrain the spectral behaviour of each clean component;
automatic thresholding down to $3\sigma$, where $\sigma$ is measured as the root mean square (RMS) of the residual XX and YY MFS images at the end of each major cycle;
a major clean cycle gain of 0.85, i.e. 85% of the flux density of the clean components is removed at the end of each major cycle;
‘Briggs’ (Briggs 1995) weighting with a robust parameter of $-1$;
10 or fewer major clean cycles, in which the clean-component model is Fourier transformed back to visibilities and subtracted from the data;
up to $10^5$ minor cleaning cycles, in which the subtraction takes place in the image plane.
During this process the ‘MODEL’ column of the measurement set is updated with the source components; once imaging has completed, this column is subtracted from the calibrated visibilities. The observation is then phase-rotated back to its original phase centre. In this way, the chromatic effect of the primary beam sidelobe is taken into account when removing the source, without distorting the overall gains of the observation.
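The essence of the (u,v)-plane subtraction is that a model of the bright source is predicted into visibilities and removed from the data. The sketch below simplifies this heavily: it assumes the source is a single point of known flux density at direction cosines $(l, m)$ relative to the phase centre, uses the common convention $V = S\,e^{-2\pi i(ul+vm+w(n-1))}$, and skips the phase rotation by evaluating the model directly at the offset position. The pipeline instead deconvolves a $20'\times20'$ image with spectrally smooth clean components and the primary beam. The function names are hypothetical.

```python
import numpy as np

def point_source_vis(uvw_lambda, flux_jy, l, m):
    """Visibilities of a point source at direction cosines (l, m).

    uvw_lambda: (n_vis, 3) baseline coordinates in wavelengths.
    Convention assumed here: V = S exp(-2*pi*i*(u*l + v*m + w*(n - 1))).
    """
    n = np.sqrt(1.0 - l**2 - m**2)
    u, v, w = uvw_lambda.T
    return flux_jy * np.exp(-2j * np.pi * (u * l + v * m + w * (n - 1.0)))

def subtract_source(data, uvw_lambda, flux_jy, l, m):
    """Return data with the modelled source removed (a 'MODEL'-column style subtraction)."""
    return data - point_source_vis(uvw_lambda, flux_jy, l, m)
```

If the model is accurate, everything else in the field is untouched by the subtraction, which is why this approach does not disturb the gains of the observation as a whole.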
3.1.2. Polarisation calibration
We also introduce two extra steps in the calibration stage to prepare the measurement sets for polarisation analysis. The first is the parallactic angle correction within the primary beam model (Hales 2017), transforming the data from the observed frame (linear feeds on the ground) to an astronomical reference frame according to the IAU standard (polarisation angle measured from North through East). This step is necessary for linear polarisation analysis when observations cover a large range of hour angles. Second, to facilitate later polarisation analysis, we set the cross-terms of the calibration Jones matrices to zero, and divide the Jones matrices for all tiles by a phasor representing the phases of a reference antenna, which is used for all survey processing. At the same time, we add an X-Y phase determined from observations of a strong polarised source with a known polarisation angle (Lenc et al. 2017). The X-Y phase correction reduces the leakage between linear and circular polarisation, making circularly polarised data available. A detailed description of polarisation calibration, imaging, and a first data release will be given in a separate publication describing the POlarised GLEAM-X survey (POGS-X; Zhang et al., in preparation).
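The Jones-matrix manipulation described above can be sketched with NumPy. Several details here are assumptions rather than statements about the pipeline: the solution layout (tiles, channels, 2, 2), the reference phasor being taken separately for the X and Y gains of the reference tile in each channel, and the X-Y phase being applied to the Y gain. The function name is hypothetical.

```python
import numpy as np

def prepare_jones_for_polarimetry(jones, ref_tile, xy_phase_rad):
    """jones: complex (n_tiles, n_chans, 2, 2) per-tile, per-channel solutions."""
    diag = np.diagonal(jones, axis1=-2, axis2=-1)         # (n_tiles, n_chans, 2): X and Y gains
    ref_phasor = diag[ref_tile] / np.abs(diag[ref_tile])  # unit phasors of the reference tile
    diag = diag / ref_phasor                              # reference tile now has zero phase
    diag = diag * np.array([1.0, np.exp(1j * xy_phase_rad)])  # assumed: X-Y phase on the Y gain
    out = np.zeros_like(jones)                            # cross-terms are left at zero
    out[..., 0, 0] = diag[..., 0]
    out[..., 1, 1] = diag[..., 1]
    return out
```

Dividing every tile by the same reference phasor changes only an overall phase per channel and polarisation, so total-intensity imaging is unaffected. The explicit X-Y phase term is what controls the leakage between linear and circular polarisation.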
3.2. Imaging
At this stage, the processing diverges depending on whether there is significant Galactic emission. For this paper, we focus on producing catalogues and images which best explore the extragalactic sky (i.e. without attempting to reconstruct such diffuse emission).
While the original GLEAM survey used image weighting with a ‘Briggs’ robust parameter of $-1$, such weighting is not suitable for the MWA Phase II extended configuration, which has fewer short baselines and hence reduced surface brightness sensitivity. For GLEAM-X, a weighting closer to natural is generally preferred to maximise sensitivity (see Hodgson et al. 2020 for a demonstration of the surface brightness sensitivity of MWA Phase II in comparison to other instruments).
To determine an appropriate weighting for extended MWA Phase II imaging, taking into account both angular resolution and surface brightness sensitivity, we trial a range of image weightings: ‘Briggs’ weighting with robust parameters $0.0$, $+0.5$, and $+1.0$, as well as uniform and natural weighting. We simulate simple two-dimensional Gaussian sources with varying full width at half maximum (FWHM) in individual template 154-MHz 2-min snapshots after subtracting astronomical sources and noise. Two runs of normal snapshot imaging are performed for each Gaussian source: one with multi-scale CLEAN enabled and one without. The flux density of each resulting Gaussian source is then measured by fitting a Gaussian component with the source-finding software Aegean. The model sources are simulated at 3, 5, 10, 20, and 1000$\sigma$, where the RMS noise level $\sigma$ is estimated from real template images for each image weighting.
Figure 2 shows the Aegean flux density measurements of the sources for the various image weightings, with and without multi-scale CLEAN. The significant increase in recovered flux density when multi-scale CLEAN is used motivates its use. The ‘best’ case for flux density recovery is natural weighting with multi-scale CLEAN; however, with natural weighting the improvement in angular resolution compared to GLEAM is only a factor of ${\sim}1.5$, and the point-source sensitivity is not maximised. To balance an increase in resolution against overall sensitivity, a ‘Briggs’ robust parameter of $+0.5$ is chosen for the full survey. We note that the fractional flux density loss decreases with increasing source brightness. For instance, comparing the top and leftmost two panels of Figure 2, 90% of the flux density is recovered for a 10′-FWHM 20$\sigma$ source, whereas all of the flux density is recovered for a 1000$\sigma$ source of the same size.
While these simulations provide an estimate of the flux density recovery for extended Gaussian sources in snapshot observations, the results shown in Figure 2 should not be used to directly correct flux density measurements made in the final mosaics.
WSClean is used to generate images with the following settings:
a SIN projection centred on the minimum-w pointing, i.e. hour angle $=0$, Dec $-26.7^\circ$;
four 7.68-MHz channels jointly cleaned using the -join-channels option, which also produces a 30.72-MHz MFS image;
include and apply the MWA primary beam (Sokolowski et al. 2017) during cleaning, to produce a Stokes I image;
automatic thresholding down to $3\sigma$ , where $\sigma$ is the RMS of the residual MFS image at the end of each major cycle;
automatic cleaning down to $1\sigma$ within pixels identified as containing flux density in previous cycles (‘masked’ cleaning);
a major cycle gain of 0.85, i.e. 85% of the flux density of the clean components are subtracted in each major cycle;
five or fewer major cycles, in order to prevent the occasional failure to converge during cleaning between 3 and 4 $\sigma$ ;
$10^6$ minor cycles, a limit which is never reached;
multi-scale CLEAN, with the default deconvolution scale settings and a -multiscale-gain parameter of 0.15;
$8000\times8000$ pixel images, which encompasses the field-of-view down to 10% of the primary beam;
‘robust’ weighting of 0.5 (see above);
a frequency-dependent pixel scale such that each image always has 3.5–5 pixels per FWHM of the restoring beam;
a restoring beam of a 2-D Gaussian fit to the central part of the dirty beam, which is similar in shape (within 10%) for each frequency band of the entire survey, but varies in size depending on the frequency of the observation.
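For readers who want to reproduce similar snapshot imaging, the settings above map onto a WSClean command line roughly as follows. This is a sketch, not the GLEAM-X pipeline's actual invocation. The flags shown are standard WSClean options as we understand them (check `wsclean --help` for the installed version). The SIN/minimum-w phase centre, the MWA beam-file location, and polarisation options are omitted. The restoring-beam FWHM passed in is a hypothetical input used only to pick a pixel scale within the 3.5–5 pixels-per-beam range. The masked-cleaning pair is expressed as `-auto-mask 3` (build the mask down to $3\sigma$) with `-auto-threshold 1` (continue to $1\sigma$ inside the mask).

```python
def wsclean_args(ms_path, name, beam_fwhm_arcsec, pixels_per_beam=4.0, size=8000):
    """Build a WSClean argument list mirroring the listed imaging settings (sketch)."""
    if not 3.5 <= pixels_per_beam <= 5.0:
        raise ValueError("the survey uses 3.5-5 pixels per restoring-beam FWHM")
    scale_arcsec = beam_fwhm_arcsec / pixels_per_beam
    return [
        "wsclean",
        "-name", name,
        "-size", str(size), str(size),
        "-scale", f"{scale_arcsec:.3f}asec",
        "-weight", "briggs", "0.5",
        "-channels-out", "4", "-join-channels",      # four 7.68-MHz channels + MFS image
        "-apply-primary-beam",
        "-auto-mask", "3", "-auto-threshold", "1",   # 3-sigma mask, then 1-sigma inside it
        "-mgain", "0.85", "-nmiter", "5", "-niter", "1000000",
        "-multiscale", "-multiscale-gain", "0.15",
        ms_path,
    ]

args = wsclean_args("obs.ms", "obs_img", beam_fwhm_arcsec=60.0)
print(" ".join(args))
```

Building the argument list in code makes the frequency-dependent pixel scale explicit: the same function can be called per band with a different beam estimate.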
The extended configuration of the Phase II MWA has low sensitivity to sources with extents ${>}10'$, and thus is not optimal for recovering the complex emission present in the Galactic Plane. However, the original GLEAM survey was recorded in an identical set of drift scan pointings to GLEAM-X, and the array configuration at that time provided many baselines sensitive to these larger angular scales. Thus, for the Galactic Plane, we will jointly deconvolve the short baselines of GLEAM with the full GLEAM-X measurement sets, a process enabled by the fast GPU-based image-domain gridding extension to WSClean (van der Tol, Veenboer, & Offringa 2018). This method has been used to great effect to image Fornax A (Line et al. 2020) and Centaurus A (McKinley et al. 2022), and can also be used for other extended sources such as the Magellanic Clouds. An example of these results is shown in Figure 3; the full description of the process in the context of the Galactic Plane will be given in a further paper (Hurley-Walker et al., in preparation).
3.3. Astrometric calibration
The ionosphere introduces a $\lambda^2$-dependent position shift to the observed radio sources, which varies with position on the sky. Following the method of Hurley-Walker et al. (2017, 2019b), we use fits_warp (Hurley-Walker & Hancock 2018) to calculate a model of position shifts from the differences between source positions in the snapshot and those in a reference catalogue, and then use this model to de-distort the images.
For the reference catalogue, we benefit from using catalogues with similar resolution (${\sim}1'$) that cover wide areas: for declinations north of $-30^\circ$ we use NVSS at 1.4 GHz, and further south we use SUMSS at 843 MHz. From this combined catalogue we select a subset which is sparse (no internal matches within 3′) and unresolved (integrated-to-peak flux density ratio ${<}1.2$).
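The sparseness and compactness cuts can be written as a brute-force selection. The sketch below assumes the combined NVSS/SUMSS catalogue is already loaded as NumPy arrays of RA, Dec (degrees), integrated and peak flux density. It computes all pairwise separations via unit-vector dot products, which is $O(N^2)$ in memory and only suitable for illustration; a tree-based cross-match would be used at full scale. Function names are hypothetical.

```python
import numpy as np

def unit_vectors(ra_deg, dec_deg):
    """Cartesian unit vectors for positions on the celestial sphere."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])

def select_reference(ra_deg, dec_deg, s_int, s_peak, isolation_arcmin=3.0, max_ratio=1.2):
    """Boolean mask of sparse (no neighbour within 3') and unresolved (S_int/S_peak < 1.2) sources."""
    xyz = unit_vectors(ra_deg, dec_deg)
    cos_sep = np.clip(xyz @ xyz.T, -1.0, 1.0)
    sep_arcmin = np.degrees(np.arccos(cos_sep)) * 60.0
    np.fill_diagonal(sep_arcmin, np.inf)  # ignore each source's match with itself
    isolated = sep_arcmin.min(axis=1) >= isolation_arcmin
    compact = (s_int / s_peak) < max_ratio
    return isolated & compact
```

Requiring isolation removes any pair of sources that could be confused with each other at the coarser GLEAM-X resolution, and the flux-ratio cut keeps sources whose positions are well defined.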
For each of the 7.68-MHz sub-bands and the wideband 30.72-MHz images formed from each observation, we estimate the background and RMS noise $\sigma$ using BANE, and perform source-finding using Aegean, with a minimum ‘seed’ threshold of $5\sigma$. Using the iterative catalogue cross-matching functionality of fits_warp, we cross-match the measured sources to the reference catalogue, typically finding 1000–3000 cross-matches, from which we retain the 750 brightest sources. A greater number of sources does not improve the accuracy of the warping for typical ionospheric conditions, but does add computational load, so this value was chosen as a point of diminishing returns. These sources typically have flux densities ${>}1\,\mathrm{Jy}$ in NVSS and SUMSS, and so have adequate astrometry to anchor our corrections.
Snapshot images with fewer than 100 successful cross-matches are discarded (typically ${<}1\%$ of images). The position shifts in the remaining images are typically of order 25′′–5′′ over 72–231 MHz, and are coherent on scales of 1–$20^\circ$, similar to previous studies with the MWA (e.g. Hurley-Walker & Hancock 2018; Helmboldt & Hurley-Walker 2020). fits_warp uses these position shifts to create a warp model, apply it to all pixels, and interpolate the results back on to the original pixel grid. This technique yields residual astrometric offsets (with no obvious preferred direction or structure) of order 6′′ at the lowest frequencies, and 2′′ at the highest frequencies.
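The selection logic around the warp (keep the 750 brightest cross-matches, discard snapshots with fewer than 100) can be sketched as below. To keep the example self-contained, it swaps in a plain least-squares 2-D polynomial for the offset model. This is a deliberate simplification and not the interpolation scheme fits_warp itself uses. Inputs are assumed to be NumPy arrays of pixel positions, measured offsets, and flux densities; the names are hypothetical.

```python
import numpy as np

def _design(x, y, degree):
    # Columns x**i * y**j for all i + j <= degree.
    return np.column_stack([x**i * y**j
                            for i in range(degree + 1)
                            for j in range(degree + 1 - i)])

def fit_offset_model(x, y, dx, dy, flux, n_keep=750, n_min=100, degree=2):
    """Fit smooth offset fields dx(x, y), dy(x, y); None means discard the snapshot."""
    if len(x) < n_min:
        return None
    idx = np.argsort(flux)[::-1][:n_keep]  # brightest n_keep cross-matches
    a = _design(x[idx], y[idx], degree)
    cx, *_ = np.linalg.lstsq(a, dx[idx], rcond=None)
    cy, *_ = np.linalg.lstsq(a, dy[idx], rcond=None)
    return cx, cy, degree

def predict_offsets(model, x, y):
    """Evaluate the fitted offsets at arbitrary pixel positions."""
    cx, cy, degree = model
    a = _design(np.asarray(x, dtype=float), np.asarray(y, dtype=float), degree)
    return a @ cx, a @ cy
```

A low-order polynomial captures only the largest-scale distortions. Because the observed shifts are coherent on scales down to about $1^\circ$, a more flexible interpolant is needed in practice, which is why this is a sketch of the selection logic rather than of the warp.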
3.4. Primary beam correction
While the primary beam model developed by Sokolowski et al. (2017) is significantly more accurate than previous models of the MWA primary beam, there remain some discrepancies between our measured source flux densities and those predicted from existing work. In part, this is due to the flagging of individual dipoles in different tiles across the array, which gives these tiles a different and unmodelled primary beam response. For the observations processed in this work, 72 tiles were fully functional, 39 tiles contained one dead dipole, 14 contained two dead dipoles, and three tiles were flagged for having three or more non-functional dipoles. Including the effect of the flagging by computing and using multiple primary beams at the calibration and imaging stages is computationally expensive, so instead a correction is made after the images are formed.
We cross-match each snapshot with a sparse (no internal matches within 5′), unresolved (major axis $a\times$ minor axis $b<2'\times2'$) version of the GLEAM-derived catalogue used for calibration (Section 3.1), and make a global mean flux density scale correction, typically of order 5–15%, using the flux_warp package (Duchesne et al. 2020). After this global shift, we accumulate the cross-matched tables. Since the discrepancy is consistent in HA and Dec between snapshots, we can combine the information in this frame of reference.
For each frequency, as a function of HA and Dec, we examine $\log_{10}R$, where R is the ratio of the integrated flux density measured in the snapshot to that in the reference catalogue. As for GLEAM ExGal, we find no trends with HA, and trends of up to $\pm10\%$ with Dec. Figure 4 shows this effect for a typical frequency band. A fifth-order polynomial in Dec is fitted using weighted least squares, where the weights are the signal-to-noise ratios of the sources as measured in each snapshot. The standard deviation of the data from the model ($\sigma_\mathrm{poly}$) is measured, and sources whose $\log_{10}R$ deviates from the model by more than $3\sigma_\mathrm{poly}$ are removed from the data. A final model of the same form is fitted to the remaining data, forming a correction function which is then applied to every individual snapshot.
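The fit, clip, and refit procedure maps directly onto `numpy.polyfit`, whose `w` argument weights the unsquared residuals. The sketch below assumes the accumulated cross-matches are NumPy arrays of Dec, $\log_{10}R$, and snapshot signal-to-noise, with the global flux_warp scaling already applied. The function names are hypothetical, and the correction is applied by dividing out the fitted ratio.

```python
import numpy as np

def fit_dec_correction(dec_deg, log_ratio, snr, order=5, clip=3.0):
    """Fit log10(R) vs Dec with SNR weights, reject >clip*sigma_poly outliers, refit."""
    coeffs = np.polyfit(dec_deg, log_ratio, order, w=snr)
    resid = log_ratio - np.polyval(coeffs, dec_deg)
    sigma_poly = np.std(resid)
    keep = np.abs(resid) <= clip * sigma_poly
    return np.polyfit(dec_deg[keep], log_ratio[keep], order, w=snr[keep])

def apply_dec_correction(flux, dec_deg, coeffs):
    """Divide out the fitted flux-density ratio at each source's (or pixel's) Dec."""
    return flux / 10.0 ** np.polyval(coeffs, dec_deg)
```

Fitting in $\log_{10}R$ treats over- and under-estimates symmetrically, and a single round of $3\sigma$ clipping is enough to stop a few mismatched or variable sources from biasing the final polynomial.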
After correction, the primary-beam-corrected 30-MHz MFS images have snapshot RMS values of 35– $4\,\mathrm{mJy\,beam}^{-1}$ over 72–231 MHz at their centres, where the primary beam sensitivity is highest.
3.5. Mosaicking
The goal of continuum mosaicking is to combine the astrometrically and primary-beam-corrected snapshot images into deeper images with reduced noise, revealing fainter sources and diffuse structures invisible in the individual snapshots. For optimal signal-to-noise when mosaicking the night-long scans together, we use inverse-variance weighting. The weight maps are derived from the square of the primary beam model, scaled by the inverse square of the RMS noise at the centre of the image, as calculated by BANE.
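The weighting scheme follows from the fact that a primary-beam-corrected pixel has noise of roughly $\sigma_\mathrm{centre}/B$, so its inverse variance is $B^2/\sigma_\mathrm{centre}^2$. A minimal NumPy sketch is shown below. It assumes the snapshots and beams have already been regridded onto a common grid (the step swarp performs), and it sets pixels with zero total weight to NaN. The function name is hypothetical.

```python
import numpy as np

def mosaic(images, beams, rms_centre):
    """Inverse-variance weighted mean of primary-beam-corrected snapshots.

    images, beams: arrays (n_snap, ny, nx) on a common grid;
    rms_centre: RMS noise at each snapshot's centre (n_snap,), e.g. from BANE.
    """
    images = np.asarray(images, dtype=float)
    beams = np.asarray(beams, dtype=float)
    rms = np.asarray(rms_centre, dtype=float)[:, None, None]
    weights = beams**2 / rms**2  # noise of a PB-corrected pixel is rms / beam
    wsum = weights.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        combined = (weights * images).sum(axis=0) / wsum
    return np.where(wsum > 0, combined, np.nan)
```

Down-weighting both the beam edges and the noisier snapshots means that poor observations contribute little, rather than having to be excluded by hand.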
As discussed in Section 2, GLEAM-X was observed at three different hour angles. This gives each drift scan slightly different (u,v)-coverage, and thus a slightly different restoring beam and point spread function (PSF). While each individual drift scan has a unique and very nearly Gaussian PSF, stacking Gaussians with different position angles could be expected to produce a non-Gaussian shape. Since most source-finders expect sources to be well approximated by Gaussians, we tested this effect in our mosaicking procedure. We selected the scans with the most dissimilar (u,v)-coverage that would still have significant overlap in sources: those at HAs $-1$ and $+1$ from the Dec $+2$ scans, i.e. where the sky rotates most quickly and projection effects are most important. We simulated a grid of 1-Jy point sources at common RAs and Decs for seven observations from each of these scans, and ran them through our imaging and mosaicking stages, using unity image weighting and omitting the unnecessary astrometric and primary beam corrections. We used Aegean to source-find on the resulting mosaic, making corrections as necessary for the projection (Section 4). We recovered the sources at integrated flux densities of 0.995–0.999 Jy and peak flux densities of 0.96–$0.97\,\mathrm{Jy\,beam}^{-1}$. Subtracting these Gaussian fits from the image-plane data left residuals at the ${<}4.5\%$ level, indicating the degree of deviation from Gaussianity. Since the integrated flux densities were recovered well, and the non-Gaussianity is fairly small even in this worst-case scenario, we adopt this mosaicking method going forward.
For each 7.68-MHz frequency channel, we form a night-long drift scan, and examine it to check for any remaining data quality issues. We also form five 30.72-MHz bandwidth mosaics from the multi-frequency synthesis images generated during cleaning (Section 3.2). After quality checking, for each frequency, data from all four nights that cover the same RA range are combined together to make a single deep mosaic. At this stage, we also form a 60-MHz bandwidth ‘wideband’ image over 170–231 MHz, as this gives a good compromise between sensitivity and resolution, and will be used for source-finding (Section 4).
3.6. Calculation of the PSF
As described in Appendix A of Duchesne et al. (Reference Duchesne, Johnston-Hollitt, Zhu, Wayth and Line2020), imaging away from the phase centre incurs a significant phase rotation during re-gridding as part of the mosaicking process. This re-projection results in a point-spread function that is not defined at the image reference coordinates. This is partially corrected by introducing a projected regrid factor, f, that is applied to the PSF major axis to form an ‘effective’ PSF major axis. For a resultant ZEA projection, f is simply related to the change in solid angle over the original SIN-projected image (e.g. Thompson, Moran, & Swenson Reference Thompson, Moran and Swenson2001):
$$f = \frac{1}{\sqrt{1 - l^2 - m^2}}, \tag{1}$$
where l and m are the direction cosines defined with reference to the original, SIN-projected image direction. Because ZEA is an equal-area projection, it minimises any additional area-related projection effects. The effective PSF is used in initial source-finding on the mosaics, as the integrated flux density is correct and the product of the major and minor PSF axes is also correct for the new projection.
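A sketch of the effective-PSF correction, assuming the regrid factor takes the form $f=(1-l^2-m^2)^{-1/2}$, i.e. the solid-angle element of the SIN projection relative to the phase centre. That form is our reading of the description above and should be checked against Duchesne et al. (2020) before use; the function names are hypothetical.

```python
import numpy as np

def regrid_factor(l, m):
    """Assumed regrid factor f = 1 / sqrt(1 - l^2 - m^2), with l, m the direction
    cosines relative to the original SIN-projection reference direction."""
    l, m = np.asarray(l, dtype=float), np.asarray(m, dtype=float)
    return 1.0 / np.sqrt(1.0 - l**2 - m**2)

def effective_major_axis(a_psf, l, m):
    """Stretch the PSF major axis by f; the minor axis is left unchanged."""
    return a_psf * regrid_factor(l, m)
```

At the phase centre f = 1; at $l=0.3$, $m=0.4$ (so $l^2+m^2=0.25$) a 60′′ major axis becomes about 69.3′′.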
Residual uncorrected ionospheric distortions can cause slight blurring of the final mosaicked PSF. This can be characterised by examining sources that are known to be unresolved, which is best determined using a catalogue of higher resolution than our calibration sky model; we thus use the combined NVSS and SUMSS catalogue described in Section 3.3. Following Hurley-Walker et al. (Reference Hurley-Walker2017, Reference Hurley-Walker2019b), we cross-match this catalogue with the sources detected in our mosaics at signal-to-noise ${>}10$ , and then measure the size and shape of these sources in the GLEAM-X mosaics. We create a PSF map by averaging and interpolating over these sources, using HEALPix (order $=4$ , i.e. pixels ${\sim}3.7^\circ$ on each side) as a natural frame in which to accumulate and average source measurements.
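The HEALPix bookkeeping can be sketched as follows. The pixel side length follows from $N_\mathrm{pix}=12N_\mathrm{side}^2$ with $N_\mathrm{side}=2^\mathrm{order}$; for order 4 it is about 3.7°. The per-pixel average uses the median as a robust estimator, which is our assumption, and takes precomputed pixel indices (for example from `healpy.ang2pix` at $N_\mathrm{side}=16$) so that the sketch needs only NumPy. The interpolation step is omitted.

```python
import numpy as np

def healpix_pixel_size_deg(order):
    """Approximate side length (deg) of a HEALPix pixel: Nside = 2**order,
    Npix = 12 * Nside**2 equal-area pixels covering the full sky."""
    nside = 2 ** order
    npix = 12 * nside ** 2
    full_sky_deg2 = 4.0 * np.pi * np.degrees(1.0) ** 2
    return float(np.sqrt(full_sky_deg2 / npix))

def average_psf_per_pixel(pix, a, b):
    """Median PSF major (a) and minor (b) FWHM of unresolved sources per HEALPix pixel.

    pix : pixel index of each cross-matched source (assumed precomputed)
    Returns {pixel: (median_a, median_b)}; empty pixels are absent and would be
    filled by interpolation in a real PSF map."""
    pix = np.asarray(pix)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return {int(p): (float(np.median(a[pix == p])), float(np.median(b[pix == p])))
            for p in np.unique(pix)}
```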
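After the PSF map has been measured, the mosaic from which it was derived is multiplied by a (position-dependent) ‘blur’ factor of
$$B = \frac{a_\mathrm{PSF}\,b_\mathrm{PSF}}{a_\mathrm{rst}\,b_\mathrm{rst}}, \tag{2}$$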
where $a_\mathrm{rst}$ and $b_\mathrm{rst}$ are the FWHM of the major and minor axes of the restoring beam, and $a_\mathrm{PSF}$ and $b_\mathrm{PSF}$ are the FWHM of the major and minor axes of the PSF. This has the effect of normalising the flux density scale such that both peak and integrated flux densities agree, as long as the correct, position-dependent PSF is used (Hancock et al. Reference Hancock, Trott and Hurley-Walker2018). Values of B are typically 1.0–1.2.
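A per-pixel sketch of the blur correction, assuming $B = a_\mathrm{PSF}b_\mathrm{PSF}/(a_\mathrm{rst}b_\mathrm{rst})$. This form follows from the fact that a point source blurred to the PSF size has its peak (in Jy per restoring beam) reduced by the ratio of the two beam areas, so multiplying by B makes the peak equal the integrated flux density again. The function names are hypothetical.

```python
import numpy as np

def blur_factor(a_rst, b_rst, a_psf, b_psf):
    """Assumed blur factor: measured PSF area over restoring-beam area (FWHMs, same units)."""
    return (np.asarray(a_psf, dtype=float) * np.asarray(b_psf, dtype=float)) / (
        np.asarray(a_rst, dtype=float) * np.asarray(b_rst, dtype=float))

def deblur_mosaic(mosaic, a_rst, b_rst, a_psf_map, b_psf_map):
    """Multiply the mosaic pixel-by-pixel by B, using PSF maps on the mosaic grid."""
    return np.asarray(mosaic, dtype=float) * blur_factor(a_rst, b_rst, a_psf_map, b_psf_map)
```

For a 60′′×55′′ restoring beam and a measured 66′′×60′′ PSF, B = 1.2, at the upper end of the range quoted above.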
3.7. Final images
The mosaicking stage of Section 3.5 results in 26 mosaics: one with 60-MHz bandwidth, five with 30-MHz bandwidth, and the other 20 covering 72–231 MHz in 7.68-MHz narrow bands. In this work, we run the pipeline on four nights of observing indicated in Table A.1, producing a large set of mosaics with decreasing sensitivity toward the edges. Here we downselect to a region which is representative of the survey’s eventual sensitivity, covering $4\,\mathrm{h}\leq \mathrm{RA}\leq 13\,\mathrm{h}$ , $-32.7^\circ \leq \mathrm{Dec} \leq -20.7^\circ$ , for further analysis. Figures 5–7 show this area for four of the deeper mosaics. Postage stamps of these images are available on both SkyView and the GLEAM-X website. The header of every postage stamp contains the PSF information calculated in Section 3.6, and the completeness information calculated in Section 4.3.
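As a quick consistency check on the selected region, the solid angle of an RA/Dec box is $\Delta\mathrm{RA}\,(\sin\delta_\mathrm{max}-\sin\delta_\mathrm{min})$. This sketch gives about $1445\,\mathrm{deg}^2$ for the quoted boundaries, close to the $1447\,\mathrm{deg}^2$ used for the catalogue; the small difference presumably reflects rounding of the quoted limits or pixelisation of the mosaic edges (our assumption).

```python
import math

def radec_box_area_deg2(ra_min_h, ra_max_h, dec_min_deg, dec_max_deg):
    """Solid angle (deg^2) of a region bounded by lines of constant RA and Dec."""
    d_ra = math.radians((ra_max_h - ra_min_h) * 15.0)       # hours -> radians
    omega = d_ra * (math.sin(math.radians(dec_max_deg)) - math.sin(math.radians(dec_min_deg)))
    return omega * (180.0 / math.pi) ** 2                    # steradians -> deg^2

area = radec_box_area_deg2(4.0, 13.0, -32.7, -20.7)
```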
We use BANE to determine the background and RMS noise of each mosaic. During development of this survey, we noticed that BANE’s default of three loops of 3-sigma clipping does not exclude enough source-filled pixels to determine the background and RMS noise accurately. The issue may not have been noticed in previous works because they had lower sensitivity, and hence lower source density, than GLEAM-X (although Hurley-Walker et al. (Reference Hurley-Walker2017) noted a similar effect arising from the high sidelobe confusion levels of GLEAM). We modified BANE to use 10 loops and found that it produced more accurate noise and background estimates (see Section 4.2.2 for further analysis). Figure 8 shows an example 10-sq.-deg region of the 170–231 MHz wideband mosaic and the associated background and RMS noise, as well as the same region as seen by GLEAM ExGal, in which the resolution is lower, the noise is higher, and the diffuse Galactic synchrotron on scales of ${>}1^\circ$ is visible.
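The effect of the number of clipping loops can be illustrated with a simplified, global version of iterative sigma clipping. This is a sketch only: BANE itself computes statistics on a grid of boxes and interpolates between them, which is not reproduced here, and the function name is hypothetical. Because each loop removes only pixels more than 3σ from the median, the RMS of the surviving pixels cannot increase from one loop to the next, so ten loops always give an RMS no larger than three loops; the difference matters most when sources fill many pixels.

```python
import numpy as np

def sigma_clipped_stats(pixels, nsigma=3.0, loops=10):
    """Background (median) and RMS (standard deviation) after iterative sigma clipping.

    Each loop keeps only the pixels within nsigma * std of the median of the pixels
    kept so far; clipping stops early once a loop removes nothing."""
    data = np.asarray(pixels, dtype=float).ravel()
    data = data[np.isfinite(data)]
    for _ in range(loops):
        med, std = np.median(data), np.std(data)
        keep = np.abs(data - med) <= nsigma * std
        if keep.all():
            break
        data = data[keep]
    return np.median(data), np.std(data)
```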
Combining data in the image plane may lead to the recovery of faint sources that were not cleaned during imaging. The RMS noise levels in the 30-MHz-bandwidth mosaics range from 5 to $1.3\,\mathrm{mJy\,beam}^{-1}$ over 72–231 MHz. This compares to typical snapshot RMS values of 35 to $4\,\mathrm{mJy\,beam}^{-1}$ over the same frequency range (Section 3.4). Cleaning is performed down to 1- $\sigma$ for components detected at 3- $\sigma$ in a snapshot (Section 3.2), so sources fainter than three times the snapshot RMS are typically not cleaned. The centres of each image form the greatest contribution to each mosaic due to weighting by the square of the primary beam (Section 3.5). We can therefore estimate the mosaic signal-to-noise below which uncleaned sources will typically appear: $\frac{3\times35}{5}=21$ to $\frac{3\times4}{1.3}\approx9$ from 72–231 MHz, and ${\sim}\frac{3\times4}{1}=12$ in the wideband (60-MHz) source-finding image (Section 4).
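The threshold arithmetic above can be written as a one-line helper (a hypothetical name), which reproduces the quoted values:

```python
def uncleaned_snr_threshold(snapshot_rms, mosaic_rms, detect_sigma=3.0):
    """Mosaic S/N of a source sitting exactly at the snapshot cleaning threshold
    (detect_sigma x snapshot RMS); fainter sources were typically not cleaned."""
    return detect_sigma * snapshot_rms / mosaic_rms

# RMS values in mJy/beam, taken from the text
print(round(uncleaned_snr_threshold(35, 5)))    # 72-MHz end: 21
print(round(uncleaned_snr_threshold(4, 1.3)))   # 231-MHz end: 9
print(round(uncleaned_snr_threshold(4, 1)))     # 60-MHz wideband image: 12
```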
Modelling this effect, especially in conjunction with Eddington bias (see e.g. Section 4.2.1), which is also significant at these faint flux densities, lies beyond the scope of this paper. It would involve significant work and is mainly of interest for performing careful measurement of low-frequency source counts (see Franzen et al. Reference Franzen, Vernstrom, Jackson, Hurley-Walker, Ekers, Heald, Seymour and White2019, for an equivalent analysis for GLEAM). At this stage we merely suggest additional caution when using flux densities for sources at low ( ${<}12$ ) signal-to-noise.
The mosaics at this stage are only a subset of the GLEAM-X sky. The RMS increases toward the edges due to the drop in primary beam sensitivity and the selected RA range of these observations. Future mosaics composed of further nights of observing will be combined to produce near-uniform sensitivity across the sky.
4. Compact source catalogue
A source catalogue derived from the images is a useful data product that enables straightforward cross-matching, spectral fitting, and population studies. We aim here to accurately capture components of sizes ${<}10'$ across all frequency bands, fitting elliptical two-dimensional Gaussians with Aegean. We carry out this process on the $1447\,\mathrm{deg}^2$ region selected in Section 3.7, and the steps are generally applicable to future mosaics produced from the survey.
4.1. Source detection
We follow the same strategy as Hurley-Walker et al. (Reference Hurley-Walker2017): using the 170–231 MHz image, a deep wideband catalogue centred at 200 MHz is formed. We set the ‘seed’ clip to four, i.e. pixels with flux density ${>}4\sigma$ are used as initial positions on which to fit components, where $\sigma$ denotes the local RMS noise. After the sources are detected, we filter to retain only sources with integrated flux densities ${\geq}5\sigma$ . We then use the ‘priorised’ fitting technique to measure the flux densities of each source in the narrow-band images: the positions are fixed to those of the wideband source-finding image, the shapes are predicted by convolving the shape in the source-finding image with the local PSF, and the flux density is allowed to vary. Where the sources are too faint to be fit, a forced measurement is carried out. We perform several checks on the quality of the catalogue, detailed below.
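The ‘priorised’ measurement reduces to a linear problem once the position and shape are fixed. The sketch below, which is not Aegean's implementation, fits only the amplitude of a fixed elliptical Gaussian by weighted least squares; the function name and pixel-unit conventions are hypothetical. It ignores the pixel-to-pixel noise correlation introduced by the restoring beam, so its formal error is optimistic.

```python
import numpy as np

def fit_fixed_gaussian_amplitude(image, rms, x0, y0, a_sigma, b_sigma, theta):
    """Peak flux density of a Gaussian with fixed position (x0, y0), fixed widths
    (a_sigma, b_sigma, in pixels) and fixed rotation theta (radians).

    With model = A * g and weights w = 1 / rms**2, least squares gives
    A = sum(w g image) / sum(w g^2) and sigma_A = 1 / sqrt(sum(w g^2))."""
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    dx, dy = x - x0, y - y0
    ct, st = np.cos(theta), np.sin(theta)
    u = (dx * ct + dy * st) / a_sigma
    v = (-dx * st + dy * ct) / b_sigma
    g = np.exp(-0.5 * (u**2 + v**2))                 # unit-peak template
    w = 1.0 / np.asarray(rms, dtype=float) ** 2     # scalar RMS or an RMS map
    denom = np.sum(w * g**2)
    amp = np.sum(w * g * image) / denom
    return amp, 1.0 / np.sqrt(denom)
```

Because the amplitude enters linearly, a measurement is returned even when the source is too faint to be detected on its own, which is the behaviour needed for the narrow-band forced measurements.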
4.2. Error derivation
In this Section we examine the errors reported in the catalogue. First, we examine the systematic flux density errors; then, we examine the noise properties of the wideband source-finding image, as this must be close to Gaussian in order for sources to be accurately characterised, and for estimates of the reliability to be made, which we do in Section 4.3. Finally, we make an assessment of the catalogue’s astrometric accuracy. These statistics are given in Table 1.
4.2.1. Comparison with GLEAM
GLEAM forms the basis of the flux density calibration in this work, and in this Section we examine any differences between the flux densities measured here and those measured by GLEAM ExGal. We select compact sources from both catalogues (ratio of integrated to peak flux density ${<}2$ ) that cross-match within a 15′′ radius and have a good power-law spectral fit (reduced $\chi^2\leq1.93$ ; see Section 4.4). Curved- and peaked-spectrum sources comprise only a small proportion of the catalogue and are more likely to be variable (Ross et al. Reference Ross2021), so they are not included in this check. We excluded all GLEAM-X sources that have another GLEAM-X source within 2′, in order to avoid selecting sources that are unresolved in GLEAM but resolve into multiple components in GLEAM-X.
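A brute-force sketch of the two positional criteria: a nearest-neighbour cross-match within 15′′, and an isolation cut that rejects sources with another source from the same catalogue within 2′. The function names are hypothetical; for catalogues of this size a tree-based matcher such as astropy's `SkyCoord.match_to_catalog_sky` would be the practical choice.

```python
import numpy as np

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    h = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec):
    """Index of the nearest catalogue-2 source within radius for each catalogue-1
    source, or -1 if none. O(N*M), adequate for a sketch."""
    ra2, dec2 = np.asarray(ra2, dtype=float), np.asarray(dec2, dtype=float)
    idx = np.full(len(ra1), -1)
    for i, (r, d) in enumerate(zip(ra1, dec1)):
        sep = angular_sep_deg(r, d, ra2, dec2) * 3600.0
        j = int(np.argmin(sep))
        if sep[j] <= radius_arcsec:
            idx[i] = j
    return idx

def isolated(ra, dec, radius_arcsec):
    """True for sources with no other source of the same catalogue within radius."""
    ra, dec = np.asarray(ra, dtype=float), np.asarray(dec, dtype=float)
    out = np.ones(len(ra), dtype=bool)
    for i in range(len(ra)):
        sep = angular_sep_deg(ra[i], dec[i], ra, dec) * 3600.0
        sep[i] = np.inf                       # ignore the source itself
        out[i] = not np.any(sep <= radius_arcsec)
    return out
```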
As surveys approach their detection limit, measured source flux densities are increasingly likely to be biased high by noise: there are more faint sources that noise can scatter upward than bright sources that it can scatter downward. Eddington (Reference Eddington1913) describes corrections that can be made to an ensemble of measurements to remove this bias. For the purpose of this section, we wish to correct the individual GLEAM flux density measurements in order to check the GLEAM-X flux density scale. We use Equation (4) of Hogg & Turner (Reference Hogg and Turner1998) to predict the maximum likelihood true flux density of each of the GLEAM 200-MHz measurements:
where $\sigma$ is the local RMS noise, and q is the logarithmic slope of the differential source counts (i.e. $\frac{dN}{dS}\propto S^{-q}$ ); at these flux density levels $q=1.54$ (Franzen et al. Reference Franzen2016).
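For a Gaussian measurement error and power-law counts $dN/dS\propto S^{-q}$, maximising the posterior $S^{-q}\exp[-(S-S_\mathrm{obs})^2/2\sigma^2]$ gives the quadratic $S^2-S_\mathrm{obs}S+q\sigma^2=0$, and the sketch below returns its larger root (the smaller root is a minimum of the posterior). This is our own derivation under those assumptions; we have not verified that it matches Equation (4) of Hogg & Turner term by term, in particular their convention for the count slope.

```python
import numpy as np

def eddington_ml_flux(s_obs, sigma, q=1.54):
    """Maximum-likelihood true flux density given an observed flux density s_obs,
    Gaussian noise sigma and counts dN/dS proportional to S**(-q).

    Solves S**2 - s_obs*S + q*sigma**2 = 0 and returns the larger root; no real
    solution exists when s_obs/sigma < 2*sqrt(q), and NaN is returned there."""
    s_obs = np.asarray(s_obs, dtype=float)
    disc = 1.0 - 4.0 * q * (sigma / s_obs) ** 2
    with np.errstate(invalid="ignore"):
        return np.where(disc >= 0, 0.5 * s_obs * (1.0 + np.sqrt(disc)), np.nan)
```

With $q=1.54$, a 10σ measurement is corrected downward by about 1.6%, and the correction falls roughly as $q\sigma^2/S_\mathrm{obs}$ at high signal-to-noise.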
Figure 9 plots the ratio of the two catalogues' integrated flux densities as a function of signal-to-noise in GLEAM-X, with the correction applied to the GLEAM flux densities. The ratio trends toward 1.05 at higher flux densities, although the very brightest sources show only small discrepancies from unity. Since the effect is small, we do not attempt to correct for it here, but may revisit our data processing in future to see if it can be reduced, corrected, or eliminated. Since the flux density scale is tied to GLEAM, which has an 8% error relative to other surveys, this 8% value may be used as the flux density scale error when combining the data with other work.
No obvious trends are visible in the fitted spectral indices (Figure 10); we note that the error bars on the GLEAM-X measurements are uniformly smaller due to the increased signal-to-noise of the data.
4.2.2. Noise properties
We briefly examine the noise properties of the source-finding 170–231-MHz image. We use an $18\,\mathrm{deg}^2$ region centred on RA $10^\mathrm{h}30^\mathrm{m}$ Dec $-27^\circ30'$ with a fairly typical source distribution. Following Hurley-Walker et al. (Reference Hurley-Walker2017), we measure the background of the region using BANE, and subtract it from the image. We then use AeRes (“Aegean REsiduals”) from the Aegean package to mask out all sources which were detected by Aegean, down to $0.2\times$ the local RMS. We also use AeRes to subtract the sources to show the magnitude of the residuals. Histograms of the remaining pixels are shown, for the unmasked and masked images, in Figure 11.
The higher resolution of the GLEAM-X survey compared to GLEAM means that confusion forms a smaller fraction of the noise contribution, and thus the noise distribution is almost completely symmetric. Surveys closer to the confusion limit show a positively skewed distribution, as seen by Hurley-Walker et al. (Reference Hurley-Walker2017). Noise and background maps are made available as part of the survey data release.
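The symmetry argument can be quantified with the sample skewness of the masked pixel distribution. In the toy test below (an assumption, not a model of the GLEAM-X sky), unresolved faint sources are mimicked by adding positive, exponentially distributed flux to a fraction of pixels, which produces the positive skew expected near the confusion limit.

```python
import numpy as np

def sample_skewness(pixels):
    """Third standardised moment of the finite pixel values (zero for Gaussian noise)."""
    x = np.asarray(pixels, dtype=float).ravel()
    x = x[np.isfinite(x)]
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5
```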
4.2.3. Astrometry
Following Hurley-Walker et al. (Reference Hurley-Walker2017), we measure the astrometry using the 200-MHz catalogue, as this provides the locations and morphologies of all sources. To determine the astrometry, high signal-to-noise (integrated flux density ${>}50\sigma$ ) GLEAM-X sources are cross-matched with the isolated sparse NVSS and SUMSS catalogue (Section 3.3); the positions of sources in these catalogues are assumed to be correct and RA and Dec offsets are measured with respect to those positions. The average RA offset is $+14\pm700\,\mathrm{mas}$ , and the average Dec offset is $+21\pm687\,\mathrm{mas}$ (errors are 1 standard deviation).
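A sketch of the offset calculation, assuming matched pairs of positions in degrees and hypothetical function names. The RA difference is wrapped at 0/360° and multiplied by cos(Dec) so that both offsets are true angles on the sky; the summary uses the mean and one standard deviation, as quoted above.

```python
import numpy as np

def astrometric_offsets_mas(ra, dec, ra_ref, dec_ref):
    """RA and Dec offsets (milliarcsec) of matched sources from reference positions."""
    ra, dec, ra_ref, dec_ref = (np.asarray(v, dtype=float) for v in (ra, dec, ra_ref, dec_ref))
    dra = (ra - ra_ref + 180.0) % 360.0 - 180.0              # wrap into [-180, 180) deg
    d_ra = dra * np.cos(np.radians(dec_ref)) * 3.6e6         # true on-sky angle, mas
    d_dec = (dec - dec_ref) * 3.6e6
    return d_ra, d_dec

def summarise(offsets):
    """Mean offset and its 1-standard-deviation scatter."""
    return float(np.mean(offsets)), float(np.std(offsets))
```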
In 99% of cases, fitting errors on the positions are larger than the measured average astrometric offsets. Given the scatter in the measurements, we do not attempt to make a correction for these offsets. As each snapshot has been corrected, residual errors should not vary on scales smaller than the size of the primary beam. Figure 12 shows the density distribution of the astrometric offsets, and histograms of the RA and Dec offsets, which were used to calculate the values listed in this section.
4.3. Completeness and reliability
4.3.1. Completeness
Following the same procedure as Hurley-Walker et al. (Reference Hurley-Walker2017), simulations are used to quantify the completeness of the source catalogue at 200 MHz, using the wideband mosaics. 26 realisations are used in which 25000 simulated point sources of the same flux density were injected into the 170–231 MHz mosaics (at approximately 20% of the true source density). The flux density of the simulated sources is different for each realisation, spanning the range $10^{-3}$ to $10^{-0.5}$ Jy in increments of 0.1 dex. The positions of the simulated sources are chosen randomly but not altered between realisations; to avoid introducing an artificial factor of confusion in the simulations, simulated sources are not permitted to lie within 5′ of each other. Sources are injected into the mosaics using AeRes. The major and minor axes of the simulated sources are set to $a_\mathrm{psf}$ and $b_\mathrm{psf}$ , respectively.
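The simulation set-up can be sketched as follows: 26 flux densities from $10^{-3}$ to $10^{-0.5}$ Jy in steps of 0.1 dex, and random positions, drawn uniformly on the sphere within an RA/Dec box, that are rejected if they fall within 5′ of an already accepted position. A fixed seed keeps the positions identical between realisations. The rejection sampler is a simple illustration with hypothetical names (its cost grows quadratically with the number of sources), not the code used for the survey.

```python
import numpy as np

def completeness_flux_grid():
    """Flux densities (Jy) of the 26 realisations: 10**-3 to 10**-0.5 in 0.1-dex steps."""
    return 10.0 ** (np.arange(26) * 0.1 - 3.0)

def random_positions_min_sep(n, ra_range, dec_range, min_sep_deg, seed=0, max_tries=1_000_000):
    """n (RA, Dec) positions in degrees, uniform on the sphere inside the box,
    with no two positions closer than min_sep_deg."""
    rng = np.random.default_rng(seed)            # fixed seed: same positions every realisation
    cos_min = np.cos(np.radians(min_sep_deg))
    sin_lo, sin_hi = np.sin(np.radians(dec_range))
    vecs, coords = np.empty((0, 3)), []
    for _ in range(max_tries):
        if len(coords) == n:
            break
        ra = rng.uniform(*ra_range)
        dec = np.degrees(np.arcsin(rng.uniform(sin_lo, sin_hi)))   # uniform in sin(Dec)
        r, d = np.radians(ra), np.radians(dec)
        v = np.array([np.cos(d) * np.cos(r), np.cos(d) * np.sin(r), np.sin(d)])
        if vecs.size and np.max(vecs @ v) > cos_min:
            continue                             # too close to an accepted source
        vecs = np.vstack([vecs, v])
        coords.append((ra, dec))
    if len(coords) < n:
        raise RuntimeError("could not place all sources; reduce n or min_sep_deg")
    return np.array(coords)
```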
For each realisation, the source-finding procedures described in Section 4 are applied to the mosaics and the fraction of simulated sources recovered is calculated. In cases where a simulated source is found to lie too close to a real ( ${>}5\sigma$ ) source to be detected separately, the simulated source is considered to be detected if the recovered source position is closer to the simulated rather than the real source position. This type of completeness simulation therefore accounts for sources that are omitted from the source-finding process through being too close to a brighter source.
Figure 13 shows the fraction of simulated sources recovered as a function of $S_{200 \mathrm{MHz}}$ . The completeness is estimated to be 50% at ${\sim} 5.6\,\mathrm{mJy}$ rising to 90% at ${\sim} 10\,\mathrm{mJy}$ ; these flux densities were typically below the RMS noise in GLEAM ExGal. Errors on the completeness estimate are derived assuming Poisson errors on the number of simulated sources detected. Figure 14 shows the spatial distribution of the completeness for the work presented here; the slight dependence on RA is largely due to the presence of bright sources in large mosaics, e.g. Hydra A at ${\sim}$ RA $09^\mathrm{h}20^\mathrm{m}$ Dec $-12^\circ$ . The roll-off in Declination is due to the primary beam sensitivity of the single drift scan used in this work; in the full survey, multiple drift scans will be used to ensure near-uniform sensitivity and completeness across the sky.
The completeness at any pixel position is given by $C = N_{\mathrm{d}}/N_{\mathrm{s}}$ , where $N_{\mathrm{s}}$ is the number of simulated sources in a circle of radius $6^{\circ}$ centred on the pixel and $N_{\mathrm{d}}$ is the number of simulated sources that were detected above $5\sigma$ within this same region of sky. The completeness maps, in fits format, can be obtained from the supplementary material. Postage stamp images also include the estimated completeness at representative flux densities in their headers.
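The completeness estimate and its Poisson error can be written as below. `completeness_at` is a hypothetical helper that counts simulated sources within 6° of a pixel using unit vectors; the error propagates only the Poisson uncertainty on $N_\mathrm{d}$, as described above.

```python
import numpy as np

def completeness(n_detected, n_simulated):
    """C = N_d / N_s, with a Poisson error on N_d: sigma_C = sqrt(N_d) / N_s."""
    n_d = np.asarray(n_detected, dtype=float)
    n_s = np.asarray(n_simulated, dtype=float)
    return n_d / n_s, np.sqrt(n_d) / n_s

def completeness_at(pix_vec, sim_vecs, detected, radius_deg=6.0):
    """Completeness at one pixel from the simulated sources within radius_deg.

    pix_vec : (3,) unit vector of the pixel centre
    sim_vecs: (N, 3) unit vectors of the simulated sources
    detected: (N,) bool, True if recovered above 5 sigma"""
    inside = sim_vecs @ pix_vec >= np.cos(np.radians(radius_deg))
    return completeness(np.count_nonzero(detected & inside), np.count_nonzero(inside))
```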
4.3.2. Reliability
To test the reliability of the source finder and check how many of the detected sources might be false detections, we use the same source-finding procedure as described above but search only for negative peaks. Aegean is run with a seedclip of $4\sigma$ (allowing for detections with peaks above this limit) and detections outside of the central region are cut. This initially yields 1144 negative detections. Filtering the results to retain only sources with integrated flux densities $S_{\mathrm{int}}>5\sigma$ leaves 198 detections. Inspection revealed that some of these detections were artefacts around very bright sources, rather than noise peaks (see Figure 15). There were also similar positive detections of artefacts around these bright sources. We filtered out any detections (positive or negative) that were
within 5′ of a positive detection whose peak flux density was ${\geq}2\,\mathrm{Jy}$ and where the absolute value of the ratio of the bright peak to the fainter peak was ${\geq}350$ ; or
within 12′ of a positive detection whose peak flux density was ${\geq}6\,\mathrm{Jy}$ and where the absolute value of the ratio of the bright peak to the fainter peak was ${\geq}650$ .
This accounts for the moderately bright artefacts closer in to the bright sources and fainter artefacts that can exist further out from very bright sources. This filtering cuts 157 positive detections and 149 negative detections.
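The two artefact rules can be expressed as a filter over a detection list. This sketch reads the peak ratio as bright-to-faint (the only reading under which thresholds of 350 and 650 can be met) and uses hypothetical names throughout; positions are in degrees and negative detections carry negative peak values.

```python
import numpy as np

# (radius in arcmin, minimum bright positive peak in Jy, minimum |bright/faint| peak ratio)
ARTEFACT_RULES = ((5.0, 2.0, 350.0), (12.0, 6.0, 650.0))

def unit_vectors(ra_deg, dec_deg):
    r, d = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([np.cos(d) * np.cos(r), np.cos(d) * np.sin(r), np.sin(d)], axis=-1)

def artefact_mask(ra, dec, peak, rules=ARTEFACT_RULES):
    """True for detections (of either sign) lying near a much brighter positive detection."""
    ra, dec, peak = (np.asarray(x, dtype=float) for x in (ra, dec, peak))
    vec = unit_vectors(ra, dec)
    flagged = np.zeros(peak.size, dtype=bool)
    for radius_arcmin, min_bright, min_ratio in rules:
        cos_radius = np.cos(np.radians(radius_arcmin / 60.0))
        for j in np.flatnonzero(peak >= min_bright):     # bright positive detections
            near = vec @ vec[j] >= cos_radius
            near[j] = False                              # never flag the bright source itself
            with np.errstate(divide="ignore"):
                ratio = np.abs(peak[j] / peak)
            flagged |= near & (ratio >= min_ratio)
    return flagged
```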
We also note that there is a tendency for negative sources to appear close to positive sources regardless of their brightness, potentially due to faint uncleaned sidelobes slightly reducing the map brightness very close to sources. Such negative detections have no positive equivalent among the false detections, so they can also be filtered before estimating the reliability. The criterion in this case is that they cross-match with a positive source within 2′. An example is shown in Figure 16. These comprise a further 46 sources, which may optionally be removed.
Comparing the filtered samples of negative and positive detections, we can estimate the number of positive detections that are false detections as a function of signal-to-noise. For a conservative estimate, where we do not apply the second filter, we find that at a signal-to-noise ratio of five, the fraction of false detections is just under 2%, falling quickly to 1% for $S_{\mathrm{int}}>5.5\sigma$ . If we also filter negative sources that lie close to positive sources, we find that the reliability is much higher, with only 0.75% of sources false at 5- $\sigma$ , falling to none at 8- $\sigma$ . For each significance bin, we convert these fractions to a reliability estimate and plot them as a function of signal-to-noise in Figure 17. We note that were the noise completely Gaussian, we would expect just one $+5\sigma$ source in this sky area to appear purely by chance, and none with flux density ${>}5.5\sigma$ ; i.e., a reliability of 99.999% in the faintest bin, rising quickly to 100%.
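The Gaussian expectation quoted above can be checked with a rough independent-beam count. The sketch treats each beam solid angle, $\pi ab/(4\ln 2)$, as one independent trial, which is only approximate for correlated image noise; the beam sizes passed to it are placeholders to be replaced with the measured PSF. The reliability in a bin is then one minus the ratio of negative to positive detections. All function names are hypothetical.

```python
import math

def gaussian_tail(nsigma):
    """One-sided probability that Gaussian noise exceeds nsigma."""
    return 0.5 * math.erfc(nsigma / math.sqrt(2.0))

def expected_false_peaks(area_deg2, beam_major_arcsec, beam_minor_arcsec, nsigma):
    """Approximate number of noise peaks above nsigma, one trial per beam solid angle."""
    beam_deg2 = (math.pi / (4.0 * math.log(2.0))
                 * beam_major_arcsec * beam_minor_arcsec / 3600.0**2)
    return area_deg2 / beam_deg2 * gaussian_tail(nsigma)

def reliability(n_negative, n_positive):
    """Reliability in one S/N bin, using negative detections as a proxy for false positives."""
    return 1.0 - n_negative / n_positive if n_positive else float("nan")
```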
4.4. Spectral fitting
We fit two models to the twenty narrow-band flux density measurements of all detected sources (using the convention $S\propto\nu^\alpha$ ). The first model is a simple power law, parameterised as
$$S_\nu = S_{\nu_0}\left(\frac{\nu}{\nu_0}\right)^{\alpha}, \tag{4}$$
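where $S_{\nu_0}$ is the flux density of the source, in Jy, at the reference frequency $\nu_0$ , and $\alpha$ is the spectral index, i.e. the slope of the spectrum in log–log space. We also extend this power-law model to
$$S_\nu = S_{\nu_0}\left(\frac{\nu}{\nu_0}\right)^{\alpha}\exp\left[q\left(\ln\frac{\nu}{\nu_0}\right)^{2}\right], \tag{5}$$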
which includes the additional free parameter q to capture any higher-order spectral curvature, where increasing $|q|$ captures stronger deviations from a simple power law; if q is positive, the curve opens upward (convex), and if q is negative, it opens downward (concave). This model is not physically motivated, and may not appropriately describe sources with different power-law slopes in the optically thin and thick regimes, but provides a useful filter to identify interesting sources. For both models we set $\nu_0$ to 200 MHz.
To perform accurate spectral fitting, the errors on the flux density measurements must be known. Following Hurley-Walker et al. (Reference Hurley-Walker2017), we use spectral fitting to check the flux density consistency of the catalogue. Assuming a flux density scale error of 2% yields a median reduced $\chi^2$ of unity across the catalogue, whereas higher or lower values bias the reduced $\chi^2$ lower or higher as a function of signal-to-noise. We thus adopt 2% as our internal flux density scale uncertainty, and set the error on each flux density to this value added in quadrature with the local fitting error from Aegean. (Note that 8% is more appropriate when comparing with other catalogues, as this is the flux density scale accuracy of GLEAM, to which GLEAM-X is tied; see Section 4.2.1.)
We applied the Levenberg-Marquardt non-linear least-squares regression algorithm (as implemented in the scipy python module; Virtanen et al. Reference Virtanen2020) to Equations (4) and (5) for each detected source. We did not include narrow bands with negative integrated flux density measurements. We discarded the fitting results if
there were fewer than 15 integrated flux density measurements for a source;
a $\chi^2$ goodness-of-fit test indicated a ${>}99\%$ likelihood that the model was incorrectly fit; or
$|q|/\Delta q<3$ , to ensure that any fitted deviation from a power law is statistically significant.
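A sketch of the two fits and the three rejection criteria using `scipy.optimize.curve_fit`, which applies Levenberg-Marquardt when `method="lm"`. It assumes the power law $S_\nu=S_{\nu_0}(\nu/\nu_0)^\alpha$ and the curved form $S_\nu=S_{\nu_0}(\nu/\nu_0)^\alpha\exp[q\ln^2(\nu/\nu_0)]$ with $\nu_0=200$ MHz, interprets the $\chi^2$ criterion as a p-value below 0.01, and tests curvature significance with $|q|$. The function names are hypothetical, and the frequencies in any test call are evenly spaced placeholders rather than the exact GLEAM-X band centres.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

NU0 = 200.0  # reference frequency (MHz)

def power_law(nu, s0, alpha):
    return s0 * (nu / NU0) ** alpha

def curved_power_law(nu, s0, alpha, q):
    return s0 * (nu / NU0) ** alpha * np.exp(q * np.log(nu / NU0) ** 2)

def fit_sed(nu, s, fit_err, scale_frac=0.02, min_points=15):
    """Return (model_name, params, reduced_chi2) for the accepted model with the
    lower reduced chi^2, or None if neither model survives the cuts."""
    nu, s, fit_err = (np.asarray(x, dtype=float) for x in (nu, s, fit_err))
    keep = s >= 0                                   # drop negative measurements
    nu, s, fit_err = nu[keep], s[keep], fit_err[keep]
    if nu.size < min_points:                        # too few measurements
        return None
    err = np.hypot(scale_frac * s, fit_err)         # 2% scale error in quadrature
    s_ref = s[np.argmin(np.abs(nu - NU0))]
    candidates = []
    for name, model, p0 in (("power_law", power_law, (s_ref, -0.8)),
                            ("curved", curved_power_law, (s_ref, -0.8, 0.0))):
        try:
            popt, pcov = curve_fit(model, nu, s, p0=p0, sigma=err,
                                   absolute_sigma=True, method="lm", maxfev=10000)
        except RuntimeError:                        # fit did not converge
            continue
        dof = nu.size - len(popt)
        chisq = float(np.sum(((s - model(nu, *popt)) / err) ** 2))
        if chi2.sf(chisq, dof) < 0.01:              # >99% likely an incorrect model
            continue
        if name == "curved":
            dq = np.sqrt(pcov[2, 2])
            if not np.isfinite(dq) or abs(popt[2]) / dq < 3:
                continue                            # curvature not significant
        candidates.append((chisq / dof, name, popt))
    if not candidates:
        return None
    rchi2, name, popt = min(candidates, key=lambda c: c[0])
    return name, popt, rchi2
```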
For this initial data release we include only the model with the lower reduced- $\chi^2$ statistic in our catalogue. Applying these criteria, a total of 70432 and 888 source components have fitting results recorded for the power-law and curved power-law models, respectively. Figure 18 shows five example SEDs: four with either power-law or curved power-law models constrained using exclusively GLEAM-X, and one with GLEAM-X data supplemented with data from SUMSS and NVSS to fit a two-component power-law model described as
$$S_\nu = \frac{S_p}{1-e^{-1}}\left(\frac{\nu}{\nu_p}\right)^{\alpha_\mathrm{thick}}\left[1-\exp\left(-\left(\frac{\nu}{\nu_p}\right)^{\alpha_\mathrm{thin}-\alpha_\mathrm{thick}}\right)\right], \tag{6}$$
where $S_p$ is the brightness (Jy) at the peak frequency $\nu_p$ (MHz), and $\alpha_{\mathrm{thin}}$ and $\alpha_{\mathrm{thick}}$ are the spectral slopes in the optically thin and optically thick regimes, respectively (Callingham et al. Reference Callingham2017).
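A sketch of the two-regime model, assuming the form used by Callingham et al. (2017): $S_\nu = \frac{S_p}{1-e^{-1}}(\nu/\nu_p)^{\alpha_\mathrm{thick}}[1-\exp(-(\nu/\nu_p)^{\alpha_\mathrm{thin}-\alpha_\mathrm{thick}})]$. By construction it returns $S_p$ at $\nu_p$, and it tends to slopes of $\alpha_\mathrm{thick}$ and $\alpha_\mathrm{thin}$ well below and well above the peak. The function name is hypothetical.

```python
import numpy as np

def peaked_power_law(nu, s_p, nu_p, alpha_thick, alpha_thin):
    """Two-regime spectrum: slope alpha_thick below nu_p and alpha_thin above it."""
    x = np.asarray(nu, dtype=float) / nu_p
    # 1 - exp(-y) written as -expm1(-y) to stay accurate when y is tiny (high frequency)
    turnover = -np.expm1(-(x ** (alpha_thin - alpha_thick)))
    return s_p / (1.0 - np.exp(-1.0)) * x ** alpha_thick * turnover
```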
For sources fit well by power law SEDs, the distributions of spectral indices $\alpha$ with respect to flux density are plotted in Figure 19. The median $\alpha$ for the brightest bin is $-0.83$ , in excellent agreement with previous results (e.g. Mauch et al. Reference Mauch, Murphy, Buttery, Curran, Hunstead, Piestrzynski, Robertson and Sadler2003; Lane et al. Reference Lane, Cotton, van Velzen, Clarke, Kassim, Helmboldt, Lazio and Cohen2014; Heald et al. Reference Heald2015).
The priorised fitting routine in Aegean separates the island-finding stage from the component-characterisation stage, and is analogous to aperture photometry in optical images (Hancock et al. Reference Hancock, Trott and Hurley-Walker2018). We use it in GLEAM-X to ensure that each radio component identified in our deep 170–231 MHz source-finding image has an equivalent component characterisation in each of the other 25 GLEAM-X images. This process, however, does not enforce spectral smoothness between images adjacent in frequency. For GLEAM-X, it becomes less reliable towards lower frequencies, where the PSF becomes large enough that nearby components are blended to the point where their brightness profiles cannot be distinguished. Although model optimisation methods may be able to constrain the total brightness across all components, the brightnesses of the individual components become degenerate. We highlight an example of this behaviour in Figure 20. This problem is most apparent for sources that are slightly resolved and characterised as two separate components within 120′′ of one another. Further development of Aegean to perform component characterisation across all images jointly, while including a physically motivated parametrisation of the spectra, is planned to address this issue.
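The degeneracy can be quantified in a simplified setting. For two fixed-position Gaussian components of equal width fitted by linear least squares with white noise, the correlation between their fitted amplitudes is $-\exp[-d^2/(4\sigma^2)]$ for separation d and Gaussian width σ, approaching −1 as the PSF grows relative to d. The sketch computes this numerically in one dimension (the same result holds for circular 2-D Gaussians); the function name and the PSF sizes in the example are hypothetical, not GLEAM-X measurements.

```python
import numpy as np

def amplitude_correlation(separation, fwhm, npix=4001):
    """Correlation between the fitted amplitudes of two fixed Gaussian components,
    from the least-squares covariance (G^T G)^-1 (1-D, white noise)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half_width = separation + 10.0 * sigma            # wide enough to hold both profiles
    x = np.linspace(-half_width, half_width, npix)
    g1 = np.exp(-0.5 * ((x + separation / 2) / sigma) ** 2)
    g2 = np.exp(-0.5 * ((x - separation / 2) / sigma) ** 2)
    design = np.stack([g1, g2], axis=1)
    cov = np.linalg.inv(design.T @ design)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
```

For two components 120′′ apart, widening the PSF FWHM from 60′′ to 150′′ moves the amplitude correlation from about −0.004 to about −0.41, which illustrates why blended components become hard to separate at the lowest frequencies.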
4.5. Final catalogue
The resulting catalogue consists of 78967 radio sources detected over $1447\,\mathrm{deg}^2$ . 71320 sources are fit well by power-law or curved-spectrum SEDs. The catalogue has 388 columns (see Appendix B) and is available via VizieR. The catalogue measurements can be used to perform more complex spectral fits, especially in conjunction with other radio measurements. Table 1 shows the properties of the images and catalogue in this data release, as well as some forward predictions for the full survey, in comparison to GLEAM.
5. Extensions to continuum processing
The total data volume of GLEAM-X visibilities is large ( ${\sim} 2\,\mathrm{PB}$ ) and file transfer operations comprise a significant proportion ( ${\sim} 40$ %) of our processing time. When processing the data, each observation takes up ${\sim} 100\,\mathrm{GB}$ of disc space in visibilities, images, and metadata. Given the richness of the GLEAM-X survey, we are strongly motivated to perform additional operations on the data while they reside on disc in order to avoid moving the data more frequently. In this section we discuss the current extensions to the pipeline that we expect will yield a range of science outcomes not possible with mosaicked images.
5.1. Transient imaging
The wide field-of-view of the MWA combined with the repeated drift scanning strategy of GLEAM-X yields a dataset that is interesting to search for transient radio sources. Murphy et al. (Reference Murphy2017) compared the first GLEAM catalogue with TGSS-ADR1 and found a single transient candidate, but understanding its nature was difficult with the (limited) data available. Historically this has been a common occurrence for low-frequency radio transients, with many unusual phenomena detected but never fully understood (e.g. Hyman et al. Reference Hyman, Lazio, Kassim, Ray, Markwardt and Yusef-Zadeh2005; Stewart et al. Reference Stewart2016; Varghese et al. Reference Varghese, Obenberger, Dowell and Taylor2019).
The GLEAM-X drift scans were observed such that the LST was matched for repeated observations at the same pointing and frequency. This enabled a search using ‘visibility differencing’, wherein calibrated measurement sets were differenced, and the resulting nearly empty visibilities were inverted to form a dirty image, which could be used to search for transient sources (Honours thesis: O’Doherty 2021; Hancock et al. in prep.). One high-significance candidate was followed up using the large MWA archive, resulting in the discovery of a new type of highly polarised radio transient, repeating on the unusual timescale of 18.18 min (Hurley-Walker et al. Reference Hurley-Walker2022). The wide bandwidth of GLEAM-X was key to finding the dispersion measure of the source, and therefore estimating its distance.
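A toy version of visibility differencing, under strong simplifying assumptions: point sources, a small flat field with no w-term, perfect calibration, and identical (u,v) sampling in both epochs (which LST matching approximates). The steady source cancels in the difference, and a direct Fourier transform of the residual visibilities recovers only the transient. Function names and numbers are illustrative, not taken from the pipeline.

```python
import numpy as np

def visibilities(u, v, sources):
    """Point-source visibilities V(u,v) = sum_k S_k exp(-2 pi i (u l_k + v m_k))."""
    vis = np.zeros(u.shape, dtype=complex)
    for flux, l, m in sources:
        vis += flux * np.exp(-2j * np.pi * (u * l + v * m))
    return vis

def dirty_image(u, v, vis, l_grid, m_grid):
    """Direct-Fourier-transform dirty image, indexed as image[m_index, l_index]."""
    ll, mm = np.meshgrid(l_grid, m_grid)
    phase = np.exp(2j * np.pi * (np.multiply.outer(ll, u) + np.multiply.outer(mm, v)))
    return np.real(phase @ vis) / vis.size

rng = np.random.default_rng(0)
u, v = rng.uniform(-500, 500, 1000), rng.uniform(-500, 500, 1000)  # wavelengths
steady = [(5.0, 0.01, -0.02)]                  # (flux Jy, l, m), present in both epochs
transient = (0.5, -0.02, 0.016)                # only present in the second epoch
diff = visibilities(u, v, steady + [transient]) - visibilities(u, v, steady)
grid = np.linspace(-0.03, 0.03, 31)
img = dirty_image(u, v, diff, grid, grid)      # peaks at the transient, ~0.5 Jy
```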
The visibility differencing approach resulted in a large number of false positives due to the differences in ionospheric conditions between observations. The discovery of a new type of radio transient, and the utility of our polarisation and wideband measurements, motivates the inclusion of a transient imaging step in our routine pipeline processing.
Our approach is to image every 4-s interval of each observation, at the same time subtracting the deep model that was formed during imaging (Section 3.7), the same approach that is currently used for imaging MWA interplanetary scintillation observations (Morgan et al., in preparation). This results in a thermal-noise-dominated Stokes I image cube where only differences between each time step and the continuum average are recorded. This cube is then stored in an HDF5 file as described in Appendix 2 of Morgan et al. (Reference Morgan2018). Briefly, the image cube is reordered so that time is the fastest-varying axis, and the pixel data are demoted to half-precision (16-bit) floats. This results in a typical data volume of 600 MB per observation. Once in this format, any number of algorithms can be conveniently applied to detect and measure time-domain signals.
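The storage layout described above can be sketched with NumPy alone: moving time to the last (fastest-varying, contiguous) axis makes each pixel's light curve a contiguous read, and casting to `float16` halves the size relative to 32-bit floats. Writing the result to HDF5 (e.g. with h5py) is omitted, the function name is hypothetical, and this is not the format code of Morgan et al. (2018).

```python
import numpy as np

def to_time_fastest_float16(cube):
    """Reorder a (time, y, x) image cube to (y, x, time), C-contiguous, as float16."""
    return np.ascontiguousarray(np.moveaxis(cube, 0, -1), dtype=np.float16)

cube = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)   # (time, y, x)
packed = to_time_fastest_float16(cube)
light_curve = packed[1, 2, :]     # all time samples of pixel (y=1, x=2), contiguous
```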
While imaging every 0.5-s sample would be ideal, it would require $8\times$ the storage and processing of all other steps of the pipeline; however, if a signal of interest is discovered, it is simple (and indeed necessary) to reprocess the data with higher time (and, if needed, frequency) resolution. Future data releases will provide these data and quantitative analyses thereof.
5.2. Binocular imaging
The source position offsets determined during the de-warping process (Section 3.3) yield information about the differential slant total electron content (dTEC) averaged over the telescope array, projected on to the sky in that field of view. If dTEC varies significantly over the array, the wavefronts from different parts of the sky will arrive at different times, and radio sources will appear stretched, duplicated, or will disappear completely. Furthermore, if images are created using sub-arrays of the telescope, the apparent difference in source positions between them can be used to constrain the approximate height of the distorting screen (Loi et al. Reference Loi2015; Helmboldt & Hurley-Walker Reference Helmboldt and Hurley-Walker2020). We thus add a module to the imaging pipeline to routinely produce these binocular images.
In choosing the sub-arrays from the extended Phase II configuration, we face a compromise between sensitivity (higher for large sub-arrays) and parallax lever arm (longer for widely separated sub-arrays). Additionally, we have no prior knowledge of what ionospheric activity will be observed on a given night, nor the resources to adjust the imaging to match at the time of processing. To form a generally useful product, we split the array into two pairs of sub-arrays following the cardinal directions, shown in Figure 21. Each group of 43 or 44 antennas is imaged separately, and source-finding is performed using the default settings of Aegean. These catalogues can form a useful input to future analyses of the ionosphere above the Murchison Radio-astronomy Observatory; the data and analysis will be released in future work.
6. Outlook and conclusions
In this work we described GLEAM-X, a new wideband low-frequency all-southern-sky survey performed using the MWA, as well as the data reduction steps we expect to use to produce a range of continuum data products over 72–231 MHz. Polarisation data will be described in the upcoming paper by Zhang et al. (in preparation). Extensions to our data reduction pipeline to perform transient searches (Section 5.1) and binocular imaging (Section 5.2), as well as joint deconvolution of the Galactic Plane (Figure 3) will further enhance the capabilities of the survey.
To demonstrate the quality and attributes of the images and catalogues that will be produced by GLEAM-X, we release here $1447\,\mathrm{deg}^{2}$ of sky in the form of 26 mosaics across 72–231 MHz of bandwidths 60, 30, and 8 MHz, with RMS noises ranging from 15 to just over $1\,\mathrm{mJy\,beam}^{-1}$ . Additionally, we form a catalogue of 78967 sources, 70432 of which are well-fit across our band with power law spectral energy distributions, and 888 with curved power law spectra. Extrapolating our source density of $55{}\,\mathrm{deg}^{-2}$ to the ${\sim}31000\,\mathrm{deg}^{2}$ that GLEAM-X will eventually cover, we expect to detect of order 1.7 M sources, and produce ${\sim} 1.5\,\mathrm{M}$ radio spectra.
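The extrapolation is simple arithmetic; this snippet reproduces it, taking the 70432 power-law plus 888 curved-spectrum fits as the number of radio spectra and 31000 deg² as the eventual survey area:

```python
n_sources, n_spectra, area_deg2 = 78967, 70432 + 888, 1447.0
full_area_deg2 = 31000.0

density = n_sources / area_deg2                       # sources per deg^2
predicted_sources = density * full_area_deg2
predicted_spectra = n_spectra / area_deg2 * full_area_deg2
print(f"{density:.1f} deg^-2, {predicted_sources / 1e6:.2f} M sources, "
      f"{predicted_spectra / 1e6:.2f} M spectra")
# 54.6 deg^-2, 1.69 M sources, 1.53 M spectra
```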
We plan to publish the survey in a series of data releases. The next will comprise a large (${\sim}15000\ \mathrm{deg}^{2}$) set of images and catalogues covering the southern extragalactic sky centred on the South Galactic Pole (Galvin et al., in preparation); second, we aim to process and release the complete Galactic Plane (Hurley-Walker et al., in preparation); finally, we aim to produce contiguous all-sky coverage. Polarisation, transient, and ionospheric data releases and analyses will also proceed over the coming years.
These data will enable a range of science outcomes, some of which are outlined by Beardsley et al. (2019) in their review of scientific opportunities with Phase II of the MWA. For instance, there is strong potential to detect $10^4$ peaked-spectrum sources in GLEAM-X data, an order of magnitude more than were discovered with GLEAM (Callingham et al. 2017), probing a population an order of magnitude fainter. Improved signal-to-noise on sources with curved and peaked spectra can enable more efficient selection of high-redshift radio galaxies (Drouart et al. 2020). Many local star-forming galaxies will be resolved, enabling a better understanding of the interplay between thermal and non-thermal processes in their energy budgets (Kapińska et al. 2017; Galvin et al. 2018).
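Peaked spectra are commonly described by a curved power law, $S_\nu = S_0 (\nu/\nu_0)^{\alpha} \exp[q \ln^2(\nu/\nu_0)]$ (e.g. Callingham et al. 2017), whose turnover lies at $\nu_\mathrm{peak} = \nu_0 \exp[-\alpha/(2q)]$ when $q < 0$. The sketch below fits that form and locates the peak. It assumes $\nu_0 = 200$ MHz and a hypothetical noiseless source, and it fits in log space with `np.polyfit`, because $\ln S$ is quadratic in $\ln(\nu/\nu_0)$. This is a simplification: a real catalogue fit would normally work in flux-density space with per-band uncertainties (e.g. `scipy.optimize.curve_fit`). The function names are hypothetical.

```python
import numpy as np

NU0_MHZ = 200.0  # assumed reference frequency for the fit


def curved_power_law(nu_mhz, s0, alpha, q, nu0=NU0_MHZ):
    """S = s0 * (nu/nu0)**alpha * exp(q * ln(nu/nu0)**2)."""
    x = np.log(np.asarray(nu_mhz, dtype=float) / nu0)
    return s0 * np.exp(alpha * x + q * x**2)


def fit_curved_power_law(nu_mhz, flux_jy, nu0=NU0_MHZ):
    """Least-squares fit in log space; requires strictly positive flux densities.

    Returns (s0, alpha, q).
    """
    x = np.log(np.asarray(nu_mhz, dtype=float) / nu0)
    y = np.log(np.asarray(flux_jy, dtype=float))
    q, alpha, ln_s0 = np.polyfit(x, y, 2)  # coefficients, highest power first
    return float(np.exp(ln_s0)), float(alpha), float(q)


def peak_frequency_mhz(alpha, q, nu0=NU0_MHZ):
    """Turnover where d ln S / d ln nu = alpha + 2 q ln(nu/nu0) = 0; needs q < 0."""
    if q >= 0:
        return None  # no spectral maximum
    return nu0 * np.exp(-alpha / (2.0 * q))


# Hypothetical noiseless source sampled at 20 frequencies across 72-231 MHz
nu = np.linspace(72.0, 231.0, 20)
flux = curved_power_law(nu, s0=1.0, alpha=-0.4, q=-1.0)
s0_fit, alpha_fit, q_fit = fit_curved_power_law(nu, flux)
nu_peak = peak_frequency_mhz(alpha_fit, q_fit)
in_band = nu_peak is not None and nu.min() <= nu_peak <= nu.max()
print(f"peak at {nu_peak:.1f} MHz, within band: {in_band}")  # peak at 163.7 MHz, within band: True
```

A fitted peak that falls inside the observed band is what flags a source as peaked-spectrum; higher signal-to-noise narrows the uncertainty on $q$ and therefore on $\nu_\mathrm{peak}$.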
The extended configuration of the Phase II MWA has already been used successfully for targeted investigations of the extragalactic sky, such as determining the remnant radio galaxy fraction in one of the Galaxy and Mass Assembly fields (Quici et al. 2021) and detecting diffuse non-thermal emission in galaxy clusters (Duchesne, Johnston-Hollitt, & Bartalucci 2021). Similar studies over the whole sky, particularly exploiting synergies with other recent wide-area surveys such as RACS, are likely to be highly productive. The higher source density of GLEAM-X will, for the first time, enable cosmological measurements with the MWA. With it we can help resolve the tension between the angular clustering observed with NVSS and TGSS-ADR1 (Dolfi et al. 2019), investigate differential source counts (Chen & Schwarz 2015), and, by cross-correlating with measurements of the Cosmic Microwave Background, search for the effects of dark energy via the integrated Sachs-Wolfe effect (Sachs & Wolfe 1967). Additionally, GLEAM-X may help to improve sky models for studies of the Epoch of Reionisation by measuring source brightnesses below 100 MHz, imaging slightly deeper, and separating sources into more components than LoBES (Lynch et al. 2021).
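Differential source counts are conventionally shown Euclidean-normalised, as $S^{5/2}\,\mathrm{d}N/\mathrm{d}S$, so that a static Euclidean population ($N(>S) \propto S^{-3/2}$) appears flat. The sketch below computes this quantity from a list of flux densities. It assumes logarithmic bins, geometric bin centres, and a hypothetical, deterministically generated Euclidean population over the $1447\,\mathrm{deg}^{2}$ released area. It applies no completeness, resolution, or Eddington-bias corrections, which any real analysis would need. The function name is hypothetical.

```python
import numpy as np


def euclidean_normalised_counts(flux_jy, area_sr, bin_edges_jy):
    """Return geometric bin centres (Jy) and S^2.5 dN/dS (Jy^1.5 sr^-1).

    No completeness or resolution corrections are applied.
    """
    flux = np.asarray(flux_jy, dtype=float)
    edges = np.asarray(bin_edges_jy, dtype=float)
    counts, _ = np.histogram(flux, bins=edges)
    widths = np.diff(edges)
    centres = np.sqrt(edges[:-1] * edges[1:])
    dn_ds = counts / (widths * area_sr)
    return centres, centres**2.5 * dn_ds


# Hypothetical Euclidean population: N(>S) proportional to S^-1.5 above 10 mJy,
# generated by inverse-CDF sampling on an evenly spaced quantile grid.
n = 100_000
u = (np.arange(n) + 0.5) / n
flux = 0.01 * u ** (-1.0 / 1.5)

area_sr = 1447 * (np.pi / 180.0) ** 2  # 1447 deg^2 in steradians
edges = np.logspace(-2, 0, 11)         # 10 mJy to 1 Jy
centres, counts_norm = euclidean_normalised_counts(flux, area_sr, edges)
print(np.round(counts_norm / counts_norm.mean(), 3))  # close to 1 in every bin
```

Departures from flatness in real counts are what carry the cosmological and evolutionary information.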
Continuum Galactic science shows promise with MWA Phase II (Tremblay et al. 2022), and given the excellent results from our initial exploration of jointly deconvolving GLEAM and GLEAM-X, we expect to make new detections of supernova remnants (SNRs; see e.g. Hurley-Walker et al. 2019a) and to improve measurements of cosmic ray electrons in the Galactic Plane (following Su et al. 2018). Additionally, the improved resolution, sensitivity, and wide bandwidth will make it possible to examine the unshocked ejecta of SNRs (Arias et al. 2018) and their interactions with their environments (Castelletti et al. 2021) via measurements of low-frequency thermal absorption. This creates excellent synergy with TeV observations by the High Energy Stereoscopic System (Hinton & HESS Collaboration 2004; Aharonian et al. 2006) and the upcoming Cherenkov Telescope Array (Acharya et al. 2013) in searching for sites of cosmic ray acceleration in our Galaxy (e.g. Maxted et al. 2019).
The repeated, overlapping epochs of GLEAM-X and its drift scan observing strategy make it possible to explore radio transients and variability on timescales from seconds to years; comparisons to GLEAM enable a seven-year lever arm. Combining these cadences with the wide bandwidth will enable improved investigation of the startling variability of peaked-spectrum sources found by Ross et al. (2021), and enable distance measurements for dispersion-smeared pulsed transients (Hurley-Walker et al. 2022). As evinced by the latter work, GLEAM-X opens new parameter space in the low-frequency radio sky, and potentially enables further serendipitous discoveries beyond our ability to predict.
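The wide bandwidth matters for pulsed transients because the cold-plasma dispersion delay scales as $\nu^{-2}$, so a pulse sweeps across 72–231 MHz over a measurable interval that encodes its dispersion measure (DM). The sketch below computes that sweep and inverts it. The DM of 50 pc cm$^{-3}$ is hypothetical. The final distance step assumes a uniform mean electron density of 0.03 cm$^{-3}$, an illustrative placeholder only; real distance estimates come from Galactic electron-density models such as NE2001 or YMW16. The function names are hypothetical.

```python
K_DM_MS = 4.148808  # cold-plasma dispersion constant, ms GHz^2 cm^3 pc^-1


def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Delay (s) of f_lo relative to f_hi for dispersion measure dm (pc cm^-3)."""
    f_lo_ghz = f_lo_mhz / 1e3
    f_hi_ghz = f_hi_mhz / 1e3
    return K_DM_MS * dm * (f_lo_ghz**-2 - f_hi_ghz**-2) / 1e3


def dm_from_delay(delay_s, f_lo_mhz, f_hi_mhz):
    """Invert a measured sweep across the band into a dispersion measure."""
    return delay_s / dispersion_delay_s(1.0, f_lo_mhz, f_hi_mhz)


def naive_distance_pc(dm, mean_ne_cm3=0.03):
    """Crude distance assuming a uniform electron density (hypothetical default)."""
    return dm / mean_ne_cm3


sweep = dispersion_delay_s(50.0, 72.0, 231.0)  # hypothetical DM of 50 pc cm^-3
dm = dm_from_delay(sweep, 72.0, 231.0)
print(f"sweep {sweep:.1f} s, DM {dm:.1f}, ~{naive_distance_pc(dm):.0f} pc")
```

For this DM the pulse arrives about 36 s later at 72 MHz than at 231 MHz, a delay that is easily resolved in short-interval images even when a narrower band would smear it out unrecognised.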
Acknowledgements
We thank the anonymous referee for their comments, which improved the quality of this paper. NHW is supported by an Australian Research Council Future Fellowship (project number FT190100231) funded by the Australian Government. KR acknowledges a Doctoral Scholarship and an Australian Government Research Training Programme scholarship administered through Curtin University. DK was supported by NSF grant AST-1816492. CJR acknowledges financial support from the ERC Starting Grant ‘DRANOEL’, number 714245. This scientific work makes use of the Murchison Radio-astronomy Observatory, operated by CSIRO. We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site. Support for the operation of the MWA is provided by the Australian Government (NCRIS), under a contract to Curtin University administered by Astronomy Australia Limited. Establishment of the Murchison Radio-astronomy Observatory and the Pawsey Supercomputing Centre are initiatives of the Australian Government, with support from the Government of Western Australia and the Science and Industry Endowment Fund. We acknowledge the Pawsey Supercomputing Centre which is supported by the Western Australian and Australian Governments and the China SKA Regional Center prototype at Shanghai Astronomical Observatory which is funded by the Ministry of Science and Technology of China (under grant number 2018YFA0404603) and Chinese Academy of Sciences (under grant number 114231KYSB20170003). Access to Pawsey Data Storage Services is governed by a Data Storage and Management Policy (DSMP). ASVO has received funding from the Australian Commonwealth Government through the National eResearch Collaboration Tools and Resources (NeCTAR) Project, the Australian National Data Service (ANDS), and the National Collaborative Research Infrastructure Strategy. This paper makes use of services or code that have been provided by AAO Data Central (datacentral.org.au). 
This research has made use of NASA’s Astrophysics Data System Bibliographic Services. The following software was used in this work: aoflagger and cotter (Offringa, van de Gronde, & Roerdink 2012); WSClean (Offringa et al. 2014; Offringa & Smirnov 2017); Aegean (Hancock et al. 2018); miriad (Sault, Teuben, & Wright 1995); TOPCAT (Taylor 2005); NumPy (Dubois, Hinsen, & Hugunin 1996; Harris et al. 2020); AstroPy (Astropy Collaboration et al. 2013); SciPy (Oliphant 2007); and Matplotlib (Hunter 2007). This work was compiled in the very useful online LaTeX editor Overleaf.
A. Observations
B. Catalogue Column Names