Impact Statement
Our method enables imaging fast-repeating processes like the heartbeat and reconstructing multichannel videos with a high temporal resolution without the need for a high-speed camera. It uses a custom hardware controller to modulate and synchronize multiple illumination and imaging devices. We share the code (for simulation and reconstruction) and hardware schematics to implement Paired Alternating AcQuisitions, which allows replication and implementation in a wide array of existing lab microscopes.
1. Introduction
In vivo fluorescence microscopy is an essential tool to study the early stages of organ development in animal embryos thanks to its intrinsically selective contrast.(Reference Lichtman and Conchello 1 –Reference Vonesch, Aguet, Vonesch and Unser 3 ) When used to image the zebrafish, an excellent animal model for cardiovascular research,(Reference Bakkers 4 , Reference Nguyen, Lu, Wang and Chen 5 ) it allows studying the development stages of the heart(Reference Bussmann, Bakkers and Schulte-Merker 6 –Reference Stainier and Fishman 8 ) and understanding the progression of early cardiac defects and diseases.(Reference Chico, Ingham and Crossman 9 –Reference Scherz, Huisken, Sahai-Hernandez and Stainier 11 ) Correctly imaging the dynamics of the heartbeat requires high-speed acquisition due to the rapid beating of the heart. This becomes a limitation when the low fluorescence intensity emitted by the labeled structures requires a longer exposure time or extremely sensitive cameras that are either too slow or prohibitively expensive. In addition to this, the use of several fluorophores to label multiple tissues usually requires that their emission be recorded sequentially in different channels.
Some acquisition platforms are able to perform simultaneous high-speed acquisition in multiple fluorescent channels. This is usually achieved by splitting the emitted light into multiple beams that are redirected to different cameras, through the use of mirrors(Reference Gregor, Butkevich, Enderlein and Mojiri 12 , Reference Mickoleit, Schmid, Weber, Fahrbach, Hombach, Reischauer and Huisken 13 ) and prisms.(Reference Cai, Wang, Wainner, Iftimia, Gabel and Chung 14 ) However, these come at the cost of an increased complexity in the optics due to the addition of the splitting components. These optics and the use of multiple high-speed cameras make these systems very expensive and could make them difficult to integrate into existing imaging platforms.
As an alternative, when such parallel high-speed imaging is not available, images acquired from different channels at low speed could still be registered to their position in the heartbeat period (phase). Approaches to solve this problem fall into two categories: prospective gating and retrospective gating.(Reference Taylor 15 )
In prospective gating, the acquisition of images is triggered at precise timings that correspond to desired sampling phases in the heartbeat cycle. On large samples, this trigger can be extracted from cardiac probes that precisely measure the start and duration of a heartbeat.(Reference Brau, Wheeler, Hedlund and Johnson 16 –Reference Sablong, Rengle, Ramgolam, Saint-Jalmes and Beuf 18 ) Such signals would be difficult to access in zebrafish embryos due to their small size. Instead, the trigger can be obtained by processing (in real-time) a video signal captured by a dedicated second camera.(Reference Taylor, Girkin and Love 19 , Reference Taylor, Saunter, Love, Girkin, Henderson and Chaudhry 20 ) These prospective gating methods give very good results even with slow acquisition devices, but they also require a dedicated optical setup with real-time processing capabilities for triggering.
Retrospective gating methods acquire images at arbitrary phases and then attempt to estimate these phases via post-acquisition algorithms. Their results are usually not as precise as those of prospective gating, but as they do not require additional triggering hardware they can be easier to implement with existing imaging setups. For example, it is possible to sort images acquired at random times in a period based only on image distance to reconstruct a single-channel video with a virtually increased frame rate.(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 –Reference Zhang and Pless 23 ) In addition to their limited accuracy, these methods can struggle with multichannel data. In some cases, it is possible to perform time registration between multiple channels by using a mutual information criterion(Reference Liebling and Ranganathan 24 , Reference Liebling, Vermot, Forouhar, Gharib, Dickinson and Fraser 25 ) or by registering them with a common reference channel.(Reference Ohn, Yang, Fraser, Lansford and Liebling 26 ) Nevertheless, these channel alignment methods require that high-speed sequences be available in each channel to work.
When imaging with slower cameras, one possibility would be to first generate virtual high frame rate videos of each channel separately using the sorting methods above.(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 , Reference Mariani, Ernst, Mercader and Liebling 22 ) These virtual high-speed sequences could then be synchronized using a mutual information-based algorithm designed for raw high-speed movies(Reference Liebling, Forouhar, Gharib, Fraser and Dickinson 27 ) to obtain the final multichannel video. We see two main drawbacks to this potential method. First, the virtual high frame rate sequences obtained with sorting are not uniformly sampled, and this sampling is different for each channel. This means that a perfect frame-to-frame pairing from one channel to another does not exist, which will negatively impact the performance of channel alignment. Second, high mutual information between the different channels is not guaranteed, and it can be difficult to obtain correct channel registration depending on the imaged region or the fluorophores used. These two drawbacks limit the expected performance of such a method, and could even make it unusable in some scenarios involving highly uncorrelated signals.
Instead of solving the virtual frame rate increase and channel registration problems sequentially, we propose a method that considers them jointly. In a previous work, we introduced a method to disambiguate images that appeared similar despite being captured in different phases of the heartbeat. We paired sharp images with blurred images that encode the motion at the time of acquisition.(Reference Mariani, Marelli, Jaques, Ernst and Liebling 28 ) In this article, we extend this technique into a general approach centered around Paired Alternating AcQuisitions (PAAQ), which result in image sequences whose frames alternate between a common reference modality and other channels (e.g., fluorescence channels). We then use this common reference to sort images from all the channels at once, achieving simultaneously a virtual high frame rate and multichannel registration. In order to address the lack of precision of naive frame sorting approaches, we also propose a phase estimation algorithm. In addition to improving phase accuracy, it also overcomes two central limitations of plain image sorting: the resulting movies can be inverted in time and their phases cannot be assigned quantitative time units to measure durations.
2. Problem definition and metrics
We define $ {f}_c $ as the intensity of the signal obtained by imaging a periodic phenomenon (the embryo heartbeat) using fluorescence microscopy, with each channel number $ c=0,\dots, C-1 $ referring to a distinct combination of illumination and emission wavelengths. The measured signal varies according to its spatial location $ \left(x,y\right) $ and its phase $ \theta $ . The intensity $ {f}_c $ has a period of $ 2\pi $ , which is expressed as:
$$ {f}_c\left(x,y,\theta \right)={f}_c\left(x,y,\theta +2\pi \right). $$
The phase itself is a function of time expressed as $ \theta (t) $ . In ideal time-periodic systems, it increases linearly over time $ t $ following $ \theta =\omega t $ , where $ \omega $ is the angular frequency. In practice, such a relation is too simplistic for the cardiac cycle. Indeed, some variability stems from both the biological nature of the phenomenon and from environmental changes (e.g., temperature rising due to prolonged exposure to light, leading to an increase in heart rate(Reference Ohn and Liebling 29 )). Although the exact relationship between time and phase $ \theta (t) $ is unknown, it still follows a trend that is roughly linear, $ \theta (t)\simeq \overline{\omega}t $ , where $ \overline{\omega} $ is the average angular frequency calculated over multiple periods.
We consider $ C $ sequences of $ N $ images $ {\mathbf{f}}_c\left[:,:,n\right] $ (where $ :,: $ is a shorthand notation to describe all rows and columns of the image,(Reference Golub and Van Loan 30 ) and $ n=0,\dots, N-1 $ ) obtained by sampling the signal $ {f}_c $ at unknown phases $ {\theta}_{c,n} $ with a frame rate $ {F}_{\mathrm{acq}} $ , using a procedure that we detail below. Our objective is to retrieve the phases $ {\theta}_{c,n} $ , which contain all the information required to reconstruct synchronized image sequences across all channels with an increased temporal resolution, as the phases give the position of each frame in the cycle.
More specifically, we want to compute estimates $ {\tilde{\theta}}_{c,n} $ as close as possible to the ground-truth phases $ {\theta}_{c,n} $ , which are available only during simulation-based evaluation. We define our evaluation criterion as the following error function:
$$ \mathcal{E}=\underset{\Theta}{\min}\;\frac{1}{CN}\sum_{c=0}^{C-1}\sum_{n=0}^{N-1}\left|\left({\tilde{\theta}}_{c,n}+\Theta \right)\ominus {\theta}_{c,n}\right|, $$
where $ \Theta $ is a phase shift common to all estimations, and $ \hskip0.35em \ominus \hskip0.35em $ is a phase difference operator:
$$ a\ominus b=\left(\left(a-b+\pi \right)\hskip0.35em \operatorname{mod}\hskip0.35em 2\pi \right)-\pi . $$
The global phase shift $ \Theta $ in the error signifies that we do not care about the absolute phases of our measurements, but only about their position relative to each other in the period. Indeed, the phase of the first frame in our signal is unknown, but this information is not necessary to derive meaningful information such as the time elapsed between events in the period, or to synchronize the channels.
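As a concrete illustration, the following minimal Python sketch evaluates this criterion by a brute-force search over the global shift $ \Theta $ (assuming the mean-absolute form of the error given above; `theta_est` and `theta_true` are hypothetical flat arrays of estimated and ground-truth phases):

```python
import numpy as np

def phase_diff(a, b):
    """Phase difference operator: wraps a - b into [-pi, pi)."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def phase_error(theta_est, theta_true, n_shifts=1000):
    """Mean absolute wrapped error, minimized over a global phase shift."""
    shifts = np.linspace(0.0, 2 * np.pi, n_shifts, endpoint=False)
    return min(np.mean(np.abs(phase_diff(theta_est + s, theta_true)))
               for s in shifts)
```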
3. Methods: Acquisition and reconstruction
In order to solve the previously defined problem, we introduce a two-stage method that combines PAAQ imaging and post-acquisition processing to virtually increase the frame rate. The purpose of PAAQ is to create a reference signal common to all fluorescence channels that allows synchronizing the sequences with high precision. We then use the image-processing step to refine the estimate of the phase of each frame, as well as to compute the main properties of the imaged signal. We illustrate the whole pipeline in Figure 1. Sections 3.1–3.4 detail the different stages of our method.
3.1. Implementing PAAQ with active illumination
In our previous work,(Reference Mariani, Marelli, Jaques, Ernst and Liebling 28 ) we have introduced a technique that uses active illumination to acquire pairs of images in quick succession by alternating between temporal illumination patterns. The first image in a pair encodes movement information for unequivocally sorting the sequence in order of increasing phase, while the second uses a short light flash that produces a sharp image used for analysis and display of the reconstructed sequence. We extend this method to achieve synchronization of multiple fluorescent channels by switching between imaging modalities at the same time as illumination patterns.
We start by defining a reference signal $ g\left(x,y,\theta \right) $ that is obtained by imaging the beating heart using brightfield microscopy. Any modality is suitable to acquire $ g $ , and even one of the $ {f}_c $ could be the reference. However, since it is a sacrificial signal that will be discarded after imaging (as the time-encoding illumination results in blur), we choose to use brightfield as it causes little to no photobleaching. The intensity of this reference varies over the same space and phase as $ {f}_c $ :
$$ g\left(x,y,\theta \right)=g\left(x,y,\theta +2\pi \right). $$
For each channel, we use PAAQ to capture a sequence of $ N $ image pairs $ \left({\mathbf{g}}_c\left[:,:,n\right],{\mathbf{f}}_c\Big[:,:,n\Big]\right) $ , consisting of a reference image and a signal image captured in quick succession by switching from brightfield to fluorescence illumination every other frame:
where $ {\Delta}_x $ and $ {\Delta}_y $ are the pixel width and height, $ l=0,\dots, L-1 $ and $ m=0,\dots, M-1 $ the row and column indices, and $ n=0,\dots, N-1 $ the time frame index. $ {\Delta}_T $ is the time interval between two consecutive frames, and $ {t}_c $ the arbitrary time at which the first reference image associated with channel $ c $ is acquired. Since we capture the images without interruption, $ {\Delta}_T $ directly relates to the acquisition frame rate of the camera $ {F}_{\mathrm{acq}}=1/{\Delta}_T $ . The phase corresponding to each image $ {\mathbf{f}}_c\left[:,:,n\right] $ is $ {\theta}_{c,n}=\theta \left({t}_c+\left(2n+1\right){\Delta}_T\right) $ and $ r(t) $ is the temporal illumination pattern used to capture images of the reference signal. It is defined over the interval $ \left[0,{\Delta}_E\right) $ where $ {\Delta}_E $ is the exposure time of the camera. When acquiring images of the fluorescent signals, a short light pulse of duration $ {\Delta}_P $ illuminates the sample, with $ {\Delta}_P\le {\Delta}_E $ . By using a short pulse, we can ensure that the fluorescent images are sharp even if the frame rate of the camera is low. Figure 2 illustrates PAAQ imaging, corresponding to Step (b) in Figure 1.
The PAAQ method is based on the assumption that the delay between a reference image and its associated signal image is small. By acquiring the reference image at the very end of the exposure time of the camera, we make this delay entirely independent of the frame rate of the device. Indeed, this delay becomes $ {\Delta}_C={\Delta}_T-{\Delta}_E $ , the transfer time of the camera, which is the time between two exposure periods during which the device reads the pixel information and gets ready to capture a new frame. Even on devices with a low frame rate, this transfer time is typically very short (on the order of a few milliseconds). Therefore, we shape $ r(t) $ to capture most information at the end of the exposure period, minimizing the delay between images in a pair. Instead of the continuous ramp used in the original method,(Reference Mariani, Marelli, Jaques, Ernst and Liebling 28 ) we use a series of pulses of increasing amplitude for $ r(t) $ , as this pattern is easier to generate while retaining the same motion-encoding properties as the ramp.
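To make the timing concrete, the sketch below builds sampled versions of the two illumination patterns, using the timings reported later in Section 5.4 (60 ms exposure, three 10 ms reference pulses separated by 15 ms gaps, one 10 ms signal pulse). The pulse amplitudes and the position of the signal pulse within the exposure are illustrative assumptions:

```python
import numpy as np

def reference_pattern(exposure_ms=60, pulse_ms=10, gap_ms=15, fs_khz=1):
    """Pulses of increasing amplitude, packed toward the end of the exposure."""
    r = np.zeros(int(exposure_ms * fs_khz))
    pulse, gap = int(pulse_ms * fs_khz), int(gap_ms * fs_khz)
    end = len(r)
    for amp in (1.0, 2 / 3, 1 / 3):   # strongest pulse placed last
        r[end - pulse:end] = amp
        end -= pulse + gap
    return r

def signal_pattern(exposure_ms=60, pulse_ms=10, fs_khz=1):
    """A single short pulse (placed at the start of the exposure here)."""
    f = np.zeros(int(exposure_ms * fs_khz))
    f[:int(pulse_ms * fs_khz)] = 1.0
    return f
```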
Given that the delay between a signal frame and its associated reference is very small, the phase difference between the two will also be small due to the linear trend between time and phase. Over such a low duration, the variability in the heartbeat cycle is negligible, and we can consider the phase delay between each reference and its paired signal frame to be a constant. Therefore, since a constant phase shift does not impact the global error defined in (2), finding the phase of each signal frame is equivalent to finding the phase of its associated reference. This is a much easier problem to solve, as all references are of the same modality, and the results obtained on reference sequences will be directly applicable to their associated signals from different channels. We are thus able to extend the PAAQ technique to generate synchronized virtual high frame rate sequences from multiple fluorescent channels at no additional computational cost.
3.2. Sorting and orienting the references
We combine all the reference images into a single sequence $ \mathbf{g}\left[:,:,n\right] $ by concatenating the acquired images $ {\mathbf{g}}_c $ for each channel, containing a total of $ {N}_g= NC $ reference images. Similarly, we concatenate signal frames from all channels into a single sequence $ \mathbf{f}\left[:,:,n\right] $ . We can then apply a phase-sorting algorithm to the reference sequence $ \mathbf{g} $ to obtain a permutation $ \sigma :\left\{0,\dots, {N}_g-1\right\}\to \left\{0,\dots, {N}_g-1\right\}:n\to \sigma (n) $ that sorts the references by increasing phase. For this, we use a traveling salesman method introduced in previous works(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 ) that finds the permutation $ \sigma $ by minimizing the total absolute image difference between consecutive images. The obtained sorted sequence $ \mathbf{g}\left[:,:,\sigma (n)\right] $ contains all the reference images reordered to reproduce a single period of the heartbeat, with a virtually increased frame rate that depends only on the number of images $ {N}_g $ . This corresponds to Step (c) in Figure 1.
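The sketch below illustrates this step: pairwise image distances are assembled into a matrix, and a tour through all frames is built from it. The published method solves a traveling salesman problem over this matrix; the greedy nearest-neighbor tour used here is only a lightweight stand-in for that solver:

```python
import numpy as np

def distance_matrix(g):
    """Pairwise sums of absolute pixel differences; g has shape (L, M, N_g)."""
    flat = g.reshape(-1, g.shape[-1]).astype(float)   # (L*M, N_g)
    n = flat.shape[1]
    D = np.zeros((n, n))
    for i in range(n):
        D[i] = np.abs(flat - flat[:, [i]]).sum(axis=0)
    return D

def greedy_sort(D):
    """Order frames by repeatedly hopping to the most similar unused frame."""
    n = D.shape[0]
    order, visited = [0], {0}
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if i not in visited),
                  key=lambda i: D[order[-1], i])
        order.append(nxt)
        visited.add(nxt)
    return np.array(order)   # sorted positions -> original frame indices
```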
This sorting-based method assumes that the images in the reordered sequence correspond to a uniform sampling of the signal. It assigns a linear phase estimate based on the position of each frame in the sorted sequence:
$$ {\hat{\theta}}_{\sigma (n)}=\frac{2\pi n}{N_g},\qquad n=0,\dots, {N}_g-1. $$
The traveling salesman solution does not contain directional information. Indeed, the permutation $ {\sigma}^{\prime }(n)={N}_g-1-\sigma (n) $ corresponds to a tour with the exact same cost. However, direction is important for the reconstruction of a faithful video that follows the chronological order of events. If we assume that the sampling frequency of the reference frames (i.e., half the frame rate of the camera) is at least twice the frequency of the imaged phenomenon (similarly to the Nyquist criterion), we can retrieve the direction of the sequence by computing the average phase distance between consecutively acquired frames:
$$ {\overline{\Delta}}_{\hat{\theta}}=\frac{1}{N_g-1}\sum_{n=0}^{N_g-2}{\hat{\theta}}_{n+1}\ominus {\hat{\theta}}_n. $$
When the sampling frequency assumption is met, the phase distance between consecutive frames must be smaller than $ \pi $ (half a period), and $ {\overline{\Delta}}_{\hat{\theta}} $ must be positive. If $ {\overline{\Delta}}_{\hat{\theta}} $ is negative, the permutation obtained using sorting is in the wrong direction, and we must use $ {\sigma}^{\prime } $ instead to sort the sequence and compute the uniform phase estimate $ \hat{\theta} $ . This is shown as Step (d) in Figure 1.
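A possible implementation of this orientation test, given the tour returned by the sorting step and the uniform phase estimates defined above:

```python
import numpy as np

def orient(order):
    """Reverse the sorted tour if it runs backward in time."""
    n = len(order)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)              # frame index -> position in tour
    theta = 2 * np.pi * rank / n            # uniform phase estimates
    steps = (np.diff(theta) + np.pi) % (2 * np.pi) - np.pi   # wrapped steps
    return order if steps.mean() > 0 else order[::-1]
```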
3.3. Nonuniform phase estimation using image distance
The assumption that the sorted sequence corresponds to a uniform sampling of a single period of the signal is imprecise and leads to artifacts in the reconstructed sequence. Due in part to the ratio between the acquisition frequency and the frequency of the imaged signal, and in part to the variations in the periodicity of the phenomenon, different regions of the period will contain more or fewer sampling points, as shown in Figure 3b. Reconstructing a video under the assumption that the sampling is uniform generates distortions: stretching and shrinking will appear in low and high sampling rate areas, respectively, as illustrated in Figure 3c. It is therefore important to refine the phase estimate in order to reconstruct a faithful signal that correctly represents the dynamics of the imaged system.
In order to correct distortions in volumetric electron microscopy imaging caused by inhomogeneous sampling in the axial direction, Hanslovsky et al. have developed an image-based method that uses image-to-image similarity to estimate the position of slices in the volume.(Reference Hanslovsky, Bogovic and Saalfeld 31 ) Although their application is concerned with resampling the depth dimension in static volumes rather than the temporal dimension in videos of the beating heart, the underlying ideas are highly relevant to our cardiac imaging problem, and we have implemented an algorithm that is inspired by theirs. For the sake of completeness, we provide the derivation in full and point out similarities and differences.
The gist of the method is the following. If one were to fix an origin at the arbitrary time point with index $ n $ in the image sequence and plotted an image-distance between the image at the origin and images located at continuous-phase positions before or after the origin, one would obtain a function $ {\mathcal{D}}_n(u) $ , with $ u $ the phase relative to the origin. If that function was known, one could deduce the phase of any image relative to the image at the origin simply by computing their image-distance and inverting $ {\mathcal{D}}_n(u) $ . Since this function is neither known in practice nor unique – it depends on the image contrast, the field of view, the cardiac phase of the image taken as the origin, and so on – the key idea is to approximate it by a model. Two assumptions allow building the model and estimating its parameters.
The first assumption is that the pixel-based distance between images in the signal increases with their phase difference. Indeed, images that are temporally close to one another in the signal should be similar, while images further away are more likely to look different. This echoes the hypothesis on which the sorting-based virtual high frame rate technique is built.(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 ) We use the Minkowski distance of order 1 as the distance metric $ d $ between two images $ \mathbf{a} $ and $ \mathbf{b} $ of size $ L\times M $ :
$$ d\left(\mathbf{a},\mathbf{b}\right)=\sum_{l=0}^{L-1}\sum_{m=0}^{M-1}\left|\mathbf{a}\left[l,m\right]-\mathbf{b}\left[l,m\right]\right|. $$
The above assumption means that the distance curve at any $ n $ , $ {\mathcal{D}}_n(u) $ , monotonically increases from 0 at $ u=0 $ to a maximum. Yet since the distance curves also inherit the periodicity of the imaged signal:
$$ {\mathcal{D}}_n(u)={\mathcal{D}}_n\left(u+2\pi \right), $$
the $ {\mathcal{D}}_n(u) $ must first increase, then monotonically decrease from their maximum back down to 0 at $ u=2\pi $ .
The second assumption of our method is that two distance curves corresponding to two nearby origin points (whose phase is close) should be near identical. This is referred to as local constancy of shape by the authors of the volumetric imaging method.(Reference Hanslovsky, Bogovic and Saalfeld 31 ) This assumption seems reasonable for our problem as the motion of the heart is continuous, and over any arbitrarily short timespan its speed is near constant.
In comparison to the method introduced for volumetric imaging,(Reference Hanslovsky, Bogovic and Saalfeld 31 ) our assumptions are similar, except that we do not assume that the distance curves are monotonic over the whole domain, as that could not account for the periodicity of the cardiac cycle. Instead, we impose that they can be split into two monotonic parts, one increasing and one decreasing, which repeat periodically. It is noteworthy that our derivation uses a distance metric (which increases with the phase difference) rather than a similarity measure (which decreases with the phase difference). Beyond the definition of $ {\mathcal{D}}_n $ , we will note further differences in Sections 3.3.1 and 3.3.2.
Based on our assumptions above, we can now derive a two-step algorithm to estimate the nonuniform positions of the samples: first, we estimate the $ {\mathcal{D}}_n(u) $ corresponding to each sample in our sequence, and then we use these to compute new phase estimates, as illustrated in Figure 4. We iterate over these two steps until the phase estimation converges, which corresponds to Step (e) in Figure 1. The following subsections describe each step of the algorithm in more detail.
3.3.1. Estimating the distance curves locally
We model the distance curves $ {\mathcal{D}}_n $ using piecewise-linear functions $ {\tilde{\mathcal{D}}}_n $ characterized by values $ {\tilde{\mathbf{d}}}_n\left[l\right] $ , $ l=0,\dots, L-2 $ which we will compute below, defining $ L $ intervals over $ u\in \left[0,2\pi \right) $ :
We impose constraints on the values $ {\tilde{\mathbf{d}}}_n\left[:\right] $ to fulfill the monotonicity assumption on $ {\mathcal{D}}_n $ (stemming from the hypothesis that images farther away in phase look less similar):
$$ 0\le {\tilde{\mathbf{d}}}_n\left[l\right]\le {\tilde{\mathbf{d}}}_n\left[l+1\right],\qquad l<{L}_n^{\mathrm{max}}, $$
$$ {\tilde{\mathbf{d}}}_n\left[l\right]\ge {\tilde{\mathbf{d}}}_n\left[l+1\right]\ge 0,\qquad l\ge {L}_n^{\mathrm{max}}, $$
where $ {L}_n^{\mathrm{max}} $ is the position of the maximum of the function $ {\tilde{\mathcal{D}}}_n $ , which is unknown a priori.
The image distance between any arbitrary origin and any other image at indexes $ n $ and $ i $ in our sequence $ d\left(\mathbf{g}\left[:,:,n\right],\mathbf{g}\left[:,:,i\right]\right) $ should match the value of the associated unknown distance curve evaluated at the corresponding phase distance: $ {\mathcal{D}}_n\left({\tilde{\theta}}_i-{\tilde{\theta}}_n\right) $ . Taking advantage of the local constancy of shape of $ {\mathcal{D}}_n $ , we can consider all image distances in a local neighborhood around any origin sample $ n $ as measurements of the same distance curve. This allows us to find the values $ {\tilde{\mathbf{d}}}_n\left[:\right] $ for each $ n $ by solving a weighted linear least squares fitting problem that minimizes:
$$ \sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1}{w}_{\sigma}\left({\tilde{\theta}}_i\ominus {\tilde{\theta}}_n\right){\left({\tilde{\mathcal{D}}}_n\left({\tilde{\theta}}_j-{\tilde{\theta}}_i\right)-\mathbf{D}\left[i,j\right]\right)}^2, $$
where $ \mathbf{D}\left[i,j\right]=d\left(\mathbf{g}\left[:,:,i\right],\mathbf{g}\left[:,:,j\right]\right) $ is the matrix containing all pairwise image distances between reference frames, and $ {w}_{\sigma }(u) $ is a Gaussian windowing function of given standard deviation $ \sigma $ that gives more importance to measurements in a close neighborhood. This step is illustrated in Figure 4a.
If the phase estimates $ {\tilde{\theta}}_n $ are close enough to the real phases, then the modeled $ {\tilde{\mathcal{D}}}_n $ will be good approximations of the distance curves $ {\mathcal{D}}_n $ . When solving this step for the first iteration, we initialize $ {\tilde{\theta}}_n $ with the uniform sampling approximation $ {\hat{\theta}}_n $ defined in (7) which gives a mean error of less than $ 10\% $ of the period if the sorted sequence contains at least $ 10 $ images.(Reference Mariani, Ernst, Mercader and Liebling 22 )
Taking into account the monotonicity constraints in (13) and (16), we use an efficient quadratic cone programming algorithm(Reference O’Donoghue 32 ) to minimize (17). However, writing the constraints requires knowing $ {L}_n^{\mathrm{max}} $ , which we find by solving the problem without monotonicity constraints beforehand. We detail the expression of this quadratic programming problem in Appendix A.
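As an illustration, the snippet below solves a simplified version of this constrained fit with cvxpy (which can delegate to the SCS cone solver): each measurement is snapped to its nearest breakpoint instead of being linearly interpolated, and the variable layout and boundary handling are reduced to their essentials:

```python
import cvxpy as cp
import numpy as np

def fit_distance_curve(u, y, w, L, l_max):
    """Weighted monotone fit of breakpoint values over [0, 2*pi).

    u: phase differences of the measurements, y: measured image distances,
    w: Gaussian window weights, l_max: index of the curve maximum.
    """
    d = cp.Variable(L)
    knots = np.clip((u / (2 * np.pi) * L).astype(int), 0, L - 1)
    residual = cp.multiply(np.sqrt(w), d[knots] - y)
    constraints = [d[0] == 0,                     # identical images at u = 0
                   d[1:l_max + 1] >= d[:l_max],   # increasing up to the maximum
                   d[l_max + 1:] <= d[l_max:-1]]  # decreasing afterward
    cp.Problem(cp.Minimize(cp.sum_squares(residual)), constraints).solve()
    return d.value
```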
We note here a second set of differences between our method and the volumetric imaging one,(Reference Hanslovsky, Bogovic and Saalfeld 31 ) which lie in how we write and solve the curve fitting problem. Indeed, we do not resample the distance matrix $ \mathbf{D}\left[:,:\right] $ to compute the loss in (17). This gives a solution that corresponds more closely to the measurements but makes the least squares problem more complex to write in matrix form. Moreover, using a constrained solver allows us to guarantee the monotonicity of the estimate, thus removing the need for an additional truncating step to ensure the validity of the solution. We also choose to use all frames of the sequence to compute each curve estimate instead of using an additional windowing that would exclude distant points from the computations. This allows us to obtain distance curves defined over the whole period, which we will use in the following step to update the phase estimates.
3.3.2. Fitting the phases to the curves
Using the results of the previous step, we can now update the phases $ {\tilde{\theta}}_n $ in order to make the measurements $ \mathbf{D}\left[:,:\right] $ match the estimated distance curves $ {\tilde{\mathcal{D}}}_n $ as closely as possible. We write this as a weighted least squares cost function:
$$ {\mathcal{L}}_{\mathrm{data}}=\sum_{n=0}^{N_g-1}\sum_{i=0}^{N_g-1}{w}_{\sigma}\left({\tilde{\theta}}_i\ominus {\tilde{\theta}}_n\right){\left({\tilde{\mathcal{D}}}_n\left({\tilde{\theta}}_i-{\tilde{\theta}}_n\right)-\mathbf{D}\left[n,i\right]\right)}^2, $$
We use the same Gaussian windowing function $ {w}_{\sigma } $ as in (17) to give less importance to samples far from the origin point of the distance curves. This addresses the fact that these samples are noisier and less reliable.
We introduce a regularization term derived from the fact that we acquire images consecutively for each channel, at a constant sampling rate. We consider the neighborhood containing any three reference frames acquired successively. Over this region, we approximate the evolution of the phase over time using a partial Taylor sum of degree 1. As the time delay between the acquisition of consecutive reference frames is constant, according to this local model the phase step between these frames should be nearly constant. We express this as the following regularization cost:
$$ {\mathcal{L}}_{\mathrm{reg}}=\sum_{c=0}^{C-1}\sum_{n=1}^{N-2}{\left(\left({\tilde{\theta}}_{c,n+1}\ominus {\tilde{\theta}}_{c,n}\right)-\left({\tilde{\theta}}_{c,n}\ominus {\tilde{\theta}}_{c,n-1}\right)\right)}^2. $$
We combine the two cost functions using a regularization strength $ \lambda $ :
$$ {\mathcal{L}}_{\mathrm{phase}}={\mathcal{L}}_{\mathrm{data}}+\lambda \hskip0.35em {\mathcal{L}}_{\mathrm{reg}}. $$
We minimize this objective function using a gradient descent approach. More specifically, we use the Adam adaptive learning rate optimizer(Reference Kingma and Ba 33 ) to find an optimal estimate for $ {\tilde{\theta}}_n $ .
Since we update the phase estimates in this step, the modeled distance curves $ {\tilde{\mathcal{D}}}_n $ are no longer valid, and we must recompute them with the new phases. We iteratively minimize (17) and (20) until convergence of the phase estimates. We describe a reliable way to choose the values of the different hyperparameters for minimization in Section 3.6.
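The snippet below sketches one phase-update pass with JAX and the Adam optimizer (via optax). It matches our implementation in spirit only: `curves` is a hypothetical callable standing in for the evaluation of the fitted models $ {\tilde{\mathcal{D}}}_n $ , and the regularizer simply penalizes variations of the phase step between consecutive acquisitions:

```python
import jax
import jax.numpy as jnp
import optax

def make_loss(D, curves, sigma, lam):
    """D: pairwise image distances; curves(u): modeled distance curve values."""
    def loss(theta):
        du = theta[None, :] - theta[:, None]        # du[n, i] = theta_i - theta_n
        wrapped = (du + jnp.pi) % (2 * jnp.pi) - jnp.pi
        w = jnp.exp(-0.5 * (wrapped / sigma) ** 2)  # Gaussian window around n
        data = jnp.sum(w * (curves(du % (2 * jnp.pi)) - D) ** 2)
        step = (jnp.diff(theta) + jnp.pi) % (2 * jnp.pi) - jnp.pi
        reg = jnp.sum(jnp.diff(step) ** 2)          # near-constant phase step
        return data + lam * reg
    return loss

def update_phases(theta, loss, steps=500, lr=1e-2):
    opt = optax.adam(lr)
    state = opt.init(theta)
    for _ in range(steps):
        grads = jax.grad(loss)(theta)
        updates, state = opt.update(grads, state)
        theta = optax.apply_updates(theta, updates)
    return theta
```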
We note here the third main set of differences between the volumetric imaging method(Reference Hanslovsky, Bogovic and Saalfeld 31 ) and ours, which concerns the loss function $ {\mathcal{L}}_{\mathrm{phase}} $ used for phase estimation. First, we use the distance metrics rather than the phases to compute the cost in (18). This circumvents the need to use the inverse of $ {\tilde{\mathcal{D}}}_n $ , which is not clearly defined as the distance curves are not bijective functions. Then, we do not use the scaling factor introduced in the volumetric imaging method, as its purpose is to correct for varying degrees of noise between images, and localized artifacts on some frames. Given the imaging method we use, the noise level is comparable in all images, and using this scaling factor would only add more complexity to the algorithm with no expected performance improvement. Finally, we introduce a new regularization term in (19), tailored to our imaging procedure to ensure that the estimated phases are consistent with the acquisition sampling rate.
3.4. Reconstructing signal sequences
Our assumption that the phase delay between reference and signal images is constant allows us to use the reference phase estimates $ {\tilde{\theta}}_n $ to reconstruct synchronized virtual high frame rate sequences from the signal frames, as shown in Figure 1f. To generate videos, we resample the sorted sequences at an arbitrary fixed rate. Each video frame contains the signal frame of the corresponding channel with the closest possible phase, as shown in Step (g) of Figure 1.
By combining the phase estimates with the known acquisition times, we can compute the average frequency of the imaged signal over any region of the sequence. Given that the phase estimates are wrapped over a single period, we first apply a simple temporal phase unwrapping algorithm(Reference Huntley and Saldner 34 ):
$$ {\tilde{\theta}}_{c,n}^{\prime }={\tilde{\theta}}_{c,n}+2\pi {\phi}_{c,n},\qquad {\phi}_{c,0}=0,\qquad {\phi}_{c,n}={\phi}_{c,n-1}+\begin{cases}1, & {\tilde{\theta}}_{c,n}-{\tilde{\theta}}_{c,n-1}<-\frac{\pi }{2},\\ 0, & \mathrm{otherwise},\end{cases} $$
where $ {\tilde{\theta}}_{c,n}^{\prime } $ are the unwrapped phases for each channel, and $ {\phi}_{c,n} $ represents how many periods have elapsed before its associated frame. The period count increases by one when the difference between two consecutive phase estimates is lower than a threshold of $ -\frac{\pi }{2} $ , which prevents counting too many periods because of small errors in the phase estimates. We can then compute the average angular frequency over a chosen region of the signal by fitting a linear model on the points $ \left({t}_{c,n},{\tilde{\theta}}_{c,n}^{\prime}\right) $ within that interval using linear least squares. The slope of the model directly gives $ \overline{\omega} $ over the analyzed region.
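A direct NumPy sketch of this unwrapping and frequency estimate, for the phase estimates of one channel in acquisition order:

```python
import numpy as np

def unwrap_phases(theta):
    """Add a full period whenever the phase jumps backward by more than pi/2."""
    jumps = np.diff(theta) < -np.pi / 2
    periods = np.concatenate(([0], np.cumsum(jumps)))
    return theta + 2 * np.pi * periods

def mean_angular_frequency(t, theta_unwrapped):
    """Least squares slope of unwrapped phase versus acquisition time."""
    slope, _ = np.polyfit(t, theta_unwrapped, 1)
    return slope
```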
The frequency information, which was missing in previous sorting-based virtual high frame rate methods,(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 , Reference Zhang and Pless 23 , Reference Mariani, Marelli, Jaques, Ernst and Liebling 28 ) makes it possible to study the heart rate of the sample. It also allows converting phases into time units, which are better suited for analyzing the dynamics of the heartbeat.
3.5. Theoretical analysis of the error caused by heart rate variability
In practice, as the heart rate varies, so does the phase delay between paired frames. This introduces errors in the reconstruction step described in the previous section, as it is based on the assumption that these phase variations are negligible. In order to find in which range this approximation is valid, we want to quantify the impact of the cardiac frequency variability on the performance of our method.
Using (2), we derived (see Appendix B) an upper bound for the error $ {\mathcal{E}}_{\unicode{x025B}} $ that results from the variability in the heart rate:
$$ {\mathcal{E}}_{\unicode{x025B}}\le \frac{\Delta_C}{CN}\sum_{c=0}^{C-1}\sum_{n=0}^{N-1}\left|{\unicode{x025B}}_{c,n}\right|, $$
where $ {\Delta}_C $ is the time delay between paired images, and $ {\unicode{x025B}}_{c,n} $ is the deviation between the instantaneous frequency and its average: $ {\omega}_{c,n}=\overline{\omega}+{\unicode{x025B}}_{c,n} $ .
Thus, we can compute the upper bound of the error introduced by heart rate variability if we know the statistical distribution of the cardiac frequency during the acquisition. For practical applications, it is easier to describe it by using $ {\omega}_{\mathrm{min}} $ and $ {\omega}_{\mathrm{max}} $ , the minimal and maximal values (respectively) that this frequency can take over the imaging process.
In the worst possible case, the heart rate is maximal during half of the acquisition and jumps without transition to its minimal value for the other half of the acquisition. The bound in (24) then becomes:
$$ {\mathcal{E}}_{\unicode{x025B}}\le {\Delta}_C\frac{\omega_{\mathrm{max}}-{\omega}_{\mathrm{min}}}{2}. $$
If the cardiac frequency is uniformly distributed over its range during the acquisition (e.g., if the heart linearly accelerates at a constant rate), the bound gets smaller:
$$ {\mathcal{E}}_{\unicode{x025B}}\le {\Delta}_C\frac{\omega_{\mathrm{max}}-{\omega}_{\mathrm{min}}}{4}. $$
And if its distribution is normal (e.g., if the heart rate oscillates around its average value), using the three-sigma rule to set the interval, the bound tends toward an even lower value:
$$ {\mathcal{E}}_{\unicode{x025B}}\le {\Delta}_C\frac{\omega_{\mathrm{max}}-{\omega}_{\mathrm{min}}}{6}\sqrt{\frac{2}{\pi }}. $$
In the case of a healthy heart that smoothly accelerates or decelerates, and may settle around a given cardiac frequency, the distribution lies between the uniform and the normal.
For example, we compute the upper bound in a realistic scenario with a delay of $ {\Delta}_C=10\;\mathrm{ms} $ between reference and signal images, and an average heart rate of 2.5 beats per second, considering the cardiac frequency distribution to be at worst uniform. If the heart rate stays between 0.5 and 4.5 beats per second, the error due to cardiac frequency variability will be lower than 1% of the period: $ {\mathcal{E}}_{\unicode{x025B}}\le {10}^{-2}\frac{4.5-0.5}{4}2\pi $ .
We consider a different scenario with all parameters equal except the interval $ {\Delta}_C=0.033\;\mathrm{s} $ . This corresponds, for example, to using a camera with a frame rate of 30 fps in a setup where triggering the reference acquisition at the end of the exposure time is not possible. To guarantee the same 1% upper bound as above in this scenario, the heart would have to stay between 1.9 and 3.1 beats per second: $ {\mathcal{E}}_{\unicode{x025B}}\le 0.033\frac{3.1-1.9}{4}2\pi $ .
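Both worked examples can be checked numerically with the uniform-distribution bound above (heart rates converted from beats per second to rad/s):

```python
import numpy as np

def uniform_bound(delta_c, bps_min, bps_max):
    """Error bound (in rad) for a uniformly distributed cardiac frequency."""
    return delta_c * (bps_max - bps_min) * 2 * np.pi / 4

print(uniform_bound(0.010, 0.5, 4.5) / (2 * np.pi))  # 0.0100 -> 1% of a period
print(uniform_bound(0.033, 1.9, 3.1) / (2 * np.pi))  # 0.0099 -> ~1% of a period
```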
We can use these bounds to predict what performance to expect, based on rough estimates of the range of heart rates that are expected or measured. This error induced by frequency variability will then be combined with the precision of the phase estimation algorithm when measuring the overall error of our method.
3.6. Reliable computation of the hyperparameter values
The performance of our phase estimation algorithm depends on the values of multiple hyperparameters. In order to be able to apply our method in practice, we propose a reliable way to compute optimal values for these parameters based on the acquired images. This guarantees the reproducibility of our results, and makes it possible to apply our method in new scenarios without manual hyperparameter tuning.
The regularization strength $ \lambda $ is the parameter with the most impact on performance. We find its optimal value using an L-curve.(Reference Hansen 35 ) This consists in plotting the residual cost from (18) against the final regularization cost from (19) for multiple values of $ \lambda $ . The curve resembles the shape of an L, as seen in Figure 6c. To compute the optimal $ \lambda $ , we first apply min–max normalization to both costs, then multiply the regularization cost by a factor of 10. This is because, in our case, the regularization corresponds to a strong prior about the system and must play a bigger role in selecting $ \lambda $ . We then choose the point in the rescaled L-curve that is closest to the origin as the optimal $ \lambda $ .
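A minimal sketch of this selection rule, where `lambdas`, `residual_costs`, and `reg_costs` are hypothetical arrays gathered from runs at different regularization strengths:

```python
import numpy as np

def select_lambda(lambdas, residual_costs, reg_costs, reg_weight=10.0):
    """Pick the lambda whose rescaled L-curve point is closest to the origin."""
    def norm(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min())
    x, y = norm(residual_costs), reg_weight * norm(reg_costs)
    return lambdas[np.argmin(np.hypot(x, y))]
```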
The width of the Gaussian window $ \sigma $ used in (17) and (18), as well as the number of points $ L $ used to characterize the distance curves also impact the performance. To guarantee good results, $ \sigma $ must be small enough to capture local details, but large enough to avoid amplifying noise. The precision overall improves with larger $ L $ , but at the cost of a slower convergence. In Appendix C, we detail our method to find appropriate values for these parameters and for the gradient descent update step.
To enhance the performance of phase estimation, we can restart our iterative algorithm after convergence, using the obtained phase estimates as the initial guess for the new run. We call this additional step a precision pass. When restarting the minimization, we halve the width of the Gaussian window $ \sigma $ and double $ L $ to get finer phase estimates. Even with the higher $ L $ and smaller $ \sigma $ , the minimization converges as it starts from a better initial guess. This allows getting better precision using bigger values of $ L $ that would have caused convergence issues if used directly on the initial uniform phase estimates.
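In schematic form, the precision passes reduce to the loop below, where `estimate_phases` is a hypothetical callable wrapping the iterative algorithm of Section 3.3:

```python
def refine(theta0, D, sigma, L, estimate_phases, passes=1):
    """Restart the estimation with a narrower window and a finer curve model."""
    theta = estimate_phases(theta0, D, sigma=sigma, L=L)
    for _ in range(passes):
        sigma, L = sigma / 2, L * 2
        theta = estimate_phases(theta, D, sigma=sigma, L=L)
    return theta
```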
4. Methods: Fluorescence cardiac imaging simulation
Previous works used simplified simulation models of the heart for performance characterization, consisting of a circle (or cylinder) textured with a sinusoidal pattern, with a radius varying over time to represent beating.(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 , Reference Mariani, Ernst, Mercader and Liebling 22 , Reference Liebling, Forouhar, Gharib, Fraser and Dickinson 27 , Reference Mariani, Marelli, Jaques, Ernst and Liebling 28 ) These models lack properties such as irregular texture, asymmetric contraction, or small variations in the periodicity that we find in experimental cardiac images. In order to test the different methods and characterize their performance as reliably as possible, we designed a simulation tool that reproduces the relevant properties of fluorescent beating heart images. We aim to synthesize a smooth organic-like textured object that deforms following a nonlinear contraction wave that propagates periodically through time and space asymmetrically. We also want to generate multiple channels that contain variable amounts of mutual information, and to emulate the small variations in the periodicity of the heartbeat.
4.1. Simulating a beating heart section
We define a function $ h\left(x,y\right) $ over a two-dimensional space to simulate a heart section:
$$ h\left(x,y\right)=e\left(x,y\right)\hskip0.35em \sigma \left(x,y\right), $$
where $ e\left(x,y\right) $ represents an ellipse with a thick outline and $ \sigma \left(x,y\right) $ is a two-dimensional simplex noise(Reference Perlin 36 , Reference Perlin 37 ) used for texturing. Simplex noise is a gradient noise function that generates visually isotropic and continuous textures, and is commonly used in computer-generated graphics to procedurally create natural-looking images. Its properties make it suitable for creating synthetic textures with realistic features in biomedical imaging,(Reference Abdolhoseini, Kluge, Walker and Johnson 38 –Reference Dustler, Förnvik and Lång 41 ) and we use it here to generate an organic-like texture for the heart walls.
For later use in our simulations, we introduce a periodic continuously smooth pulse function obtained by composing a sigmoid and a sine function:
$$ \operatorname{sigsin}\left(\theta \right)=\frac{1}{1+{e}^{-s\left(\sin \theta -b\right)}}, $$
where $ b $ is the sigsin bias and $ s $ its slope. $ b $ controls the width of the pulse, with higher bias generating shorter pulses, while $ s $ affects the overall slope of the rise and fall of the function, with higher slope creating a fast-rising pulse.
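In code, one possible realization of this pulse (mirroring the parameterization written above) is:

```python
import numpy as np

def sigsin(theta, b=0.5, s=10.0):
    """2*pi-periodic pulse: higher b -> narrower pulse, higher s -> steeper."""
    return 1.0 / (1.0 + np.exp(-s * (np.sin(theta) - b)))
```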
In order to simulate the propagation of heart contraction through the section, we create a continuous space transformation that preserves topology. This transformation varies on $ x $ and $ \theta $ in order to model the propagation of the contraction along one spatial dimension in time. It consists locally of a small rotation followed by a scaling, representing the twisting and shrinking of the heart walls that happen during contraction. This transformation can be expressed in matrix form:
$$ \mathbf{T}\left(x,\theta \right)=\mathbf{S}\left(x,\theta \right)\hskip0.35em \mathbf{R}\left(x,\theta \right), $$
where the rotation and scaling matrices $ \mathbf{R}\left(x,\theta \right) $ and $ \mathbf{S}\left(x,\theta \right) $ are defined as follows:
$$ \mathbf{R}\left(x,\theta \right)=\left(\begin{array}{cc}\cos \phi \left(x,\theta \right) & -\sin \phi \left(x,\theta \right)\\ {}\sin \phi \left(x,\theta \right) & \cos \phi \left(x,\theta \right)\end{array}\right),\qquad \mathbf{S}\left(x,\theta \right)=\left(\begin{array}{cc}{S}_x\left(x,\theta \right) & 0\\ {}0 & {S}_y\left(x,\theta \right)\end{array}\right). $$
The rotation angle $ \phi \left(x,\theta \right) $ and shrinking factors $ {S}_x\left(x,\theta \right) $ and $ {S}_y\left(x,\theta \right) $ are each governed by a sigsin pulse of the form:
$$ A\hskip0.35em \operatorname{sigsin}\left(\theta -{\theta}_0-\frac{2\pi x}{\lambda}\right), $$
where $ A $ is the amplitude of the pulse, $ {\theta}_0 $ is its initial phase shift, and $ \lambda $ is the spatial wavelength of the contraction pulse. The parameters $ A $ , $ {\theta}_0 $ , $ b $ , and $ s $ are set independently for the functions $ \phi $ , $ {S}_x $ , and $ {S}_y $ , allowing us to adjust the model to obtain a more realistic contraction.
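The sketch below assembles the local transform from these ingredients, reusing the `sigsin` function above. All numeric parameters are placeholders, and the independent parameter sets of $ \phi $ , $ {S}_x $ , and $ {S}_y $ are collapsed into a single shared pulse for brevity:

```python
import numpy as np

def local_transform(x, theta, A_phi=0.2, A_s=0.15, wavelength=1.0):
    """Local rotation followed by scaling, driven by a traveling sigsin pulse."""
    pulse = sigsin(theta - 2 * np.pi * x / wavelength)   # contraction wave in x
    phi = A_phi * pulse                                  # local rotation angle
    sx, sy = 1.0 - A_s * pulse, 1.0 - A_s * pulse        # local shrink factors
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    S = np.diag([sx, sy])
    return S @ R                                         # rotate, then scale
```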
Using the defined transform, we then compute the simulated reference signal as:
$$ {g}^{\mathrm{sim}}\left(x,y,\theta \right)=h\left(\mathbf{T}\left(x,\theta \right)\left(\begin{array}{c}x\\ {}y\end{array}\right)\right). $$
In order to generate multiple fluorescent channels, we multiply this reference signal with two-dimensional masking functions $ {m}_c\left(x,y\right) $ :
$$ {f}_c^{\mathrm{sim}}\left(x,y,\theta \right)={m}_c\left(x,y\right)\hskip0.35em {g}^{\mathrm{sim}}\left(x,y,\theta \right). $$
This method to simulate multiple fluorescent channels is not biologically plausible, but serves only as a means to easily control the amount of overlap between channels through the shape of the masks. This is sufficient to characterize the performance of our algorithms, and a biologically more accurate model is unnecessary in this scope. Simulated heart sections in different positions are visible in Figure 2.
4.2. Modeling the heart rate variability
We model the variability of the heartbeat as the combination of two contributions: a random jitter that affects the phase locally, and a random acceleration that affects the phenomenon over a longer time span. The first is motivated by the imperfect nature of a biological process that creates local variations within a cycle, while the latter represents a slow variation of the average angular frequency over time. Using this model, we express the simulated phase at time $ t $ as:
$$ {\theta}^{\mathrm{sim}}(t)={\theta}_0^{\mathrm{sim}}+\int_0^t\left({\omega}_0^{\mathrm{sim}}+{\sigma}_{\omega }{W}_{\omega}\left(\tau \right)\right)\hskip0.1em d\tau +{\sigma}_{\theta }{W}_{\theta }(t), $$
with $ {\theta}_0^{\mathrm{sim}} $ the starting phase and $ {\omega}_0^{\mathrm{sim}} $ the initial angular frequency. We use Wiener stochastic processes to model the variability in phase $ {W}_{\theta }(t) $ and in frequency $ {W}_{\omega }(t) $ of the system, defined by:
$$ W(0)=0,\qquad W\left(t+\Delta t\right)-W(t)\sim \mathcal{N}\left(0,\Delta t\right), $$
where $ \mathcal{N}\left(0,u\right) $ denotes a normal distribution of mean $ 0 $ and variance $ u $ , and increments over disjoint intervals are independent. Their amplitudes $ {\sigma}_{\theta } $ and $ {\sigma}_{\omega } $ control the overall amount of randomness in the simulation, as well as the strength ratio between the two sources of stochasticity. We chose Wiener stochastic processes because of their property to generate continuous paths, which is essential for modeling a heartbeat cycle as the motion of the heart is itself continuous. We illustrate the evolution of phase variability in multiple scenarios using different strengths for phase and frequency deviations in Figure 5.
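A discrete-time sketch of this phase model, using a simple Euler discretization of the two processes (the default amplitudes are the values fitted to experimental data in Section 5.1):

```python
import numpy as np

def simulate_phase(duration, dt, omega0, sigma_theta=0.15, sigma_omega=0.04,
                   theta0=0.0, seed=None):
    """Sample theta_sim(t) on a regular grid of step dt."""
    rng = np.random.default_rng(seed)
    n = int(duration / dt)
    w_theta = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))  # phase jitter
    w_omega = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))  # frequency drift
    omega = omega0 + sigma_omega * w_omega
    return theta0 + np.cumsum(omega * dt) + sigma_theta * w_theta
```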
Using the phase $ {\theta}^{\mathrm{sim}}(t) $ , we sample the simulated signals $ {g}^{\mathrm{sim}} $ and $ {f}_{\mathrm{c}}^{\mathrm{sim}} $ as defined in (5) and (6) to obtain simulated image pairs $ \left({\mathbf{g}}_c^{\mathrm{sim}}\left[:,:,n\right],{\mathbf{f}}_c^{\mathrm{sim}}\Big[:,:,n\Big]\right) $ . Thanks to the biologically inspired simulation models, these images share the relevant properties, allowing us to use them to characterize the methods presented in this work. A full simulated heartbeat is visible in Supplementary Video 1, where we also show the simulation of PAAQ imaging generated with our phase model.
5. Experiments and results
We characterized our method on both synthetic and real images. The following sections detail how we conducted these experiments, and discuss the obtained results. For fast computations, we implemented our gradient descent algorithm using JAX,(Reference Bradbury, Frostig, Hawkins, Johnson, Leary, Maclaurin, Necula, Paszke, Vander-Plas, Wanderman-Milne and Zhang 42 ) a high-performance numerical computing framework accelerated with parallel computing. When running experiments, we noticed that keeping the state of the Adam optimizer from one iteration of the phase estimation to the next instead of resetting its parameters greatly speeds up convergence.
5.1. Validation of the simulation model
To confirm the plausibility of our phase simulation model, we measured the heart rate variability in a high-speed dataset that was acquired for previous works.(Reference Mariani, Chan, Ernst, Mercader and Liebling 21 ) It consists of a 55 hpf wild-type zebrafish embryo imaged on a Leica DMR microscope equipped with a FASTCAM SA3 camera using brightfield at 1000 fps. We randomly isolated a subsequence of 4 seconds in the data, and identified its first period using image similarity. We assigned linear phases from 0 to 2 $ \pi $ to the frames of this first period, and then estimated the phase of all the other images in the subsequence as the phase of the most similar frame in the reference period.
We used these estimations to measure the phase deviation with respect to a perfectly periodic repetition of the first period. We repeated this measurement many times with different starting frames to build a relevant statistical representation of the experimental data. As shown in Figure 5a, the measurements closely match our model when using the values $ {\sigma}_{\theta }=0.15 $ and $ {\sigma}_{\omega }=0.04 $ . This corresponds to a relatively stable heart rate, as illustrated in the bottom-right panel of Figure 5b.
5.2. Characterization on synthetic data
We simulated the imaging of a heart with an average of $ 2.44 $ beats per second in two fluorescent channels ( $ C=2 $ ) using a camera with a frame rate of 20 frames per second (fps). We generated images of size $ 256\times 256 $ , and added Poisson noise to each pixel before quantizing their values to 12 bits. We used $ {\sigma}_{\omega }=0.04 $ and $ {\sigma}_{\theta }=0.15 $ to model the variability of the phase with comparable dynamics to what we measured on experimental data. We used a camera transfer time of $ 5\;\mathrm{ms} $ as the gap between reference and signal frames for PAAQ. We simulated imaging sequences of variable length, repeating each experiment with 10 different random seeds to generate confidence intervals.
We used $ L=50 $ for the phase estimation, and set the other parameters using the strategy described in Section 3.6. Before applying the method, we downsampled the images by a factor of 2 with spatial averaging, to reduce the computation time and noise.
As a baseline for performance comparison, we considered the ungated imaging of the fluorescent channels at low frame rate without using PAAQ, in addition to the acquisition of the brightfield images without illumination patterns to serve as registration reference. We sorted each channel separately for virtual high frame rate, and synchronized the obtained sequences using a temporal registration procedure based on maximizing the mutual information between the fluorescent channels and the brightfield sequence. This corresponds to a sequential approach to solving virtual high frame rate and multichannel registration, as discussed in Section 1. As the direction of the sequences is not given by the sorting, we applied the registration to all possible combinations of orientations for the channels, and kept only the solution with the maximum mutual information. We used a simple temporal upsampling of the sequences to allow for subframe precision registration.
Figure 6a compares how the phase estimation performance of the methods varies with the number of frames acquired. The naive sorting method refers to the solution given by using PAAQ imaging and estimating the phases with the uniform sampling assumption (Steps b and c in Figure 1, i.e., no phase correction algorithm). The mutual information-based method and the naive sorting obtain a similar performance over the whole range of experiments, which is expected as they are both limited by the uniform sampling assumption for phase estimation. Our method consistently improves on this performance, giving a relative 30% reduction of the phase error overall. Using (24), we computed that the error caused by heart rate variability is bounded to a maximum of 0.13% of one period on our simulated data. Therefore, the plotted phase error mostly characterizes the precision of our phase estimates, showing the benefit of using a nonuniform phase estimation algorithm.
We computed the initial average frequency of the simulated signal over the first two acquired periods using the method detailed in Section 3.4. The obtained relative error is directly tied to the error on phase estimation, dropping from 2% to below 1% when $ N\ge 50 $ . We repeated these experiments with increasing Poisson noise strength, with no noticeable performance impact on any of the methods.
5.3. Validation on high-speed experimental data
In order to confirm the applicability of our method on real data, we used the same high-speed dataset as in Section 5.1. To simulate slow PAAQ imaging, we randomly picked a frame as the start, then sampled the dataset every 100 frames to obtain the reference sequence, corresponding to a simulated camera frame rate of 20 fps. To emulate the paired signal frames, we sampled the data similarly starting five frames after the initial reference frame, corresponding to a simulated camera transfer time of $ 5\;\mathrm{ms} $ . We repeated this process with a different random starting frame to simulate the acquisition of a second channel. In this experiment, the signal sequences are all acquired using brightfield, but it does not affect the conclusions that we can draw on the precision of our phase estimation, as we use only the references for computation.
In order to generate the reference phases, we isolated a full period in the high-speed data starting from the first reference frame of the first channel, using image similarity to automatically find the start of the next period. We assigned linear phases from 0 to $ 2\pi $ to the frames in this initial period. We then estimated the ground-truth phase of all signal frames in all channels as the phase of the most similar frame (by image distance) in the reference period. We cannot know the real ground-truth phase of our samples, but this estimate is a sufficiently precise approximation given the high acquisition speed of this dataset. Indeed, since it is an order of magnitude higher than the frame rates we are simulating, the error on the ground-truth estimate has a negligible impact on our observations.
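For reference, this nearest-frame phase assignment (also used in Section 5.1) can be sketched as:

```python
import numpy as np

def assign_phases(frames, reference_period):
    """frames: (K, L, M) array; reference_period: (P, L, M), one full cycle."""
    P = reference_period.shape[0]
    ref_phases = 2 * np.pi * np.arange(P) / P
    phases = np.empty(frames.shape[0])
    for k, frame in enumerate(frames):
        dists = np.abs(reference_period - frame).sum(axis=(1, 2))
        phases[k] = ref_phases[np.argmin(dists)]   # most similar frame's phase
    return phases
```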
The high-speed dataset was acquired without the use of a motion-encoding illumination pattern, therefore it is missing in our simulated reference images. However, these data do not contain similar frames at different phases that would require motion disambiguation for sorting. Thus, the lack of patterned reference illumination does not impact the results of our method in this case, since it only affects the sorting of ambiguous frames. In a practical application, assessing the necessity of a motion-encoding pattern is only possible if high-speed reference images are available.
We simulated the slow imaging with varying sequence lengths, repeating each experiment 10 times with different random seeds to compute confidence intervals. For the phase estimation, we used $ L=50 $ and set the other hyperparameters according to Section 3.6, running 2 additional precision passes for best performance.
Figure 6b compares the phase estimation error obtained with naive sorting (see Section 5.2) and our method. Due to the absence of separate fluorescent channels in the data, we did not evaluate the mutual information approach. The performance of naive sorting is consistent with what we measured on the simulated dataset. Our method achieves even better performance here, obtaining a phase error below 1% of the period for all sequence lengths. This represents a significant relative error reduction of more than 50% for sequences with few acquired frames ( $ N\le 25 $ ).
The accuracy of the uniform phase sampling assumption increases with the number of acquired frames in Figure 6a, b. As the sampling becomes denser, the average distance between samples decreases, leading to a similar drop in phase error. However, even with a high number of frames, the sampling is nonuniform and our algorithm still provides a better accuracy.
Figure 6c illustrates the L-curve obtained for an experiment with $ N=50 $ . The highlighted point corresponds to the selected value of regularization strength $ \lambda $ used for the phase estimation. It lies close to the vertical axis, as the regularization corresponds to a strong prior on the data.
5.4. Results on low-frame rate experimental data
We performed experiments with zebrafish (Danio rerio) embryos from animals held at the Institute of Anatomy (National Licence Number 35) of the University of Bern. Adult fish needed for breeding were raised and maintained at a maximum density of 5 fish/L under the following environmental conditions: 27.5 to 28°C, 14 h of light and 10 h of dark, 650–700 μS/cm conductivity, pH 7.5, and 10% of water exchanged daily. Adult Tg(fli1a:GFP) y1T (Reference Lawson and Weinstein 43 ) and Tg(myl7:mRFP) ko08Tg (Reference Rohr, Bit-Avragim and Abdelilah-Seyfried 44 ) double transgenic zebrafish were crossed to obtain homozygous embryos. Eggs were collected within 30 min and kept in E3 medium (5 mM NaCl, 0.17 mM KCl, 0.33 mM CaCl2, 0.33 mM MgSO4) with Methylene Blue ( $ {10}^{-5} $ %). 30 min after egg collection, eggs that did not transition to the two-cell stage were removed. The next day, at 24 h after removing the screens, chorions were removed by incubating in 2 mg/mL Pronase in E3 medium (about 3 min) until gentle shaking of the Petri dish freed the larvae from the chorion. Subsequently, embryos of the same developmental stage were selected and the most frequent stage was estimated.(Reference Kimmel, Ballard, Kimmel, Ullmann and Schilling 45 ) Before 24 hpf, we added 0.003% 1-phenyl-2-thiourea (PTU, Sigma-Aldrich) to avoid pigmentation.
We anesthetized the embryos with Tricaine at 0.08 mg/mL, pH 7 and embedded the embryos with the anterior side (head) up in a fluorinated ethylene propylene tube in 1% low-melting agarose. We imaged the sample on an implementation of the OpenSPIM microscope(Reference Pitrone, Schindelin, Stuyvenberg, Preibisch, Weber, Eliceiri, Huisken and Tomancak 46 ) with two lasers (Vortran Stradus, 488 and 561 nm), a UMPlanFL N 20 $ \times $ /0.50 W semi-apochromat water dipping objective lens (Olympus), and a sCMOS camera (Andor Zyla 4.2) mounted on a U-TV 0.5 $ \times $ C-3 adapter (Olympus).
We designed a low-cost hardware controller, based on an Arduino microcontroller, to generate modulated illumination patterns in four different channels. We used standard BNC connections to make it compatible with most imaging and illumination hardware found on the market. We implemented drivers and plugins to integrate it into the $ \mu \mathrm{Manager} $ control software(Reference Edelstein, Amodaj, Hoover, Vale and Stuurman 47 ) and provide a user-friendly interface to use the device. Supplementary Figure 1 shows the controller and its schematics. We open-sourced our custom hardware and software at https://github.com/idiap/CBI-MMTools. This provides a low-cost and easy way to implement our method on devices that cannot easily generate the required temporal illumination patterns.
We used this controller to generate the PAAQ illumination patterns depicted in Figure 2. We acquired images with a frame rate of 14.42 Hz, corresponding to a 60 ms exposure time $ {\Delta}_E $ and a camera transfer time of $ 10\;\mathrm{ms} $ . We set a pulse width $ {\Delta}_P $ of $ 10\;\mathrm{ms} $ for the signal illumination, and a linear ramp containing three pulses of $ 10\;\mathrm{ms} $ separated by $ 15\;\mathrm{ms} $ gaps for the reference pattern. We used $ L=50 $ for the phase estimation, and used Section 3.6 to set the other hyperparameters, running one additional precision pass for performance.
Our first experiment compares the performance of our algorithm to that of the mutual information-based method when processing two fluorescent channels with a varying number of images. We extracted a region of size $ 512\times 512 $ centered on the heart and downsampled the images by a factor of 2 with spatial averaging before applying the algorithms. Figure 7a shows a frame in each of the reconstructed videos for $ N=100 $ and $ N=20 $ , corresponding to a single estimated phase in the signal. With a high number of frames, the two methods give visually similar results; with fewer frames, however, the mutual information video contains artifacts in the synchronization of the channels. This is highlighted at the cross mark, where the myocardium (red) appears to intersect with the endocardium (green), which is anatomically implausible. With the same number of frames, our method does not generate this artifact, and stays consistent with the solutions computed with more frames. This illustrates the better robustness of our algorithm with respect to the amount of images available. Moreover, the solution obtained using mutual information contains anatomical artifacts even when using a high number of frames, as visible in Supplementary Video 2. In the same situation, our method generates a video that does not contain visible anatomical artifacts, benefiting from the better phase estimates.
Our second experiment compares the performance of the two methods when the amount of information available is limited. We achieve this by restricting the field of view to a smaller part of the heart in a region of interest of size $ 256\times 256 $ . We again downsample by a factor of 2 before applying both methods. For this experiment, we acquired $ N=50 $ images in $ C=3 $ channels, treating the brightfield modality as an additional signal channel. As shown in Figure 7b, the mutual information reconstruction fails to correctly synchronize the red and green channels, even though the synchronization between red fluorescence and brightfield is correct. In the highlighted area, the myocardium (red) and endocardium (green) visibly intersect in an anatomically impossible way in the solution given by mutual information, while in the corresponding frame obtained with our method the channels are correctly synchronized. The incorrect solution obtained with mutual information is visible in Supplementary Video 3, where the green channel follows the movement of the red one with an obvious delay. In the smaller image, the amount of mutual information between channels is lower, partly because there are not enough relevant data to build a meaningful statistical representation of the sequences. Our method does not require mutual information between the channels to give a correct solution, making it suitable both for data of limited size and for cases where the different channels contain entirely uncorrelated signals.
Supplementary Video 4 illustrates the whole reconstruction process, showing the acquired images and the virtual high frame rate multichannel video reconstructed using our method. It corresponds to the top-right frame in Figure 7a.
6. Discussion
Our method generates multichannel videos at a virtual high frame rate with high temporal precision. As derived in Section 3.5, the channel synchronization error caused by heart rate variability is bounded and depends only on the transfer time of the imaging device. Therefore, the overall accuracy of our method mainly depends on the total number of images acquired, which can be chosen as high as needed to achieve a desired temporal resolution. The frame rate of the acquisition device has no direct impact on the precision of the reconstruction; nevertheless, a faster frame rate shortens the overall imaging time. In practice, imaging over a shorter period makes it more likely that the heartbeat remains regular.
Although we used a sensitive camera with low noise to obtain the experimental results presented in Section 5.4, we expect our method to perform similarly well on noisier images. Indeed, the spatial averaging and downsampling step applied before sorting reduces the noise in the images used for phase estimation. This is supported by the results in Section 5.2, where increasing the noise in our simulations did not noticeably impact the performance of the algorithm.
Prospective optical gating methods(Reference Taylor, Girkin and Love 19 , Reference Taylor, Saunter, Love, Girkin, Henderson and Chaudhry 20 ) provide high accuracy, with better control over when the cardiac cycle is sampled. They can also be used for 3D imaging, by using the same trigger to synchronize videos at different depths. However, our technique has lower hardware requirements, making it more cost-effective and easier to integrate into existing imaging platforms. Our method is currently limited to 2D imaging, although it could be generalized by implementing depth sweeping during imaging, combined with a method that removes the induced scan aberrations.(Reference Mariani, Ernst, Mercader and Liebling 22 ) Another possibility would be to use an imaging modality with a large depth of field for the reference frames, making it possible to accurately synchronize neighboring sections of the volume through a temporal registration step.
Methods that do not use PAAQ imaging (such as the mutual information synchronization used for comparison) have no hardware requirements at all. Nevertheless, we have shown that our technique is more robust to the size of the available data, both in terms of the number of images and their dimensions. At the cost of a more complex acquisition procedure requiring illumination switching, it guarantees a better temporal resolution. Our method therefore offers a compromise, delivering high accuracy while keeping hardware cost and complexity low.
The virtual high frame rate reconstructions obtained with our method can still contain artifacts where the sample moves aperiodically, or with a period different from that of the cardiac cycle. This is visible in Supplementary Video 4, where a trabecula at the top of the reconstructed video appears to move aperiodically. This limitation is inherent to slow frame rate imaging: only a fast acquisition device could capture such aperiodic phenomena accurately.
Similarly, registration between channels is not possible in regions of the sample that move aperiodically; such regions would need to be acquired simultaneously across channels. In a scenario requiring exact registration in all regions despite local aperiodicity, one could use a hybrid method with beam-splitting optics for simultaneous acquisition of the channels, paired with our phase estimation to enhance the frame rate. This would, however, not solve the limitation expressed above regarding temporal resolution in aperiodically moving regions.
Although we only described the use of fluorescence imaging paired with a brightfield reference, PAAQ could exploit other combinations of imaging modalities, provided that it is possible to switch between them quickly enough. The reference signal could be any modality, but one that induces minimal phototoxicity, like brightfield, is clearly preferable.
7. Conclusion
We have introduced a method to reconstruct virtual high frame rate multichannel videos from slow single-channel acquisitions of periodic processes. We have validated our method on experimental data and shown that its reconstructions do not contain the anatomical artifacts visible when using a state-of-the-art technique based on mutual information.
PAAQ is based on alternating acquisitions between a common reference modality and fluorescence channels, which makes it possible to reconstruct synchronized videos of multiple modalities without requiring mutual information between channels. We have shown that the synchronization error caused by variability in the signal periodicity is small, even with large frequency variations, provided the transfer time of the imaging device is small enough.
We have proposed an image-only phase estimation algorithm that improves on existing sorting-based virtual high frame rate methods thanks to a nonuniform sampling assumption. We have shown on both synthetic and experimental data that our algorithm reduces the phase error by 30% overall relative to techniques that use a uniform sampling assumption. For the synthetic experiments, we designed a simulation framework that reproduces the most relevant properties of cardiac fluorescence microscopy and can model heart rate variability with statistical properties matching experimental observations. We have also proposed a reproducible way to compute the optimal hyperparameters for our method that guarantees good performance without manual tuning.
We released the schematics and drivers for a hardware controller that can modulate light sources and synchronize active illumination with imaging devices. Similarly, we released the simulation framework and the implementation of our algorithms.
Although we developed our method for the specific case of cardiac microscopy, our algorithm is generic and we foresee that it could be applied in other scenarios involving the imaging of a cyclic phenomenon with a slow device.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2633903X23000223.
Data availability statement
The data used in this article are publicly available: https://www.idiap.ch/en/dataset/paaq-heart. Replication code can be found in the Idiap CBI Toolbox library: https://github.com/idiap/cbi_toolbox.
Author contribution
Conceptualization and methodology: F.M., M.L.; Data curation, investigation, software, and visualization: F.M.; Resources: A.E., N.M.; Supervision: M.L.; Writing: F.M., A.E., N.M., M.L. All authors approved the final submitted draft.
Funding statement
This work has been conducted with the support of the Swiss National Science Foundation under grant numbers 200020_179217: Computational biomicroscopy: advanced image processing methods to quantify live biological systems; and 206021_164022: Platform for reproducible acquisition, processing, and sharing of dynamic, multi-modal data. N.M. was funded through Swiss National Science Foundation grant 310030L_182575.
Competing interest
The authors declare no competing interests associated with this manuscript.
Ethical standard
Zebrafish breeding and procedures were carried out within the scope of an institutional protocol approved by the Veterinary Office of the Canton of Bern, Switzerland (fluorescent lines) and by the Institutional Animal Care and Use Committee at the University of California, Santa Barbara (wild-type lines).
A. Appendix. Constrained weighted linear least squares expression for distance curve fitting
In Section 3.3.1, we compute the parameters $ {\tilde{\mathbf{d}}}_n $ of the estimated distance curves by solving a linear least squares problem. It is expressed as follows:
where the matrices are defined as:
with $ i=0,\dots, {N}_g $ , $ j=0,\dots, {N}_g $ , and $ l=0,\dots, L-2 $ , and
The quadratic programming problem is then:
with the monotonicity constraint matrix:
of size $ \left(L\times L-1\right) $ , where the change of sign on the diagonal happens after row $ {L}_n^{\mathrm{max}} $ .
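As an illustration of this fit, the sketch below sets up a constrained weighted least squares problem with such a monotonicity matrix. The matrix orientation and the use of SciPy's SLSQP solver are our choices for this sketch, not necessarily those of the cbi_toolbox implementation.

```python
import numpy as np
from scipy.optimize import minimize

def monotonicity_matrix(L: int, l_max: int) -> np.ndarray:
    """First-difference matrix whose sign flips after row l_max, so that
    G @ d >= 0 forces d to increase up to l_max and decrease afterwards."""
    G = np.zeros((L - 1, L))
    for i in range(L - 1):
        s = 1.0 if i < l_max else -1.0
        G[i, i], G[i, i + 1] = -s, s
    return G

def fit_distance_curve(A, y, w, l_max):
    """Weighted least squares  min_d  sum_l w_l * ((A d - y)_l)^2
    subject to the monotonicity constraint  G d >= 0."""
    L = A.shape[1]
    G = monotonicity_matrix(L, l_max)
    # Unconstrained weighted solution as the starting point.
    d0 = np.linalg.lstsq(A * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)[0]
    res = minimize(
        lambda d: np.sum(w * (A @ d - y) ** 2),
        d0,
        constraints=[{"type": "ineq", "fun": lambda d: G @ d}],
        method="SLSQP",
    )
    return res.x
```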
B. Appendix. Derivation of the upper bound for the heart rate variability error
In Section 3.5, we analyze the phase estimation error caused by heart rate variability through its upper bound. To derive it, we express the phase delay between a reference frame and its associated signal frame as the product of the time delay between the images $ {\Delta}_C $ and the instantaneous frequency of the heartbeat during that delay, $ {\omega}_{c,n} $ . We use $ {\unicode{x025B}}_{c,n} $ to denote the deviation of this instantaneous frequency from its average: $ {\omega}_{c,n}=\overline{\omega}+{\unicode{x025B}}_{c,n} $ . In order to measure only the error caused by heart rate variability, we assume that our phase estimation algorithm finds the exact phase of each reference frame: $ {\tilde{\theta}}_{c,n}={\theta}_{c,n}^g $ .
We can then compute the error $ {\mathcal{E}}_{\unicode{x025B}} $ that results from the variability in the heart rate using (2):
From the definition of the phase operator in (3), it stems that $ \left|\alpha \hskip0.35em \ominus \hskip0.35em \beta \right|\le \left|\alpha -\beta \right| $ , and we can find an upper bound for our error:
Given that $ {\unicode{x025B}}_{c,n} $ is sampled from a distribution with a zero mean, the right term is minimized when $ \Theta =\overline{\omega}{\Delta}_C $ , leaving the final expression of the upper bound:
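Written out under the definitions above, with $ \Theta $ denoting the constant phase offset applied during reconstruction, the chain of inequalities can be summarized as follows; this is our compact restatement of the derivation described in the prose, not a verbatim reproduction.

$$ {\mathcal{E}}_{\unicode{x025B}}=\mathbb{E}\left[\left|\left(\overline{\omega}+{\unicode{x025B}}_{c,n}\right){\Delta}_C\hskip0.35em \ominus \hskip0.35em \Theta \right|\right]\le \mathbb{E}\left[\left|\left(\overline{\omega}+{\unicode{x025B}}_{c,n}\right){\Delta}_C-\Theta \right|\right], $$

and, setting $ \Theta =\overline{\omega}{\Delta}_C $ ,

$$ {\mathcal{E}}_{\unicode{x025B}}\le {\Delta}_C\,\mathbb{E}\left[\left|{\unicode{x025B}}_{c,n}\right|\right]. $$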
C. Appendix. Hyperparameters selection method
In Section 3.6, we highlighted that selecting optimal values for the hyperparameters is crucial to guarantee the performance of the phase estimation algorithm.
The width $ \sigma $ of the Gaussian window used in (17) and (18) significantly influences precision. Indeed, a window that is too large decreases precision due to excessive averaging, while a window that is too narrow is overly sensitive to noise because it includes too few measurements. To find a good value for $ \sigma $ , we plot the residual error obtained when minimizing (17) with the initial phase estimates, using different window widths up to $ 2\pi $ . We select the last local minimum of that curve as the best value for $ \sigma $ . This represents the best fitting performance while ignoring the minima obtained with very small $ \sigma $ , which correspond to overfitting the noise in the data.
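The selection rule can be implemented in a few lines. The sketch below assumes the residuals have already been computed on an increasing grid of window widths; the function name is ours.

```python
import numpy as np

def select_sigma(sigmas: np.ndarray, residuals: np.ndarray) -> float:
    """Return the last local minimum of the residual curve, skipping the
    small-sigma minima that correspond to overfitting noise.
    `sigmas` must be sorted in increasing order (up to 2*pi)."""
    r = np.asarray(residuals, dtype=float)
    minima = [i for i in range(1, len(r) - 1)
              if r[i] <= r[i - 1] and r[i] <= r[i + 1]]
    # Fall back to the global minimum if the curve has no interior minimum.
    return float(sigmas[minima[-1]]) if minima else float(sigmas[np.argmin(r)])
```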
The number of points $ L $ used to represent the distance curves affects both accuracy and run time. The precision of the method tends to increase with $ L $ , as the curve approximation gets finer, but the convergence time increases as well due to the added complexity. In practice, we have found that our algorithm fails to converge when $ L $ is too large (typically larger than $ 2{N}_g $ ), and that using more than $ L=200 $ points does not improve performance further.
The update step size of the gradient descent optimizer has little impact on precision (thanks to the adaptive scaling of the Adam optimizer), but choosing a good value can speed up convergence. We have found that a default value of $ {10}^{-3} $ works well in all scenarios.