I. Introduction
Recently, content-aware image resizing (CAIR) techniques, also known as image retargeting, have been attracting the attention of computer vision researchers [Reference Avidan and Shamir1–Reference Domingues, Alahi and Vandergheynst6]. These techniques are regarded as sophisticated image resizing methods. To resize an image into a different aspect ratio and/or one with complex structures, CAIR gives better results than traditional scaling and cropping. In image processing terms, CAIR is a non-uniform resampling of images for extracting region-of-interest (ROI) automatically. However, it generally requires high computational complexity to analyze local and/or global image structures such as objects, backgrounds, and saliency. Therefore, CAIR is a very challenging task for the built-in software in portable devices, such as cell phones and PDAs.
Seam carving (SC) [Reference Avidan and Shamir1, Reference Rubinstein, Shamir and Avidan2] is one of the pioneering techniques for CAIR. It determines a seam, defined as an eight-connected path of pixels from the top (left) to the bottom (right) of the original image shown in Fig. 1(b). The width (height) of the image is reduced by one pixel after pruning a seam. The final retargeted image is produced with iterative pruning of seams until the retargeted image reaches the desired width (height). An example of the retargeted image is shown in Fig. 1(d). Combinations of SC and other resizing methods (scaling and/or cropping) have been reported in [Reference Dong, Zhou, Paul and Zhang7, Reference Rubinstein, Shamir and Avidan8].
There are a few approaches of content-aware image coding (CAIC) with SC-like methods [Reference Nguyen, Yang and Cai9–Reference Tanaka, Hasegawa and Kato11]. Generally, they first perform CAIR for the original image and then the retargeted imageFootnote 1 is compressed by some image coding standards. The side information, which would consist of seam path information (SPI) and pixel values of seams, will be stored and/or sent with the compressed image data. At the decoder side, the compressed core image is first decoded. If one needs the reconstructed image with arbitrary resolutions, the side information is used to reconstruct regions outside the core image (non-ROI).
For these methods, the amount of SPI significantly affects the entire bitrates. Although the conventional method [Reference Nguyen, Yang and Cai9] uses arithmetic coding to encode SPI, it usually requires high bitrates. The methods in [Reference Tanaka, Hasegawa and Kato10, Reference Tanaka, Hasegawa and Kato11] try to reduce the bitrates to employ “pillar”-style seams. Unfortunately, the structures of their seam paths are restricted.
In this paper, we present a novel rate-dependent SC (RD-SC). Intuitively, seam paths by RD-SC are the lossy version of the original seam paths and it is realized by a seam path approximation with piecewise predefined functions (Fig. 1(c)) while maintaining CAIR performance (Fig. 1(e)). Furthermore, we integrate RD-SC into image coding to realize an effective CAIC. A new parent–children relationship for a well-known wavelet-based image coding algorithm, set partitioning in hierarchical trees (SPIHT) [Reference Said and Pearlman12], is proposed. In the experimental results, its image coding performances at full resolution are comparable to those of the regular SPIHT while preserving good CAIR qualities for retargeted images.
This paper is organized as follows: Section II gives a brief reviews of related methods of RD-SC. The following two sections present the main contributions of this paper. Section III discusses RD-SC using a piecewise approximation. A wavelet-based CAIC with the modified SPIHT algorithm is presented in Section IV. The experimental results are shown in Section V, and finally, Section VI summarizes the paper.
Notation
Table 1 lists the symbols used in this paper for an image I. To simplify our description, we use the MATLAB-based notation for the sets of rows or columns in a matrix. Moreover, for simplicity, the ith element of a vector x is represented as x(i) and a subvector of x between L 0 ≤ i ≤ L 1 is defined as x(L 0:L 1).
In addition, this paper considers a vertical seam for the discussion about RD-SC unless specified. A vertical seam is defined as an eight-connected path of pixels of one-pixel width from the top to the bottom of an image. The process can be easily extended to a horizontal seam.
II Review
A) Seam carving
SC [Reference Avidan and Shamir1, Reference Rubinstein, Shamir and Avidan2] prunes seams classified as insignificant in an image. In the original SC technique, a dynamic programming approach is used to decide seams which can be pruned. A cumulative cost M(i, j) for each seam is calculated by the following equation:
where
The cost function is called forward energy [Reference Rubinstein, Shamir and Avidan2]. Finally, the seam with the smallest M(H I, j) is pruned. With this criterion, a retargeted image is forced to avoid the large transition of pixel values between the left and right pixels of the pruned seam. The process is iterated until the desired image width is reached. For other details, please refer to [Reference Avidan and Shamir1,Reference Rubinstein, Shamir and Avidan2].
B) CAIC
So far, several methods were reported in CAIC [Reference Nguyen, Yang and Cai9–Reference Tanaka, Hasegawa and Kato11, Reference Décombas, Capman, Renan, Dufaux and Pesquet-Popescu13–Reference Deng, Lin and Cai15]. In this subsection, we briefly review the related works.
Image coding with selective data pruning (SDP) [Reference Vō, Solé, Yin, Gomila and Nguyen16] is classified as one of the “decimation-then-compression” image coding approach. The SDP method has a very simple architecture for reducing an image size since it is based on line-based downsampling. That is, rows and/or columns in an image are simply pruned and then the pruned image is interpolated on the synthesis side. The pruned lines are located predominantly across low-frequency regions in the image and thus the interpolated image has few artifacts if a small number of lines are pruned. In the comparison of image coding without data pruning, SDP-based image compression improves the peak signal-to-noise ratio (PSNR) at low bitrates. However, when many lines are pruned, the pruned lines most likely cut across salient regions. As a result, the interpolated image could have artifacts. A similar method to SDP was also presented in [Reference Wang and Urahama17].
Incorporating SC into image coding was investigated by Nguyen et al. [Reference Nguyen, Yang and Cai9]. In their method, SPI and the pixel values of seams as well as the core image are encoded. The method can reconstruct the original image losslessly if the core image, the SPI and the pixel values of seams are losslessly encoded and transmitted. However, it demands a large amount of bitrates for SPI, and the overhead bitrate significantly affects the entire image coding performance.
The authors proposed an approach for CAIC called image concentration and dilution [Reference Tanaka, Hasegawa and Kato10,Reference Tanaka, Hasegawa and Kato11], denoted as CD-based image coding hereafter. In the method, first, insignificant regions in the original image are discarded by the SC-like method and then the core image is compressed by some image coding standard. At the same time, the discarded pixel positions are stored. On the decoder side, the core image and the side information are synthesized to reconstruct the image with an arbitrary resolution. The discarded pixels, i.e., non-ROI pixels, are interpolated from the decoded core image. Unfortunately, reconstructed images after CD-based image coding sometimes have the interpolation artifacts at non-ROI.
The SC-based content-aware video coding was proposed in [Reference Décombas, Capman, Renan, Dufaux and Pesquet-Popescu13]. It reduces side information bitrate, i.e., for SPI, by employing key points of seam paths and a seam merging process. In low bitrate video coding, the method succeeded to reduce total bitrates for the equivalent video quality. However, it is also non-perfect-reconstruction video coding as well as SDP- and CD-based image coding. That is, seams should be interpolated from reconstructed core video frames.
Frequency-domain SC and its image coding application were reported in [Reference Deng, Lin and Cai14, Reference Deng, Lin and Cai15]. It calculates optimal seam paths on the LL subband of DWT. The SPI is then used to determine the scanning order of SPIHT. The seams are scanned and transmitted in the energy descending order (high-energy seams are firstly encoded). In full resolution, seams in the LL subband result block-based seams and therefore the decomposition level should not be too large (practically, the decomposition level will be less than or equal to 3 [Reference Deng, Lin and Cai15]). Therefore, its image coding performance at full resolution is substantially inferior to the original SPIHT.Footnote 2
Briefly speaking, the following two main problems should be resolved for CAIC:
(1) Reducing side information for CAIR while preserving CAIR performance.
(2) Maintaining image coding performance at full resolution.
For both encoding and decoding, the side information, especially SPI, is usually required to obtain a retargeted image with an arbitrary resolution. However, the original SC used in [Reference Nguyen, Yang and Cai9] is not designed to encode SPI effectively. Therefore, the amount of side information is quite large if we do not consider designing a SPI encoding method.
Another problem is the image coding performance. Although the core image can be compressed effectively by any existing standards, non-ROI is difficult to compress with a good quality since the positions of those pixels are not structured and highly depend on images.
In the following sections, we introduce the RD-SC and the modified tree structure of SPIHT, which try to resolve these problems.
III. RD-SC with Piecewise Approximation
In this section, we present details of RD-SC. The seam paths of RD-SC are constructed from those of the original SC based on a piecewise approximation. The RD-SC structure introduced in this section is illustrated in Fig. 2 and it is briefly summarized as follows:
(1) Calculate a seam by the original SC. If there exists already pruned seams, a modified cost map calculated from previous pruned seam paths is used (Section III-F).
(2) Approximate the seam with the consideration of a rate-distortion performance (Section III-A–III-E).
(3) If possible, prune multiple seams (Section III-G).
(4) Store SPI and return to (1) for the next seam approximation.
The rest of this section describes detailed algorithms.
A) Piecewise function
Let s be a seam path. In general, it can be defined as follows:
where f(i) is a function to determine a seam path. A specific coordinate of the seam path is represented as (i, s(i)). According to the purpose of SC [Reference Avidan and Shamir1, Reference Rubinstein, Shamir and Avidan2], the following conditions are usually imposed on f(i):
• Connectivity: |f(i) − f(i + 1)| ≤ 1.
• Rectangular restriction: 0 ≤ f(i) ≤ W I − 1.
The first condition is the requirement that a seam path should be a set of eight-connected pixels. The second one is imposed to preserve a rectangular shape of the resized image.
Usually s has a complex structure since SC calculates s pixel-by-pixel and thus, it is difficult to compress efficiently. In this paper, we propose an approximation method of s with piecewise predefined functions. We define an approximated seam path a as follows:
where ak (k = 0, 1, …) is the kth portion of a, L(k) defines the interval for ak and f k(i k) is an approximation function. ak is calculated within the interval $\left[\sum_{p=0}^{k-1} L\lpar p\rpar \comma \; \sum_{p=0}^{k} L\lpar p\rpar -1\right]$. The relationship between s and a is illustrated in Fig. 3.
B) Candidates of f k(i k)
Any functions can be candidates of f k(i k) as long as they follow the conditions mentioned in the previous subsection. We consider to reduce the bitrate for SPI by choosing one function from several templates. A template set of f k(i k), shown as f k, g(i k), is heuristically defined as follows:
where C k = a k(0) and [·] is the rounding operator. f k, g(i k) are shown in Fig. 4. Clearly, (4) is the constant function. The upper and lower bounds of seam paths from the connectivity condition are represented in (5). In (8), the minimum operator is used since the square-root functions violate the connectivity condition without the operator.
We append one extra condition as follows:
where so is the original seam path to be approximated. This means the initial position of ak is the same as the terminated position of ak−1. With this restriction, we can reduce the connection information between ak−1 and ak.
As a result, the SPI for ak consists of a selected index among (5–8) and the interval L(k). Furthermore, the overall initial position s o(0) = a(0) is required for each a.
C) Recursive optimization of approximated seams
The optimal L(k) should not be determined as a constant. For some k, L(k) can be large if so goes through very smooth regions or it is well approximated by one of the templates (4–8). Whereas L(k) should be small at the textured region where so has a complex structure. Hence, we present the algorithm for recursive optimization of L(k).
We assume so is calculated before the approximation. To estimate L(k) effectively, we define the simplified forward energy criterion as follows:
where s L is the length of s. This criterion is similar to the forward energy of the improved SC [Reference Rubinstein, Shamir and Avidan2] but nevertheless is less complex. In the optimization process, we use (10) since many repetitive calculations are required.
Algorithm 1 presents the detailed optimization process of L(k) for the kth interval of I:
In Algorithm 1, so, sub is a original sub-seam path for I sub, T > 0 is a threshold and asub is the approximated seam path for I sub. Moreover,
It is clear that the approximated seam path ak, g can be completely recovered from the returned parameters. The simplified structure of this algorithm is shown in Fig. 5. In applications shown in this paper, the maximum row height of I sub (denoted as L max) is limited to 32. As a result, each L(k) is selected from {32, 16, 8, 4}.
D) Iterative optimization
In Algorithm 1, the optimal ak associated with L(k) is obtained for a given threshold T. As a further extension, the optimal threshold T is sought to control the bitrate for the SPI and the CAIR quality. The optimization is realized by an iterative optimization of Algorithm 1. The object of RD-SC is to find the nearest seam aopt to so under a bitrate constraint for the SPI, which is represented as follows:
where λ ≥ 0 is the Lagrange multiplier. In this approach, we define D PA(a), shown below, based on the reconstruction error by the simple interpolation.
where MSE is a mean squared error between the original and interpolated seams and Ī is defined by a linear interpolation from six available neighbors as
which is depicted in Fig. 6. The calculation method of R PA (a) is shown in the following subsection (Section III-E).
Finally, aopt in (11) is sought by gradually changing T shown in Algorithm 2, where ${\bi a}_{N_{cur}}$ is the approximated seam path at the current iteration and T PA contains N T threshold values and they are monotonically increasing. In this paper, we use TPA = [1, 5, 10, …, 50].
It is a similar strategy to the rate-distortion optimization for video coding [Reference Ortega and Ramchandran18]. Instead of the brute-force optimization, our approach is climbing up the R-D slope as shown in Fig. 7 to reduce computational complexity.
E) Bitrate calculation
In Algorithm 2, R PA (a) should be properly calculated from the approximated seam a. As previously mentioned, the function index g opt in Algorithm 1 and the set of L(k) are the side information for SPI. Since Algorithm 1 is based on the division of I sub into two halves, we can represent a set of L(k) with a simple binary representation. That is, 0 is stored if I sub is not divided, whereas 1 is stored if I sub is divided. As a result, a binary vector bdiv can be obtained for the division information, which corresponds to L(k).
By our preliminary experiment, codewords for function indices g opt are determined as shown in Table 2. The required bitrate for the approximation functions is calculated based on the assigned codewords.
Finally, R PA (a) can be represented as follows:
where #(bdiv) is the number of elements in bdiv and $b_{g_{opt}}$ is the number of bits to encode g opt.
F) Weighted cost matrix
Each rate-dependent seam path can be calculated by RD-SC mentioned previously. However, a reconstructed image sometimes presents visible artifacts due to seam pruning from almost the same spatial positions. To alleviate this problem, we consider balancing the positions of the seams. For this purpose, we modify the cost of the original SC M(i, j) in (1). When an already pruned seam is defined as $\hat{\bi a}_{m} \lpar m = 0\comma \; 1\comma \; \ldots\rpar $, a new cost map is represented as follows:
where w and δi, j are the weight and the Kronecker delta, respectively. Briefly speaking, the extra cost map M b(i, j) appends w for the conventional cumulative cost M(i, j) when a seam candidate crosses (or touches) a pruned seam path. We use ${\hat M}\lpar i, j \rpar$ instead of M(i, j) in (1) for calculating a seam path. As a result, the updated cost ${\hat M}\lpar i, j \rpar$ avoids pruning a seam adjacent to already pruned seams.
Furthermore, ${\hat M}\lpar i, j \rpar$ is utilized to determine the maximum number of seams. If $\min\lpar {\hat M}\lpar H_{I}\comma \; j\rpar \rpar \gt \tau$, where τ is an arbitrary threshold, we stop pruning seams since the remaining retargeted image is sufficiently condensed and no more seams should be pruned from the image.
G) Multiple seam pruning
In the original SC, just one seam is pruned at each seam calculation. Hence, pruning many seams increases the computational complexity. Therefore, we reuse a primal seam path for pruning the seams around it. In this paper, we use the seam-by-seam calculation for so to clarify our contribution. To further reduce the computational complexity, we could use an improved SC method, such as [Reference Chiang, Wang, Chen and Lai19], which uses one saliency map for pruning several seams. It is able to speed up the calculation of so.
We present an algorithm to prune multiple seams with a primal seam path ap in Algorithm 3. In the algorithm, ${\bi B} = \lcub {\bi b}_{-\lpar N_{m} - 1\rpar /2}\comma \; \ldots\comma \; {\bi b}_{0}\comma \; \ldots\comma \; {\bi b}_{\lpar N_{m}-1\rpar /2}\rcub $ is a set of seams to be pruned at the same time, N m is the (odd) number of pruned seams and T m is the threshold to determine whether multiple seams can be pruned.
IV. Wavelet-Based CAIC
In this section, the wavelet-based multiresolution representation of images for RD-SC is presented. Furthermore, the modified SPIHT algorithm is shown.
A) Lifting-based seamlet
For preserving the image coding performance at full resolution, the pixel values of seams should be encoded as well as the core image. Before constructing a full multiresolution image, seams are transformed by the lifting-based discrete wavelet transform (DWT).
The wavelet transform along seam paths, named seamlet, was proposed by Conger et al. [Reference Conger, Radha and Kumar20]. It generates coarse and detail coefficients similar to the conventional DWT, but its downsampling positions are determined by seam paths. Though it is able to use any wavelet filters, the original seamlet recommended Haar wavelet. However, we focus on the image coding application using CAIR. The detail coefficients should be as small as possible for image coding. Thus, lifting-based 5/3 DWT is employed for seamlet in this paper. The detail coefficient is calculated as follows:
where d(i, a(i)) is the detail coefficient at (i, a(i)). This lifting-based transformation is shown in Fig. 8. In our approach, the coarse coefficients remain unchanged different from the original seamlet. That is, the five-tap lowpass DWT filtering is not performed, since these unfiltered original pixels in the core image are finally transformed by the conventional DWT. Finally, d(i, a(i))'s are collected and one column of detail coefficients is yielded. The process is recursively performed for the number of seams.
In this paper, first vertical seams are calculated and transformed by the lifting-based seamlet and then the horizontal seams are calculated and transformed, shown in Fig. 9. Finally, the core image is transformed by the multilevel DWT to construct a fully decomposed multiresolution image. Note that this can be easily extended to the optimal order seam pruning [Reference Rubinstein, Shamir and Avidan8] for 2-D CAIR; however, we choose this simple 2-D CAIR because of the computational complexity of the optimization of the seam pruning sequence. A retargeted image with an arbitrary resolution is easily obtained by interpolating seam paths.
B) Pre-quantization of detail seam coefficients
It is clear that pixel values of the seams are not as important as the core image since prominent regions are preserved in the core image through RD-SC. Therefore, we consider to quantize detail seam coefficients before SPIHT encoding. It is represented as follows:
where d q(i, a(i)) is the pre-quantized detail seam coefficient, f q is the rounding control parameter, and Δ is the quantization step size. Clearly, no quantization is performed when Δ = 1. The pre-quantized signal is reconstructed by the following equation:
This pre-quantization is based on the scalar quantization with offset used in H.264/AVC [Reference Wedi and Wittman21] and we use f q = Δ/3 similar to the intra-coding of H.264/AVC. By using this pre-quantization, we can control the qualities between the core image and seams.
Modified tree structure of SPIHT for CAIC
For the proposed multiresolution image representation, we introduce a new parent–children relationship between the transformed coefficients in the core image and detail coefficients of seams for SPIHT encoding [Reference Said and Pearlman12].
The ordinary parent–children relationship in [Reference Said and Pearlman12] is defined to connect a pixel to four pixels in the same subband for the next lower decomposition (higher frequency) level. In this paper, the relationship is used for transformed coefficients of the core image. Let O(q, r) be a set of children indices of the parent index (q, r). The relationship is defined as
Besides the ordinary relationship, we introduce the new relationship between the core image and seams. We define (q c, r c) as the index of a transformed coefficient of the core image after DWT in the HL subband for the lowest decomposition (the highest frequency) level, corresponding to I(i, a(i) − 1). Furthermore, (q s, r s) is defined as the index of a detail coefficient of a seam corresponding to d q (i, a(i)).
The new parent–children relationship is represented as follows:
The new parent–children relationship is illustrated in Fig. 10. This relationship means that detail coefficients of seams are related to the nearest core pixels. The case of horizontal seams is similarly defined as well.
V. Experimental results
In this section, we validate image coding and CAIR performances. Four eight-bit grayscale images Park Joy, Pisa Tower, Beach and Crew are used as test images. Their sizes are summarized in Table 3 and the original images are shown in Fig. 11. All core image sizes are determined automatically based on the parameter τ introduced in Section III-F. Experimentally set parameters are summarized in Table 4.
A) Effects of parameter setup
The theoretical optimization of parameters shown in Table 4 is quite difficult since the optimization process could be highly non-linear and the best parameter set is certainly depending on the one who views the retargeted and full resolution images. Therefore, all of them are set heuristically as previously mentioned. Here, we describe the effect of parameter settings. In the following experiments in this subsection, one parameter is changed and the remaining ones are the same as Table 4 unless specified. Since effects of a few parameters are relatively clear, i.e., those of L max and τ are analogous, we describe about λ, ω, T m, and N m. It is an interesting future work to determine optimal parameters automatically. As a simple approach, we could append a parameter tuning step by a user with some specific test images.
1) λ
It controls bitrates for SPI and the accuracy of seam paths. In the experiment, W I/4 vertical seams and H I/4 horizontal seams are calculated regardless of τ. The SPI of SC is encoded by arithmetic coding similar to [Reference Nguyen, Yang and Cai9]. Figure 12 is a plot of the bitrate of SPI as a function of λ. As expected, the bitrate decreases as λ increases and the bitrates are much lower than those of SC.
Figure 13 shows the calculated seams of Pisa Tower for various λ. The seams represent different paths depending on λ. The seams move finely like SC for a small λ, whereas for a large λ, the seam paths are a coarse approximation of the original ones.
2) ω
It is used for balancing seam positions. ω = 0 means there are no restrictions for the positions. Figure 14 shows approximated seam paths of Crew for various ω. The seam paths are significantly changed within the range [0, Reference Guo, Liu, Shi, Zhou and Gleicher5]. Without the restriction, i.e., ω = 0, seams are mainly located on the smooth areas and the positions are concentrated similar to the original SC. Whereas for the large ω, such as ω = 5.0 or 10, the seams are distributed uniformly. Seams for the large ω are sometimes passing through the prominent regions of the image, e.g., faces of astronauts.
For the retargeting purpose, ω should be small; which yields similar seam positions to those of the original SC (Fig. 14(a)). Whereas for the image coding purpose, seam paths should be distributed uniformly. As a simple comparison, Table 5 presents PSNRs of the interpolated seams by linear interpolation. It is clear that ω = 0, i.e., the most concentrated case, shows the worst interpolation result since the small ω requires interpolating wide areas from few remaining pixels. It is also observed that ω should not be too large. Therefore, as a trade off issue, we picked up ω = 0.5.
3) N m and T m
N m determines the maximum number of seams removed at the same time. The large N m permits removing many seams by one seam path calculation step. The seam paths for N m = 3 and 11 are shown in Fig. 15. As expected, many seams share one seam paths in Fig. 15(b). PSNRs of interpolated seams by linear interpolation and SPI bitrates are compared in Fig. 16. It seems that the large N m is recommended anytime. However, as shown in Fig. 17, the large N m significantly affects the retargeted image quality. In the figure, the straight line in the original Crew image was broken in the retargeted image with N m = 11. Hence, we picked up N m = 3, which seems a good trade-off as shown in Fig. 16.
T m has a similar effect to N m, since it decides whether N m seams can be pruned or not (see Algorithm 3). If N m > 2552 (possible upper bound of MSE), the algorithm always choose to prune N m seams regardless of MSE. Based on the similar experiment about T m, we experimentally set T m = 4. The approximated seams for a few T m are shown in Fig. 18.
B) CAIR performance
The second comparison is for CAIR performance. As far as we know, few numerical measures exist to compare CAIR performances. Hence, we present all of the retargeted images for visual quality comparison. Table 3 summarizes the sizes of the core images. Figure 19 shows CAIR results by using SC and RD-SC. For comparison purpose, scaled images by imresize function of MATLAB are also shown. All images are resized to have the same size of the retargeted images by RD-SC. From the figures, RD-SC presents very similar retargeted images to those obtained by SC.
C Computational complexity
It is worth noting that the computational complexity in the seam approximation process of RD-SC is higher than those of SC due to the recursive estimation. It is difficult to present the complexity theoretically due to the highly image-adaptive computations of RD-SC. Thus, the execution time is compared. All algorithms are implemented on a MATLAB (R2011a) platform. The simulation is run on the desktop computer with Intel 3.4-GHz CPU (Corei7-2600), 16-GB RAM. Table 6 shows the execution time of the average of five executions for Beach image. In the table, the seam calculation time by SC (the column of “SC”) is also shown for comparison. The computation time is proportional to the number of seams to be removed. Seam path approximation time is longer than that of the original SC. However, Table 6 also presents the seam path decoding time of RD-SC. Obviously, it is significantly faster than the seam calculation of SC. It is beneficial for the traditional client–server model, where the decoder will be a mobile device which often has poor computation power.
D) Application to image coding
Here, we compare image coding performance. As the conventional encoders, SPIHT, the method in [Reference Nguyen, Yang and Cai9] (denoted as SC-SPIHT) and CD-based SPIHT [Reference Tanaka, Hasegawa and Kato11] (denoted as CD-SPIHT) are used. Our CAIC method is referred to as CA-SPIHT in this section. Note that SC-SPIHT does not include pixel values of seams; that is, the compressed pruned image is interpolated by linear interpolation since the bitrate for SPI already requires a high percentage of the entire bit budget. All methods apply five-level 9/7 DWT for core imagesFootnote 3 before performing SPIHT.
The R-D curves, encoded with Δ = 1 in (18), are shown in Fig. 20. SC-SPIHT is inferior to the other methods due to its large amount of bitrates for SPI. Furthermore, CD-SPIHT is worse than SPIHT and CA-SPIHT especially for high bitrates since its non-ROI is just interpolated from the core image at the decoder. SPIHT and CA-SPIHT are comparable, but SPIHT slightly outperforms the CA-SPIHT. More precisely, in comparison to SPIHT, CA-SPIHT increases bitrates less than or equal to 10 %. However, the bitrate increase would be a permissible penalty for users demanding both of a good CAIR quality and image coding performance at full resolution. In addition, reconstructed images at full resolution are shown in Figs. 21–24. Similar to the R-D curves, the reconstructed images by SC-SPIHT and CD-SPIHT have interpolation artifacts, whereas our CA-SPIHT significantly reduces those artifacts and the reconstructed images are very similar to those encoded by SPIHT.
Moreover, Fig. 25 represents PSNRs of core images for a few step sizes of the pre-quantization in Section IV-B. As expected, if Δ in (18) is large, the quality of the core image is improved especially for high bitrate coding. In addition, PSNR gains from Δ = 1 depend on the test images. For example, seam paths for Park Joy pass through the very smooth regions, the PSNR gains due to the pre-quantization is marginal, whereas for Beach, the PSNR gain is significant since vertical seams pass through the highly textured sand area.
VI. Conclusions
In this paper, we proposed rate-dependent structures of SC by piecewise approximation of the original seam paths. Furthermore, a new parent–children relationship of SPIHT is proposed for CAIC. In the experimental results, our approach shows comparable CAIR performances to those of SC with significant reductions of required bitrates for side information. Its image coding performances are comparable to those of the original SPIHT and also it consistently outperforms the conventional CAICs. Our RD-SC could be extended to video retargeting, which is one of our ongoing studies.
Acknowledgements
The authors thank krossbow (Mt. Evans), wsuph001 (Pisa Tower), and wenno (Beach) of flickr (www.flickr.com) for making available images in creative commons license. This work was supported in part by JSPS KAKENHI Grant Number 24760288, SCAT Research Grant, The Nakajima Foundation, and Kayamori Foundation of Informational Science Advancement.