
The technology behind the exceptional visual experience via high dynamic range

Published online by Cambridge University Press:  05 December 2018

Neeraj J. Gadgil*
Affiliation:
Dolby Laboratories Inc., 432 Lakeside Dr, Sunnyvale, CA 94085, USA
Qing Song
Affiliation:
Dolby Laboratories Inc., 432 Lakeside Dr, Sunnyvale, CA 94085, USA
Guan-Ming Su
Affiliation:
Dolby Laboratories Inc., 432 Lakeside Dr, Sunnyvale, CA 94085, USA
Samir N. Hulyalkar
Affiliation:
Dolby Laboratories Inc., 432 Lakeside Dr, Sunnyvale, CA 94085, USA
*
Corresponding author: Neeraj J. Gadgil Email: [email protected]

Abstract

High dynamic range (HDR) technology is rapidly changing today's video landscape by offering spectacular visual experiences. The development of display technology to support higher luminance levels in commercial and consumer electronic devices such as TVs, smartphones, and projectors has created an exponential demand for delivering HDR content to viewers. The essential component of HDR technology is "expanded contrast," which allows richer black levels and enhanced brightness, providing dramatic contrast that reveals finer details. The use of "wide color gamut" allows a wider color spectrum and richer colors, providing an aesthetically pleasing, true-to-life feel. Such visual enhancements clearly establish HDR as one of the most significant upcoming video technologies.

In this paper, we review major technical advances in this exciting field of study. Quantization of HDR signals is reviewed in the context of transfer functions that convert optical signals to electrical signals and vice versa; these mainly consist of the Perceptual Quantization and Hybrid Log-Gamma approaches. Compression of HDR content is another broad area of study involving several coding approaches, often categorized in terms of backward compatibility and single/dual-layer methods. Some key industry applications of HDR processing systems are also discussed, followed by some future directions of HDR technology.

Type
Industrial Technology Advances
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2018

I. INTRODUCTION

Video displays have come a long way since the use of the cathode ray tube (CRT) [Reference Poynton1]. Over the past few decades, we have seen several technologies in commercial products, such as the liquid-crystal display (LCD), solid-state devices such as the light-emitting diode (LED) and organic LED (OLED), and plasma panels. Since the debut of the high definition television (HDTV) standard, there have been attempts to create many new standards aimed at improving display quality, and new digital formats that can offer a better visual experience. Some examples include 4K ultra high definition (UHD) picture resolution, high frame rate (HFR), and high dynamic range (HDR). Among these, HDR is rapidly changing the video display landscape and is widely regarded as the most significant technology for the video display market since HDTV itself. The essential components of HDR technology are "expanded contrast" and "wide color gamut" [2,4]. A higher contrast ratio leads to richer black levels and enhanced brightness while revealing finer details across the entire luminance range. A wider color spectrum allows for more colors and smoother gradation within the color space, offering aesthetically pleasing, closer-to-reality experiences enabled by the human visual system's color-sensing capability. In many situations, visual enhancements made with these ingredients significantly exceed those made only with a higher resolution and/or a higher frame rate.

The human visual system (HVS) can perceive a wide range of luminance,Footnote 1 ranging from faint starlight at about 10^−6 cd/m2 (or "nits") all the way to direct sunlight at about 10^8 cd/m2. Dynamic range (DR), in the context of luminance, is the ratio between the highest and the lowest luminance; it is also measured in "stops," that is, the base-2 logarithm of the ratio, and is commercially known as "contrast ratio." The human eye can simultaneously perceive about 14 stops, or roughly a 16000:1 range, known as high DR or HDR [Reference Barten5]. CRT technology allowed a maximum luminance of 100 cd/m2, providing a limited DR known as standard DR (SDR). An SDR display typically offers 6 stops for 8-bit content and 10 stops for 10-bit content [Reference Konstantinides, Su, Gadgil, Bhattacharyya, Deprettere, Leupers and Takala6]. Today's displays are capable of producing maximum luminance significantly higher than 100 cd/m2, leading to a range higher than SDR, i.e., practically HDR. As of 2018, there are many commercially available HDR-branded TVs that support peak luminance of 600–1000 nits, and some even exceed 1000 nits. A number of displays with higher peak luminance have been designed for specific purposes by various industries and research laboratories. Trends suggest that in the next few years there will be many HDR-capable consumer TVs with peak luminance of 1500 nits or brighter [2,4]. With this development, there is considerable interest in, and demand for, capturing and processing video content suitable for such high-luminance displays.

A common misconception about HDR is that a higher dynamic range simply means higher brightness, which is generally not true [Reference Borer and Cotton3]. One can achieve a higher dynamic range by offering better contrast in a dark movie-viewing environment even with a peak luminance of 100 nits. Conversely, scenes with very high brightness displayed with SDR settings, for example on signage displays, would typically fail to offer the HDR viewing experience. Note that, according to user surveys of HDR, increased contrast typically adds depth, making images appear more 3D-like. Secondly, HDR content typically requires more than 10 bits per pixel for the luminance channel for lossless representation, whereas most widely deployed video processing systems implementing existing compression standards such as H.264 [Reference Wiegand, Sullivan, Bjontegaard and Luthra7] or HEVC [Reference Sullivan, Ohm, Han and Wiegand8] support only up to 10-bit content and are intended for SDR content of up to 100 nits luminance. Therefore, delivering video content suited for HDR displays demands significant technical research and development.

Considering the wide interest and appeal that HDR technology has gained in the past few years, several standardization entities such as the International Telecommunications Union (ITU), Society of Motion Picture and Television Engineers (SMPTE), Consumer Electronics Association (CEA), Motion Picture Experts Group (MPEG), and the Blu-ray Disc Association (BDA) have developed specifications for HDR content creation, transport, delivery, and display. Moreover, Dolby Laboratories, the British Broadcasting Corporation (BBC), the Japan Broadcasting Corporation (NHK), Technicolor, Philips, etc., have developed their own specifications and/or interpretations for delivering/displaying HDR content.

In this paper, we review key technological aspects of HDR. Various quantization approaches for HDR signals using non-linear transfer functions are discussed. Then, some important single- and dual-layer HDR video coding methods are described. Recent key industry adaptations are also reviewed, followed by our speculation on the future evolution of HDR technology.

II. HDR SIGNAL QUANTIZATION

Figure 1 shows a typical practical video capture-to-display pipeline. The camera-captured "linear" light is converted to an electrical signal using a non-linear function known as the Opto-Electrical Transfer Function (OETF). This involves a quantization process, which restricts the electrical (digital) signal to a certain bit-depth. OETF-based quantization is typically followed by compression for storage and/or transmission. At the display, the electrical signal is converted back to linear light using another non-linear function known as the Electro-Optical Transfer Function (EOTF). These transfer functions (OETF and EOTF) are not inverses of each other, because the input optical signal for the OETF is the captured scene light, while the optical signal for the EOTF is the displayed light on, say, a TV. Therefore, the video pipeline from capture to display intrinsically involves an Opto-Optical Transfer Function (OOTF). Clearly, one can standardize two of the three transfer functions (OETF, EOTF, and OOTF), leaving the remaining one to be derived from the others [Reference Gish and Miller9].

Fig. 1. A typical video capture-to-display pipeline.

Traditionally, the EOTF used for CRT displays was the "gamma" curve specified in ITU-R BT.1886 [10], and the OETF used at the capture side was generally that of BT.709 [11]. According to BT.709 [11], the traditional gamma curve is given by:

(1)$$E = \begin{cases} 4.5\,L_{sc}, & 0 \le L_{sc} < 0.018 \\ 1.099\,L_{sc}^{0.45} - 0.099, & 0.018 \le L_{sc} \le 1, \end{cases}$$

where $L_{sc}$ denotes the normalized input (scene) luminance $\in [0, 1]$ and $E$, also $\in [0, 1]$, is the non-linear electrical signal representing the input linear luminance. This gamma-based coding was sufficient for handling SDR content, which was typically limited to 8–10 bits and 100 nits.
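For concreteness, the following Python sketch evaluates equation (1) directly; the piecewise constants are exactly those of BT.709 [11], while the vectorized form is our own illustration.

```python
import numpy as np

def bt709_oetf(L_sc):
    """BT.709 OETF of equation (1): scene luminance in [0, 1] -> signal E in [0, 1]."""
    L_sc = np.asarray(L_sc, dtype=np.float64)
    return np.where(L_sc < 0.018,
                    4.5 * L_sc,                            # linear toe near black
                    1.099 * np.power(L_sc, 0.45) - 0.099)  # power-law segment
```

For example, a mid-gray scene value of $L_{sc} = 0.5$ maps to $E \approx 0.705$.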

To accommodate higher dynamic range signals, Miller et al. proposed an EOTF by designing a perceptually optimal curve known as "PQ" (for Perceptual Quantization) [Reference Miller, Nezamabadi and Daly12]. It makes use of the just noticeable difference (JND) based on Barten's visual contrast sensitivity function [Reference Barten5] to obtain "perceptually-uniform" signal quantization. Since PQ corresponds closely to the human perceptual model, it makes the most efficient use of bits throughout the entire luminance range: a PQ-encoded signal can represent luminance levels up to 10 000 nits with relatively few extra codewords [4]. This EOTF was subsequently standardized by SMPTE in 2014 as ST 2084 [13].

BBC/NHK jointly proposed the use of a specific OETF curve which is detailed in the Association of Radio Industries and Businesses (ARIB) STD-B67 [14]. This curve is called Hybrid-Log-Gamma (HLG), in which the lower half of the signal values uses a gamma curve and the upper half uses a logarithmic curve [Reference Borer and Cotton3]. This hybrid OETF makes it possible to send a single bitstream that is compatible with both SDR and HDR displays.

Recommendation ITU-R BT.2100 standardizes HDR television production and display systems, specifying both the PQ and HLG curves [15]. It states that an HDR system should make consistent use of the transfer functions of one system or the other and not intermix them. Figure 2(a) shows a conceptual diagram of the HDR video pipeline [Reference Gish and Miller9,15].

Fig. 2. HDR system diagram as stated in BT.2100 [15].

The PQ approach is defined by its EOTF, as shown in Fig. 2(b). Let $\tilde{E} \in [0, 1]$ be the non-linear electrical signal at the display and $L_d \in [0, 10000]$ nits be the linear luminance of the output (display):

(2)$$L_d = 10000\left(\frac{\max\left[\tilde{E}^{1/m_2} - c_1, 0\right]}{c_2 - c_3\,\tilde{E}^{1/m_2}}\right)^{1/m_1},$$

where $m_1$, $m_2$, $c_1$, $c_2$, and $c_3$ are rational constants. The OETF is then derived from the EOTF such that it yields the specified OOTF [15].

(3)$$E = {\rm EOTF}^{-1}[L_d] = \left(\frac{c_1 + c_2 Y^{m_1}}{1 + c_3 Y^{m_1}}\right)^{m_2},$$

where $Y = L_d/10000$ is the normalized displayed luminance.
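Equations (2) and (3) translate directly into code. The sketch below uses the constants published in ST 2084 [13]; the NumPy packaging is our own illustration, not a normative implementation.

```python
import numpy as np

# PQ constants as published in SMPTE ST 2084 [13] and BT.2100 [15].
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375 (= C3 - C2 + 1)
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875

def pq_eotf(E):
    """Equation (2): non-linear signal E in [0, 1] -> display luminance L_d in nits."""
    E = np.asarray(E, dtype=np.float64)
    ep = np.power(E, 1.0 / M2)
    return 10000.0 * np.power(np.maximum(ep - C1, 0.0) / (C2 - C3 * ep), 1.0 / M1)

def pq_inverse_eotf(L_d):
    """Equation (3): display luminance in nits -> non-linear signal E in [0, 1]."""
    Y = np.asarray(L_d, dtype=np.float64) / 10000.0
    Ym = np.power(Y, M1)
    return np.power((C1 + C2 * Ym) / (1.0 + C3 * Ym), M2)
```

For instance, `pq_inverse_eotf(100.0)` evaluates to about 0.509, which in 10-bit limited range lands close to codeword 509, the value referred to in Section IV below.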

In a complementary fashion, the HLG approach, as shown in Fig. 2(c), is defined by its OETF and the EOTF may be derived from it using the specified OOTF [15].

(4)$$E = \begin{cases} \sqrt{3 L_{sc}}, & 0 \le L_{sc} < \dfrac{1}{12} \\ a \ln(12 L_{sc} - b) + c, & \dfrac{1}{12} \le L_{sc} \le 1. \end{cases}$$
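A matching sketch of the HLG OETF of equation (4) follows; the constants $a$, $b$, and $c$ are those published in ARIB STD-B67 [14] and BT.2100 [15] ($b = 1 - 4a$ and $c = 0.5 - a \ln 4a$), and the guard against a negative logarithm argument is our own addition.

```python
import numpy as np

# HLG constants from ARIB STD-B67 [14] / BT.2100 [15]; b = 1 - 4a and
# c = 0.5 - a*ln(4a) make the two branches meet at E = 0.5.
A = 0.17883277
B = 1.0 - 4.0 * A              # ~0.28466892
C = 0.5 - A * np.log(4.0 * A)  # ~0.55991073

def hlg_oetf(L_sc):
    """Equation (4): normalized scene luminance in [0, 1] -> signal E in [0, 1]."""
    L_sc = np.asarray(L_sc, dtype=np.float64)
    log_arg = np.maximum(12.0 * L_sc - B, 1e-12)  # guard; branch unused below 1/12
    return np.where(L_sc <= 1.0 / 12.0,
                    np.sqrt(3.0 * L_sc),       # gamma-like square-root segment
                    A * np.log(log_arg) + C)   # logarithmic segment for highlights
```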

Thus, BT.2100 incorporates both quantization approaches, enabling the development of various video processing systems.

III. COLOR SPACES FOR HDR

The ITU has standardized the YCbCr color space within Recommendation ITU-R BT.601 [16]. YCbCr is composed of a non-constant luminance channel Y and the blue-difference and red-difference chroma components Cb and Cr, respectively. Gamma correction can be applied to the luminance channel to reduce the perception of quantization error; the resulting representation is commonly referred to as Y'CbCr. Subsequently, BT.709 standardized the format of HDTV, which also defined the color primaries to be used for content display [11].

Wide color gamut (WCG) enables content and displays to represent a larger volume of colors than that supported by earlier color standards such as BT.601 and BT.709. The P3 color gamut, specified by DCI for cinema presentation [17], is larger than BT.709. BT.2020 specifies various aspects of ultra-high-definition television (UHDTV) with WCG, including picture resolutions, frame rates with progressive scan, bit depths, and color primaries [18]. Note that the BT.2020 color space container can represent BT.709, P3, or any gamut up to and including the full BT.2020 specification. A comprehensive review of color technology is presented in [Reference Berns19].

Recently, Dolby Laboratories proposed the ICtCp color space to address the limitations of Y'CbCr [Reference Perrin, Rerabek, Husak and Ebrahimi20]. ICtCp extends the IPT color space [Reference Ebner and Fairchild21] to a higher dynamic range (up to 10000 cd/m2) and larger color gamuts such as BT.2020 [18]. The main characteristics of the ICtCp color space are an achromatic, isoluminant I-channel, hue linearity, perceptually uniform colors, and quantization to limited bit-depth. A detailed analysis of ICtCp versus Y'CbCr is presented in [Reference Perrin, Rerabek, Husak and Ebrahimi20].
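As an illustration of how ICtCp is computed, the sketch below chains a cross-talk RGB-to-LMS matrix, the PQ non-linearity, and an L'M'S'-to-ICtCp matrix; the matrix coefficients are assumed to be those published in BT.2100 [15], and `pq_inverse_eotf` is the sketch from Section II.

```python
import numpy as np

# Matrix coefficients as given in BT.2100 [15]; the code structure is illustrative.
RGB_TO_LMS = np.array([[1688, 2146,  262],
                       [ 683, 2951,  462],
                       [  99,  309, 3688]]) / 4096.0
LMSP_TO_ICTCP = np.array([[ 2048,   2048,     0],   # I  = 0.5 L' + 0.5 M'
                          [ 6610, -13613,  7003],   # Ct
                          [17933, -17390,  -543]]) / 4096.0

def rgb2020_to_ictcp(rgb_linear, pq_inverse_eotf):
    """Linear BT.2020 RGB (in nits, shape (3, N)) -> ICtCp, PQ variant."""
    lms = RGB_TO_LMS @ rgb_linear      # cross-talk matrix into cone-like LMS
    lms_p = pq_inverse_eotf(lms)       # perceptual (PQ) non-linearity per channel
    return LMSP_TO_ICTCP @ lms_p
```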

IV. CODING HDR SIGNALS

A) Dual-layer backward-compatible codecs

Though the number of HDR displays is increasing, the majority of consumers still have SDR displays, which typically use the gamma EOTF and the BT.709 color space. HDR videos, if represented with another EOTF and color space, cannot be shown properly on SDR displays. This requires the distribution system to deliver video in the correct format for a given display. The SDR version of a video has the same content as the HDR version, so there is great redundancy between the two. System providers, which have limited storage, prefer to provide a single bitstream that can be viewed on both SDR and HDR displays. Such a distribution system is called a backward-compatible system (codec).

A classic backward-compatible codec is the dual-layer system proposed by Mantiuk et al. [Reference Mantiuk, Efremov, Myszkowski and Seidel22]. Figure 3 shows the encoder flowchart; the structure is similar to a scalable coding architecture. The inputs to the dual-layer encoder are an HDR video and the corresponding SDR version. The SDR video can be generated by tone/gamut mapping and quantization, and can be color-graded by a professional colorist following the director's intent. The SDR video is encoded by a legacy encoder (e.g., AVC [Reference Wiegand, Sullivan, Bjontegaard and Luthra7] or HEVC [Reference Sullivan, Ohm, Han and Wiegand8]); the output bitstream serves as the base layer (BL).

Fig. 3. Dual-layer backward-compatible encoder.

The HDR video is reconstructed with the help of an enhancement layer (EL). The compressed SDR is first reconstructed by the corresponding legacy decoder. A color space transformation is applied to both the reconstructed SDR and the source HDR signal, such that both videos are converted to a compatible, perceptually-uniform color space. A prediction function is applied to the color-transformed reconstructed SDR, which yields the predicted HDR signal. This prediction function is sent to the decoder as metadata, i.e., auxiliary information carried as part of the coded bitstream. The difference between the predicted HDR and the original HDR is then filtered, quantized, and compressed by the same legacy encoder, producing a residual bitstream.

At the decoder (Fig. 4), a legacy decoder reconstructs the SDR video from the base layer bitstream. The reconstructed SDR can be shown directly on SDR displays. For HDR displays, the color transformation is applied to the reconstructed SDR, followed by the HDR predictor from the metadata. The decoded residual is added to the predicted HDR, yielding the reconstructed HDR signal. A color transformation may be needed to convert the reconstructed HDR signal to match the display's color space.

Fig. 4. Dual-layer backward-compatible decoder.

In some designs, the color space transformations in both the encoder and decoder are omitted, or are absorbed by the HDR prediction function. Some of the proposed HDR prediction functions can be found in [Reference Mai, Mansour, Mantiuk, Nasiopoulos, Ward and Heidrich23–Reference Chen, Su and Yin25].
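A minimal sketch of the decoder-side reconstruction of Fig. 4 is given below; `predictor` stands in for whichever prediction function the metadata carries [23–25], and the color transforms are optional, as noted above.

```python
def decode_dual_layer_bc(sdr_decoded, residual, predictor,
                         csc_in=None, csc_out=None):
    """Minimal sketch of the dual-layer backward-compatible decoder (Fig. 4).

    sdr_decoded : base-layer SDR frame from the legacy decoder
    residual    : decoded enhancement-layer residual
    predictor   : prediction function carried in the metadata [23-25]
    csc_in/out  : optional color space transforms (may be omitted or absorbed
                  into the predictor, as noted above)
    """
    x = csc_in(sdr_decoded) if csc_in else sdr_decoded
    hdr_pred = predictor(x)            # predicted HDR from the base layer
    hdr = hdr_pred + residual          # add the enhancement-layer residual
    return csc_out(hdr) if csc_out else hdr
```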

B) Dual-layer non-backward-compatible codecs

A main drawback of the dual-layer backward-compatible codec is the high computational complexity required to obtain an accurate HDR prediction mechanism. Another drawback is the high bit rate demanded by the residual bitstream in order to achieve sufficiently good HDR video quality. The SDR and HDR versions of a video usually have different amounts of detail in high-intensity and low-intensity regions, so the energy of the residual is usually high, which can result in undesired clipping after compression [Reference Konstantinides, Su, Gadgil, Bhattacharyya, Deprettere, Leupers and Takala6].

Recently, many applications have focused only on the quality of HDR and do not require backward compatibility. Codecs designed for these applications usually require less computation and a lower bit rate. Dolby's non-backward-compatible 8-bit dual-layer codec [Reference Su, Qu, Hulyalkar, Chen, Gish and Koepfer26] is a good example. The architecture of this codec is similar to that of the dual-layer backward-compatible codec, but the base layer signal, which is not intended to be watched, is generated such that the overall bit requirement of both layers is minimized. The base layer is rendered from the HDR signal by linear quantization, linear rescaling, non-linear mapping, etc.

There are other dual-layer non-backward-compatible codecs. For example, in [Reference François and Taquet27–Reference Auyeung and Xu30], the HDR signal is split into a most-significant-bit (MSB) layer and a least-significant-bit (LSB) layer (Fig. 5), as sketched after the figure. The two layers are compressed independently by legacy encoders. A message describing how to combine the two layers at the decoder is signaled using supplemental enhancement information (SEI).

Fig. 5. Dual-layer MSB/LSB splitting architecture.
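A minimal sketch of the MSB/LSB split follows; the 16-bit input depth and the 8/8 split are illustrative assumptions, since the actual layer depths are codec-specific [27–30].

```python
import numpy as np

def split_msb_lsb(hdr, total_bits=16, msb_bits=8):
    """Split an unsigned-integer HDR frame into MSB and LSB layers."""
    lsb_bits = total_bits - msb_bits
    msb = hdr >> lsb_bits               # most-significant bits
    lsb = hdr & ((1 << lsb_bits) - 1)   # least-significant bits
    return msb, lsb

def merge_msb_lsb(msb, lsb, total_bits=16, msb_bits=8):
    """Recombine the two independently decoded layers at the receiver."""
    return (msb.astype(np.uint32) << (total_bits - msb_bits)) | lsb

# e.g.: frame = np.array([40000, 1023], dtype=np.uint16)
#       assert np.all(merge_msb_lsb(*split_msb_lsb(frame)) == frame)
```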

C) Single-layer non-backward-compatible codecs

Generally, dual-layer systems demand higher data usage and more computational resources: the two layers need to be encoded, decoded, and synchronized with the metadata. In response, single-layer codecs, which include only a base layer and metadata, have been proposed.

One widely used distribution system for HDR videos is the HEVC Main 10 profile [31] with metadata in video usability information (VUI) and SEI messages, usually referred to as HDR-10. The essential metadata of HDR-10 includes:

  • EOTF: SMPTE ST 2084 PQ

  • Color primaries: ITU-R BT.2020

  • Color space: Y'CbCr non-constant luminance ITU-R BT.2020

  • Mastering display color volume: mastering display color volume SEI

  • MaxFALL and MaxCLL: content light level SEI

The metadata of HDR-10 is static, i.e., it remains constant throughout a whole video. At least 10-bit depth has to be used in order to avoid contouring artifacts, since there is no residual bitstream to compensate for the LSBs. The bit-depth rarely goes beyond 10 bits, however, as the peak brightness of today's common HDR displays does not exceed 1000 nits.

As PQ supports luminance up to 10 000 nits, the luma values of many HDR videos are limited to a small range. Besides, HEVC is tuned for gamma-coded signals and can thus yield unexpected artifacts when used to encode PQ-coded content. For example, 10-bit 100-nit gamma-coded content occupies codewords from 64 to 940; if the same content is represented with PQ, only codewords from 64 to 509 are used. Therefore, an error of one codeword at PQ codeword 509 is roughly equivalent to an error of four codewords at gamma codeword 940, although both PQ codeword 509 and gamma codeword 940 represent 100 nits [Reference Ström32].
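This claim is straightforward to check numerically. The sketch below compares the per-codeword luminance step of the two representations around 100 nits; the gamma side uses a simplified 100-nit, gamma-2.4 display model (an assumption on our part; BT.1886 [10] additionally includes black-level terms), and the PQ side uses the EOTF of equation (2).

```python
def gamma_eotf_nits(cw):
    """10-bit limited-range codeword -> nits for a simplified 100-nit,
    gamma-2.4 display model (black-level terms of BT.1886 [10] ignored)."""
    return 100.0 * ((cw - 64) / 876.0) ** 2.4

def pq_eotf_nits(cw):
    """10-bit limited-range codeword -> nits via the ST 2084 EOTF of equation (2)."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    ep = ((cw - 64) / 876.0) ** (1.0 / m2)
    return 10000.0 * (max(ep - c1, 0.0) / (c2 - c3 * ep)) ** (1.0 / m1)

step_gamma = gamma_eotf_nits(940) - gamma_eotf_nits(939)  # ~0.27 nits/codeword
step_pq = pq_eotf_nits(510) - pq_eotf_nits(509)           # ~1.1 nits/codeword
print(step_pq / step_gamma)                               # ~4, as stated above
```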

To address this issue, Lu et al. [Reference Lu, Yin, Chen and Su33] proposed selecting the quantization parameter (QP) for PQ-coded signals based on luminance as well as regions of interest (ROI) and spatial properties such as edges and texture. Ström et al. [Reference Ström32] proposed a luma-adaptive QP-offset look-up table, such that a smaller QP is assigned to brighter codewords.

Some other methods "reshape" the HDR signal to improve compression efficiency [Reference Perrin, Rerabek, Husak and Ebrahimi20,Reference Lu34–Reference Lu37]. Figure 6 shows the architecture, and a toy example follows the figure. A reshaping function redistributes the codewords and re-quantizes the HDR signal, which changes the bit allocation in compression. The reshaping function can be linear or non-linear. The reshaped signal is compressed by a legacy encoder (not necessarily HDR-10). The inverse reshaping function is transmitted to the decoder as dynamic metadata, to reconstruct the HDR signal.

Fig. 6. Single-layer HDR codec.
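The reshaping idea can be illustrated as below; the power-law curve and the `alpha` parameter are purely hypothetical stand-ins for the content-adaptive functions derived in [20,34–37].

```python
import numpy as np

def forward_reshape(hdr_norm, alpha=1.5):
    """Illustrative forward reshaping: a power law that spreads the content's
    occupied codewords more evenly before legacy encoding (alpha = 1.5 is a
    hypothetical parameter; real systems derive the curve per scene)."""
    return np.power(hdr_norm, 1.0 / alpha)

def inverse_reshape(reshaped, alpha=1.5):
    """Matching inverse reshaping, signaled to the decoder as dynamic metadata."""
    return np.power(reshaped, alpha)
```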

The inverse reshaping function may be transmitted as a color remapping information (CRI) SEI message, which is specified in the HEVC standard (v.4, or version 12/2016) [31]. A color remapping model consists of three procedures: (1) a piece-wise linear function for each color component, (2) a 3 × 3 matrix applied to the three mapped color components, and (3) a second piece-wise linear function for each resulting color component. The pivot points of both the input and target values of the piece-wise linear functions, together with the 3 × 3 matrix, are transmitted in the CRI SEI message. The input to the color remapping model is the decoded color samples, upsampled to the 4:4:4 color sampling format. CRI was originally designed to support multiple display scenarios, e.g., assisting in displaying an SDR signal on a BT.2020 display. The inverse reshaping function, if sent by CRI SEI, has to be expressible by the CRI model; the design of the reshaping function is thus constrained, limiting the compression efficiency. If the inverse reshaping function is sent as separately defined metadata, better performance can be achieved, but that metadata format is not supported by the HEVC standard.
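The three-stage CRI model can be sketched as follows; the pivot lists and the matrix are placeholders for the values actually carried in the CRI SEI message [31].

```python
import numpy as np

def apply_cri(yuv444, pwl1_in, pwl1_out, matrix3x3, pwl2_in, pwl2_out):
    """Sketch of the three-stage CRI color remapping model [31].

    yuv444 has shape (3, N); pwl*_in/pwl*_out hold per-component pivot points
    and matrix3x3 is the 3 x 3 matrix, all placeholders for the values carried
    in the CRI SEI message."""
    x = np.asarray(yuv444, dtype=np.float64)
    stage1 = np.stack([np.interp(x[c], pwl1_in[c], pwl1_out[c])      # (1) PWL maps
                       for c in range(3)])
    stage2 = matrix3x3 @ stage1                                      # (2) 3x3 matrix
    stage3 = np.stack([np.interp(stage2[c], pwl2_in[c], pwl2_out[c]) # (3) PWL maps
                       for c in range(3)])
    return stage3
```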

D) Single-layer backward-compatible codecs

A number of single-layer codecs aim to achieve backward compatibility, i.e., the base layer can be watched directly in one display scenario, and the signal for one or more other display scenarios can be constructed from the base layer and, possibly, metadata. For example, the base layer may target SDR displays, while the base layer plus metadata serves HDR displays. The codec architecture is similar to that of the non-backward-compatible codecs (Fig. 6), but the base layer is intended to be watched on SDR displays. Since there is no residual signal, and the conversion between the base layer and HDR is limited to a well-defined function specified in metadata, it may be hard for both the base layer and the HDR signal to preserve the artistic intent.

1) Dolby's single-layer backward-compatible solution

Dolby's single-layer backward-compatible solution [Reference Kadu, Song and Su38] is one example. The base layer is generated from the source HDR signal by tone and color mapping. The encoder also generates the metadata used by the decoder to construct the inverse reshaping functions. Piecewise polynomials and multivariate multiple regression are used to represent the inverse reshaping functions for luma and chroma, respectively; the polynomial coefficients are sent as metadata. The base layer can be an SDR, HDR-10, or HLG-coded signal, etc., in any color space. HDR signals for various target peak luminances can be rendered at the decoder; a sketch of the luma path follows.
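The sketch below evaluates a piece-wise polynomial inverse reshaping function for luma; the segment boundaries, polynomial order, and coefficients are hypothetical illustrations, not the actual metadata syntax of [38].

```python
import numpy as np

def inverse_reshape_luma(y, pivots, coeffs):
    """Evaluate a piece-wise polynomial inverse reshaping function for luma.

    `pivots` are segment boundaries and `coeffs[i]` the polynomial coefficients
    (highest order first) for segment i, both carried as metadata. The segment
    count and polynomial order here are illustrative only."""
    y = np.asarray(y, dtype=np.float64)
    seg = np.clip(np.searchsorted(pivots, y, side="right") - 1, 0, len(coeffs) - 1)
    out = np.empty_like(y)
    for i, c in enumerate(coeffs):
        mask = seg == i
        out[mask] = np.polyval(c, y[mask])   # one low-order polynomial per segment
    return out

# e.g., two second-order segments over [0, 0.5) and [0.5, 1.0]:
# inverse_reshape_luma(y, pivots=[0.0, 0.5, 1.0],
#                      coeffs=[[0.2, 0.8, 0.0], [0.5, 0.5, 0.05]])
```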

2) SL-HDR

Technicolor R&D France and Philips International jointly proposed the SL-HDR solution. SL-HDR1 [Reference François and van de Kerkhof39], SL-HDR2, and SL-HDR3 take an SDR, HDR-10, and HLG-coded signal as the base layer, respectively. Take SL-HDR1 as an example: the base-layer SDR signal is rendered from the source HDR signal by tone mapping and color correction. The base layer is highly recommended to be coded as 10-bit in order to avoid contouring artifacts. The metadata includes a dynamic part and a static part. The static part includes the color gamut of the SDR/HDR signal and the target luminance of the mastering display used when grading the HDR signal. The dynamic part produces a luma-related look-up table (LUT) and a color-correction LUT, in one of two modes. In the parameter-based mode, the luma-related LUT is constructed from the parameters in the dynamic metadata, and the color-correction LUT is constructed by multiplying a default pre-defined LUT with a piece-wise linear table whose parameters are conveyed by the dynamic metadata. In the table-based mode, both the luma-related LUT and the color-correction LUT are coded explicitly.

3) HLG

Most of the codecs above take PQ as the EOTF of the HDR signal. If HLG [15] is used instead, the HDR signal can be compressed and transmitted directly, without metadata, to SDR displays and to HLG-compatible HDR displays. The content can be rendered unprocessed and shown properly on both types of displays, in a naturally backward-compatible way. However, since the low-light part of HLG is a gamma function, HLG allocates fewer codewords there than PQ does, and thus cannot represent many of the dark parts of a video as well as PQ. Besides, HLG-coded video can suffer from color shifting (changes in hue) when displayed on an SDR device, especially when an object moves into a bright or dark area [Reference Luthra, François and van de Kerkhof40–Reference Holm42].

V. INDUSTRY ADAPTATIONS

With HDR technology experiencing tremendous growth in the number of formats, several industries have developed HDR processing systems based on their own applications of the standards.

At the forefront of HDR solutions is Dolby Vision, developed by Dolby Laboratories Inc. as an end-to-end processing system that optimizes the creation, distribution, and display of HDR content. It uses the PQ transfer function [13,15], offering up to 4K resolution, 12-bit color depth, and dynamic metadata. It is widely believed to deliver the most consistent, reliable, and highest-quality HDR solution. Dolby Vision is implemented and/or endorsed by companies such as Warner, Disney, Sony, Sharp, Apple, LG, Netflix, Panasonic, Philips, TCL, Vizio, Amazon, Google, Lenovo, Microsoft, Comcast, etc.

HDR-10 is an open format based on the BT.2100 [15] PQ standard. It is implemented by Dell, Dolby Laboratories, LG, Microsoft, Samsung, Sony, etc. In early 2018, Samsung, Panasonic, 20th Century Fox, and Warner announced support for the HDR10+ format, which enhances HDR-10 with dynamic metadata, similar to that described in the UHD Blu-ray standard [43] and in ATSC 3.0. HDR10+ is currently implemented by Amazon Video, Panasonic UHD Blu-ray players, and Samsung TVs.

Hybrid Log-Gamma (HLG), as specified in BT.2100 [15], was jointly developed by BBC and NHK and is mainly targeted at live production. Standardizing the OETF also makes it possible to broadcast a single stream that is compatible with both SDR and HDR televisions. It is implemented by the video services DirecTV and YouTube. SL-HDR1, developed by Technicolor and Philips, is a single-stream technology mainly focused on over-the-top (OTT) video streaming services.

Many industrial applications are compatible with multiple formats. For example, in most cases a Dolby Vision-certified device is also capable of displaying HDR-10 [2]. Panasonic has announced that HDR10+ and Dolby Vision will be adopted into its Blu-ray players simultaneously. Apple's iPhone X, launched in 2017, supports HDR formats, and its Final Cut Pro 10.4 supports Dolby Vision, HDR-10, and HLG with WCG, i.e., BT.2020 [44]. Intel first launched support for HDR, in the form of Ultra HD Blu-ray playback, in 2016 on Kaby Lake platforms running Windows 10. At present, 7th and 8th Gen Intel Core processor platforms support HDR-10, and expansion of support to other standards is being investigated [45].

VI. THE FUTURE OF HDR TECHNOLOGY

Due to growing consumer demand and the availability of commercial content delivery solutions, HDR technology is an active research area. Several approaches have been proposed to increase the spatial resolution (e.g., UHD or 8K) and frame rate (60, 120 fps) of HDR video content, while many others investigate improving the quality of HDR images by reducing artifacts. In [Reference Chen, Su and Peng46], an adaptive upsampling filter that spatially upscales HDR images based on the luminance range is proposed. A common problem when using linear interpolation filters for upsampling is overshooting, which is visually more noticeable in HDR content; it is mitigated in another image-upscaling approach proposed in [Reference Talebi, Su and Yin47], which uses the gradient map of an image to locally adapt the upscaling filter to the image content. Local statistics transfer has been proposed as a powerful technique for interpreting and transferring parameters across different EOTFs and has been used for various HDR image applications [Reference Wen and Su48].

Banding is another visual artifact, generally noticeable in poorly-compressed HDR content. In [Reference Song, Su and Cosman49], a hardware-friendly implementation of a selective sparse filter is presented; the filter combines smooth-region detection with banding reduction to remove banding artifacts while additionally reducing other compression-related artifacts such as blocking. As with SDR content, dithering is another technique for reducing banding in HDR images. In [Reference Su, Chen, Lee and Daly50], a pre-dithering approach is presented that applies a spatial filter to uniformly distributed noise to generate low-pass or high-pass filtered noise; this noise is added to an image, which is then compressed using conventional encoding methods. A more advanced and stronger artifact-removal method is based on adaptive dithering using a Curved Markov-Gaussian noise model that operates on the quantized domain of an HDR image [Reference Mukherjee, Su and Cheng51].

Many approaches focus on representing the HDR signal in the most efficient or uniform way across multiple display devices/standards. In [52,53], video and graphics signals are converted to a common format, allowing metadata to be embedded in the signal. This enables moving the Display Management (DM) operations from the media player to the HDR display, thus allowing greater compatibility across various devices. In another design, a dual-layer scheme allows the enhancement layer to carry either SDR residuals or the HDR signal obtained from the original SDR signal [Reference Su, Chung, Wu, Zhai, Zhang and Xiaosong54].

There is also considerable emphasis on HDR capture technology. A capture-side method proposed in [Reference Gupta, Mitsunaga, Iso and Nayar55] uses multiple exposure times to adaptively obtain an HDR image. Another approach uses an array-camera arrangement consisting of at least two subsets of active cameras with adaptive selection of image capture settings [Reference Ciurea and Venkataraman56]. Machine learning has also been used to reconstruct HDR images based on pre-training: a learning-based approach produces high-quality HDR images from three differently exposed SDR images of a dynamic scene [Reference Kalantari and Ramamoorthi57], and, recently, deep convolutional neural networks (CNNs) have been used for HDR image reconstruction from a single exposure [Reference Eilertsen, Kronander, Denes, Mantiuk and Unger58].

The future of HDR is likely to witness rapid development in research and in products/solutions, with higher visual quality, better interoperability, and a wider spectrum of quality-enhancement features. Among several promising approaches, artificial intelligence (AI) and CNN/deep-learning-based methods are also likely to be part of the future HDR technology.

VII. CONCLUSION

In this paper, key advancements in HDR video technology are reviewed. The ability to view video content with a higher luminance range and richer colors has significantly changed the landscape of the video processing industry. Various HDR quantization approaches and coding methods are discussed in conjunction with their adaptation into international standards. With a number of industrial applications already developed and many more in development, HDR technology is highly likely to remain an active area of research in the coming years.

Neeraj J. Gadgil received the B.E. (Hons.) degree in Electrical and Electronics Engineering from Birla Institute of Technology and Science (BITS) Pilani, India, in 2009 and the Ph.D. degree in Electrical and Computer Engineering from Purdue University, West Lafayette, IN, in 2016. He joined Dolby Laboratories Inc., Sunnyvale, CA, in 2016. He has co-authored 20+ articles including book chapters and peer-reviewed journal and conference papers. His research interests are image processing, video compression, and machine learning.

Qing Song received the B.Eng. degree in Automation from Tongji University, Shanghai, China, in 2011, and the M.S. and Ph.D. degrees in Electrical Engineering from University of California, San Diego, CA, in 2013 and 2017, respectively. She joined Dolby Laboratories, Inc., Sunnyvale, CA, in 2017. Her research interests include image/video processing, compression and transmission.

Guan-Ming Su is with Dolby Labs, Sunnyvale, CA, USA. He is the inventor of 80+ US patents and pending applications. He is the co-author of 3D Visual Communications (John Wiley & Sons, 2013). He has served as an associate editor of the Journal of Communications, an associate editor of APSIPA Transactions on Signal and Information Processing, and Director of the review board and R-Letter in the IEEE Multimedia Communications Technical Committee. He has also served as Technical Program Track Co-Chair of ICCCN 2011, Theme Chair of ICME 2013, TPC Co-Chair of ICNC 2013, TPC Chair of ICNC 2014, Demo Chair of SMC 2014, General Chair of ICNC 2015, Area Co-Chair for Multimedia Applications of ISM 2015, Demo Co-Chair of ISM 2016, Industrial Program Co-Chair of IEEE BigMM 2017, Industrial Expo Chair of ACM MM 2017, and TPC Co-Chair of IEEE MIPR 2019. He served as chair of the APSIPA Industrial Publication Committee 2014-2017 and as VP of APSIPA Industrial Relations and Development starting in 2018. He is a Senior Member of the IEEE. He obtained his Ph.D. degree from the University of Maryland, College Park.

Samir N. Hulyalkar received his B.Tech. from the Indian Institute of Technology, his M.S. in Mathematics, and his M.S. and Ph.D. in Computer and Systems Engineering from Rensselaer Polytechnic Institute (RPI). He joined Philips in 1991 and developed digital communications algorithms for DTV, cable, satellite, and broadband wireless. He was the chief architect of Philips' first VSB ASIC. He joined NxtWave in February 1998, which was acquired by ATI in June 2002; ATI was in turn acquired by AMD in October 2006, and AMD sold the DTV business unit to Broadcom in 2009. At AMD/ATI/NxtWave, starting in 1998, he worked on front-end products such as the Theater 310/3 and the NXT 2000/2/3/4/5 series. Later, he worked on the Theater 550 (a front-end plus MPEG encoder product). He then became CTO of the DTV business unit and led a team on IP development for video IP, such as frame-rate conversion and deinterlacing, as well as demodulation IP. He joined Dolby Labs in 2010 and has been leading the team on imaging technology, specifically on high dynamic range video and 3DTV. He holds 92 patents and has co-authored more than 40 papers. He received the Charles M. Close award at RPI in 1992.

Footnotes

1 Luminance is a photometric measure of the luminous intensity per unit area of light incident on a capture device such as the human eye or a camera. The SI unit for luminance is the candela per square metre (cd/m2).

REFERENCES

1. Poynton, C.: Digital Video and HDTV: Algorithms and Interfaces. Elsevier, Waltham, MA, 2012.
3. Borer, T.; Cotton, A.: A "Display Independent" high dynamic range television system. Research & Development White Paper WHP 309, British Broadcasting Corporation, September 2015.
4. HDR demystified: Emerging UHDTV systems. Technical Paper, SpectraCal Inc., vol. 1, March 2016.
5. Barten, P.G.J.: Contrast Sensitivity of the Human Eye and its Effects on Image Quality. SPIE Optical Engineering Press, Bellingham, WA, 1999.
6. Konstantinides, K.; Su, G.-M.; Gadgil, N.: High dynamic range video coding, in Bhattacharyya, S.; Deprettere, E.; Leupers, R.; Takala, J. (Eds), Handbook of Signal Processing Systems, Springer, Cham, 2019.
7. Wiegand, T.; Sullivan, G.J.; Bjontegaard, G.; Luthra, A.: Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol., 13 (7) (2003), 560–576.
8. Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol., 22 (12) (2012), 1649–1668.
9. Gish, W.; Miller, S.: Unambiguous video pipeline description motivated by HDR, in 2016 IEEE Int. Conf. on Image Processing (ICIP), Phoenix, AZ, September 2016, 909–912.
10. ITU-R BT.1886, Reference electro-optical transfer function for flat panel displays used in HDTV studio production. Int. Telecommunication Union, Radiocommunication Sector, 2011.
11. ITU-R BT.709, Parameter values for the HDTV standards for production and international programme exchange. Int. Telecommunication Union, Radiocommunication Sector, 2002.
12. Miller, S.; Nezamabadi, M.; Daly, S.: Perceptual signal coding for more efficient usage of bit codes. SMPTE Mot. Imag. J., 122 (4) (2013), 52–59.
13. SMPTE ST 2084, High dynamic range electro-optical transfer function of mastering reference displays. The Society of Motion Picture and Television Engineers, 2014.
14. ARIB STD-B67, Essential parameter values for the extended image dynamic range television (EIDRTV) system for programme production. Association of Radio Industries and Businesses (ARIB), 2015.
15. ITU-R BT.2100-2, Image parameter values for high dynamic range television for use in production and international programme exchange. Int. Telecommunication Union, 2018.
16. ITU-R BT.601, Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios. Int. Telecommunication Union, Radiocommunication Sector, 1990.
17. SMPTE EG 432-1:2010, Digital source processing - color processing for D-Cinema. The Society of Motion Picture and Television Engineers.
18. ITU-R BT.2020, Parameter values for ultra-high definition television systems for production and international programme exchange. Int. Telecommunication Union (ITU), 2015.
19. Berns, R.S. et al.: Billmeyer and Saltzman's Principles of Color Technology. Wiley, New York, 2000.
20. Perrin, A.; Rerabek, M.; Husak, W.; Ebrahimi, T.: ICtCp versus Y'CbCr: Evaluation of ICtCp color space and an adaptive reshaper for HDR and WCG. IEEE Consum. Electron. Mag., 7 (3) (2018), 38–47.
21. Ebner, F.; Fairchild, M.D.: Development and testing of a color space (IPT) with improved hue uniformity, in Color and Imaging Conference, vol. 1998 (1), Society for Imaging Science and Technology, 1998, 8–13.
22. Mantiuk, R.; Efremov, A.; Myszkowski, K.; Seidel, H.-P.: Backward compatible high dynamic range MPEG video compression. ACM Trans. Graph., 25 (2006), 713–723.
23. Mai, Z.; Mansour, H.; Mantiuk, R.; Nasiopoulos, P.; Ward, R.; Heidrich, W.: Optimizing a tone curve for backward-compatible high dynamic range image and video compression. IEEE Trans. Image Process., 20 (6) (2011), 1558–1571.
24. Su, G.-M.; Atkins, R.; Chen, Q.: Backward-compatible coding for ultra high definition video signals with enhanced dynamic range. US Patent 9,549,207, January 2017.
25. Chen, Q.; Su, G.-M.; Yin, P.: Near constant-time optimal piecewise LDR to HDR inverse tone mapping. Proc. SPIE, 9404 (2015), 9404.1–11.
26. Su, G.-M.; Qu, S.; Hulyalkar, S.N.; Chen, T.; Gish, W.C.; Koepfer, H.: Layer decomposition in hierarchical VDR coding. US Patent 9,497,456 B2, November 2012.
27. François, E.; Taquet, J.: AHG18: On 16-bits support for range extensions, in JCTVC-N0142, 14th JCT-VC Meeting, Vienna, Austria, July-August 2013.
28. Kim, W.-S.; Pu, W.; Chen, J.; Wang, Y.-K.; Sole, J.; Karczewicz, M.: AhG 5 and 18: High bit-depth coding using auxiliary picture, in JCTVC-O0090, 15th JCT-VC Meeting, October-November 2013.
29. Aminlou, A.; Ugur, K.: On 16-bit coding, in JCTVC-P0162, 16th JCT-VC Meeting, January 2014.
30. Auyeung, C.; Xu, J.: AhG 5 and 18: Coding of high bit-depth source with lower bit-depth encoders and a continuity mapping, in JCTVC-P0173, 16th JCT-VC Meeting, January 2014.
31. ITU Rec. H.265, High efficiency video coding. Series H: Audiovisual and multimedia systems, infrastructure of audiovisual services - coding of moving video, December 2016.
32. Ström, J. et al.: High quality HDR video compression using HEVC main 10 profile, in 2016 Picture Coding Symp. (PCS), December 2016, 1–5.
33. Lu, T.; Yin, P.; Chen, T.; Su, G.-M.: Rate control adaptation for high dynamic range images. U.S. Patent Application Publication US 2016/0134870, 2016.
34. Lu, T. et al.: Compression efficiency improvement over HEVC main 10 profile for HDR and WCG content, in 2016 Data Compression Conf. (DCC), March 2016, 279–288.
35. Wong, C.-W.; Su, G.-M.; Wu, M.: Joint baseband signal quantization and transform coding for high dynamic range video. IEEE Signal Process. Lett., 2016 (arXiv:1603.02980).
36. Ström, J.; Samuelsson, J.; Dovstam, K.: Luma adjustment for high dynamic range video, in 2016 Data Compression Conf. (DCC), March 2016, 319–328.
37. Lu, T. et al.: ITP colour space and its compression performance for high dynamic range and wide colour gamut video distribution. ZTE Communications, special issue on Recent Progresses on Multimedia Coding, Analysis and Transmission, no. 1, 2016.
38. Kadu, H.; Song, Q.; Su, G.-M.: Single layer progressive coding for high dynamic range videos, in 2018 Picture Coding Symp. (PCS), June 2018, 86–90.
39. François, E.; van de Kerkhof, L.: A single-layer HDR video coding framework with SDR compatibility. SMPTE Mot. Imag. J., 126 (3) (2017), 16–22.
40. Luthra, A.; François, E.; van de Kerkhof, L.: Report of HDR core experiment 7 on investigating the visual quality of HLG generated HDR and SDR video, in JCTVC-W0027, 23rd JCT-VC Meeting, February 2016.
41. Pindoria, M.; Naccari, M.; Borer, T.; Cotton, A.: Some considerations on hue shifts observed in HLG backward compatible video, in JCTVC-W0119, 23rd JCT-VC Meeting, February 2016.
42. Holm, J.: HLG issues, in JCTVC-W0132, 23rd JCT-VC Meeting, February 2016.
43. Blu-ray Disc Read-Only Format, Audio Visual Application Format Specifications for BD-ROM Version 3.1. White Paper, Blu-ray Disc Association, August 2016.
44. Working with wide color gamut and high dynamic range in Final Cut Pro X: New workflows for editing. White Paper, Apple Inc., December 2017.
45. High dynamic range (HDR) on Intel Graphics. Revision 1.0, Technical White Paper, Intel Inc., November 2017.
46. Chen, Q.; Su, G.-M.; Peng, Y.: Spatial adaptive upsampling filter for HDR image based on multiple luminance range, in Digital Photography X, vol. 9023, Int. Society for Optics and Photonics, 2014, 902311.
47. Talebi, H.; Su, G.-M.; Yin, P.: Fast HDR image upscaling using locally adapted linear filters, in Digital Photography XI, vol. 9404, Int. Society for Optics and Photonics, 2015, 94040H.
48. Wen, B.; Su, G.-M.: TransIm: Transfer image local statistics across EOTFs for HDR image applications, in IEEE Int. Conf. on Multimedia & Expo (ICME), July 2018.
49. Song, Q.; Su, G.-M.; Cosman, P.C.: Hardware-efficient debanding and visual enhancement filter for inverse tone mapped high dynamic range images and videos, in 2016 IEEE Int. Conf. on Image Processing (ICIP), 2016, 3299–3303.
50. Su, G.-M.; Chen, Q.; Lee, B.; Daly, S.: Pre-dithering in high dynamic range video coding. US Patent Application 15/035,551, October 2016.
51. Mukherjee, S.; Su, G.-M.; Cheng, I.: Adaptive dithering using Curved Markov-Gaussian noise in the quantized domain for mapping SDR to HDR image, in Int. Conf. on Smart Multimedia, August 2018.
52. SMPTE ST 2094-1:2016, Dynamic metadata for color volume transform - core components. May 2016.
53. SMPTE ST 2094-10:2016, Dynamic metadata for color volume transform - application #1. May 2016.
54. Su, Y.; Chung, C.Y.; Wu, H.-J.; Zhai, J.; Zhang, K.; Xiaosong, Z.: Techniques in backwards compatible multi-layer compression of HDR video. US Patent 10,021,411, July 2018.
55. Gupta, M.; Mitsunaga, T.; Iso, D.; Nayar, S.K.: Methods, systems, and media for high dynamic range imaging. US Patent 9,648,248, May 2017.
56. Ciurea, F.; Venkataraman, K.: Systems and methods for high dynamic range imaging using array cameras. US Patent 9,774,789, September 2017.
57. Kalantari, N.K.; Ramamoorthi, R.: Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph., 36 (4) (2017), 144.
58. Eilertsen, G.; Kronander, J.; Denes, G.; Mantiuk, R.K.; Unger, J.: HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph. (TOG), 36 (6) (2017), 178.