HDR Insights Series Article 4 : Dolby Vision

In the previous article, we discussed HDR tone mapping and how it is used to produce an optimal viewing experience across a range of display devices. This article covers the basics of Dolby Vision metadata and the parameters that need to be validated before content is delivered.

What is HDR metadata?

HDR metadata helps a display device present content in an optimal manner. It describes the properties of the HDR content and the mastering display, which the display device uses to map the content to its own color gamut and peak brightness. There are two types of metadata: static and dynamic.

Static metadata

Static metadata applies to the entire piece of content and is standardized by SMPTE ST 2086. Its key items are as follows:

  1. Mastering display properties: Properties defining the device on which content was mastered.
    • RGB color primaries
    • White point
    • Brightness Range
  2. Maximum content light level (MaxCLL): Light level of the brightest pixel in the entire video stream.
  3. Maximum Frame-Average Light Level (MaxFALL): The frame-average light level of the brightest frame (the frame with the highest average) in the entire video stream.

In typical content, the brightness and color range vary from shot to shot. The challenge with static metadata is that tone mapping based on it is driven solely by the brightest frame in the entire piece of content. As a result, most of the content undergoes more compression of dynamic range and color gamut than it needs, leading to a poor viewing experience on less capable HDR display devices.
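As an illustration, the two static light-level values can be computed from decoded per-pixel luminance roughly as follows. This is a simplified sketch; real analyzers work on full-resolution decoded frames and follow the exact rounding rules of the relevant delivery specification:

```python
def max_cll_fall(frames):
    """Compute HDR10 static light-level metadata.

    frames: iterable of 2-D lists of per-pixel light levels in Nits.
    Returns (MaxCLL, MaxFALL)."""
    max_cll = 0.0   # brightest single pixel anywhere in the stream
    max_fall = 0.0  # highest frame-average light level
    for frame in frames:
        pixels = [p for row in frame for p in row]
        max_cll = max(max_cll, max(pixels))
        max_fall = max(max_fall, sum(pixels) / len(pixels))
    return max_cll, max_fall

# Two synthetic 2x2 "frames" of pixel luminance values
frame_a = [[100.0, 200.0], [50.0, 50.0]]
frame_b = [[400.0, 10.0], [10.0, 10.0]]
print(max_cll_fall([frame_a, frame_b]))  # (400.0, 107.5)
```

Note that MaxFALL comes from frame b here: although frame a has a brighter average midtone spread, frame b's single 400-Nit pixel pushes its average to 107.5 Nits.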

Dynamic metadata

Dynamic metadata allows tone mapping to be performed on a per-scene basis, which leads to a significantly better viewing experience when content is displayed on less capable HDR display devices. Dynamic metadata is standardized by SMPTE ST 2094, which defines content-dependent metadata. Using dynamic metadata alongside static metadata overcomes the issues caused by relying on static metadata alone for tone mapping.
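To see why per-shot metadata preserves more of the image, consider a toy comparison. The shot peaks and the linear scaling below are purely illustrative and are not Dolby's actual mapping algorithm:

```python
shot_peaks = [350.0, 500.0, 4000.0, 450.0]  # hypothetical per-shot peak Nits
display_peak = 600.0                        # a mid-range HDR television

# Static metadata: one scale factor driven by the brightest shot in the title
static_scale = min(1.0, display_peak / max(shot_peaks))

# Dynamic metadata: each shot is compressed only as much as it needs
dynamic_scales = [min(1.0, display_peak / p) for p in shot_peaks]

print(static_scale)    # 0.15 -> every shot dimmed to 15% of its mastered level
print(dynamic_scales)  # [1.0, 1.0, 0.15, 1.0] -> only the bright shot is compressed
```

With static metadata, one bright shot forces the entire title to be compressed; with dynamic metadata, the three darker shots pass through untouched.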

Dolby Vision

Dolby Vision uses dynamic metadata and is the most widely adopted dynamic-metadata HDR technology today. It has been adopted by major OTT service providers such as Netflix and Amazon, as well as major studios and a host of prominent television manufacturers. Dolby Vision is standardized in SMPTE ST 2094-10. In addition to supporting dynamic metadata, Dolby Vision allows multiple trims to be described for specific target displays, enabling finer control of the rendered image on those devices.

Dolby has documented the details of its algorithm in what it refers to as Content Mapping (CM) documents. The original CM algorithm, version 2.9 (CMv2.9), has been in use since the introduction of Dolby Vision; Dolby introduced Content Mapping version 4 (CMv4) in the fall of 2018, and both versions remain in use. The Dolby Vision Color Grading Best Practices Guide provides more information.

Dolby Vision metadata is coded at various ‘levels’, described below:

LEVEL 0 GLOBAL METADATA (STATIC)
  • Mastering Display: Characteristics of the mastering display used for the project
  • Aspect Ratio: Ratio of canvas and image (active area)
  • Frame Rate: Frame rate of the content
  • Target Display: Characteristics of each target display used for L2 trim metadata
  • Color Encoding: Describes the image container deliverable
  • Algorithm/Trim Version: CM algorithm version and trim version

LEVEL 1 ANALYSIS METADATA (DYNAMIC)
  • L1 Min, Mid, Max: Three floating-point values that characterize the dynamic range of the shot or frame. Shot-based L1 metadata is created by analyzing each frame in a shot in LMS color space; the per-frame results are combined to describe the entire shot as L1Min, L1Mid, L1Max. Stored as LMS (CMv2.9) and L3 offsets.

LEVEL 2 BACKWARDS-COMPATIBLE PER-TARGET TRIM METADATA (DYNAMIC)
  • Reserved1, Reserved2, Reserved3, Lift, Gain, Gamma, Saturation, Chroma, Tone Detail: Automatically computed from the L1, L3 and L8 (lift, gain, gamma, saturation, chroma, tone detail) metadata for backwards compatibility with CMv2.9

LEVEL 3 OFFSETS TO L1 (DYNAMIC)
  • L1 Min, Mid, Max: Three floating-point values that are offsets to the L1 analysis metadata (L3Min, L3Mid, L3Max). L3Mid is a global, user-defined trim control. L1 is stored as CMv2.9 computed values; CMv4 reconstructs RGB values from L1 + L3.

LEVEL 5 PER-SHOT ASPECT RATIO (DYNAMIC)
  • Canvas, Image: Used for defining shots that have a different aspect ratio than the global L0 aspect ratio

LEVEL 6 OPTIONAL HDR10 METADATA (STATIC)
  • MaxFALL, MaxCLL: Static metadata for HDR10 (MaxCLL: Maximum Content Light Level; MaxFALL: Maximum Frame-Average Light Level)

LEVEL 8 PER-TARGET TRIM METADATA (DYNAMIC)
  • Lift, Gain, Gamma, Saturation, Chroma, Tone Detail, Mid Contrast Bias, Highlight Clipping, plus 6-vector (R,Y,G,C,B,M) saturation and hue trims: User-defined image controls to adjust the CMv4 algorithm per target, with secondary color controls

LEVEL 9 PER-SHOT SOURCE CONTENT PRIMARIES (DYNAMIC)
  • Rxy, Gxy, Bxy, WPxy: Stores the mastering display color primaries and white point as per-shot metadata
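The shot-level L1 analysis described above can be approximated in a few lines. This is a hypothetical simplification: real Dolby tools compute the statistics in PQ-encoded LMS space, whereas here we simply combine ready-made per-frame (min, mid, max) triples:

```python
def shot_l1(frame_stats):
    """Combine per-frame (min, mid, max) luminance triples into shot-level L1.

    Hypothetical simplification: min of the minimums, average of the mids,
    max of the maximums over all frames in the shot."""
    mins, mids, maxs = zip(*frame_stats)
    return min(mins), sum(mids) / len(mids), max(maxs)

# Two frames' worth of (min, mid, max) statistics for one shot
print(shot_l1([(0.0, 0.25, 0.8), (0.1, 0.75, 0.9)]))  # (0.0, 0.5, 0.9)
```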

 

Dolby Vision QC requirements

Netflix, Amazon, and other streaming services are continuously adding HDR titles to their libraries to improve the quality of experience for their viewers and differentiate their service offerings. This requires content suppliers to be equipped to deliver high-quality, compliant HDR content, and it makes the ability to verify quality before delivery increasingly important.

Many of these OTT services support both the HDR-10 and Dolby Vision flavors of HDR; however, more and more Netflix HDR titles are now based on Dolby Vision. Dolby Vision is a new and complex technology, so checking content for correctness and compliance is not always easy. Delivering non-compliant HDR content can affect your business, and using a QC tool to assist in HDR QC can go a long way toward maintaining a good standing with these OTT services.

Here are some of the important aspects to verify for HDR-10 and Dolby Vision:

  1. HDR metadata presence
    • HDR-10: Static metadata must be coded with the correct parameter values.
    • Dolby Vision: Static metadata must be present once and dynamic metadata must be present for every shot in the content.
  2. HDR metadata correctness. There are a number of issues that content providers need to check for correctness in the metadata:
    • Only one mastering display should be referenced in metadata.
    • Correct mastering display properties – RGB primaries, white point and Luminance range.
    • MaxFALL and MaxCLL values.
    • All target displays must have unique IDs.
    • Correct algorithm version. Dolby supports two versions:
      • Metadata Version 2.0.5 XML for CMv2.9
      • Metadata Version 4.0.2 XML for CMv4
    • No frame gaps. All shots and frames must be tightly aligned within the timeline, with no gaps between frames or shots.
    • No overlapping shots. The timeline must be accurately cut into individual shots, and the analysis that generates L1 metadata should be performed per shot. If the timeline is not accurately cut, luminance inconsistencies can cause flashing and flickering artifacts during playback.
    • No negative shot durations. The shot duration, as coded in the “Duration” field, must not be negative.
    • Single trim for a particular target display. There should be one and only one trim for a target display.
    • Level 1 metadata must be present for all the shots.
    • Valid Canvas and Image aspect ratios. Cross-check the canvas and image aspect ratios against baseband-level verification of the actual content.
  3. Validation of video essence properties. Essential properties such as Color matrix, Color primaries, Transfer characteristics, bit depth etc. must be correctly coded.
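The timeline checks above (gaps, overlaps, negative durations) can be sketched as a simple validator. Representing shots as (start_frame, duration) pairs is an assumption for illustration, not the actual Dolby Vision XML schema:

```python
def validate_shots(shots):
    """Check a shot timeline given as (start_frame, duration) pairs sorted by start.

    Returns a list of human-readable issues; an empty list means the
    timeline is tightly aligned with no gaps, overlaps, or bad durations."""
    issues = []
    for i, (start, dur) in enumerate(shots):
        if dur < 0:
            issues.append(f"shot {i}: negative duration {dur}")
    spans = [(s, s + d) for s, d in shots if d >= 0]
    for (_, prev_end), (start, _) in zip(spans, spans[1:]):
        if start > prev_end:
            issues.append(f"gap of {start - prev_end} frame(s) at frame {prev_end}")
        elif start < prev_end:
            issues.append(f"overlap of {prev_end - start} frame(s) at frame {start}")
    return issues

print(validate_shots([(0, 10), (10, 5), (15, 4)]))   # [] -> clean timeline
print(validate_shots([(0, 10), (12, 5), (17, -2)]))  # reports a gap and a negative duration
```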

Netflix requires the Dolby Vision metadata to be embedded in the video stream of the content delivered to them. Reviewing metadata embedded in a video stream can be tedious, so an easy way to extract and review the entire metadata is advantageous.

How can we help?

Venera’s QC products (Pulsar – for on-premise & Quasar – for cloud) can help in identifying these issues in an automated manner. We have worked extensively with various technology and media groups to create features that can help the users with their validation needs. And we have done so without introducing a lot of complexity for the users.

Depending on the volume of your content, you could consider one of our perpetual license editions (Pulsar Professional or Pulsar Standard). For low-volume customers, we also have a unique option called Pulsar Pay-Per-Use (Pulsar PPU), an on-premise, usage-based QC offering where you pay a nominal per-minute charge for the content analyzed. And we, of course, offer a free trial so you can test our software at no cost. You can also download a copy of the Pulsar brochure here, and for more details on our pricing you can check here.

If your content workflow is in the cloud, then you can use our Quasar QC service, which is the only Native Cloud QC service in the market. With advanced features like usage-based pricing, dynamic scaling, regional resourcing, content security framework and REST API, the platform is a good fit for content workflows requiring quality assurance. Quasar is currently supported for AWS, Azure and Google cloud platforms and can also work with content stored on Backblaze B2 cloud storage. Read more about Quasar here.

Both Pulsar & Quasar come with a long list of ‘ready to use’ QC templates for Netflix, based on their latest published specifications (as well as some of the other popular platforms, like iTunes, CableLabs, and DPP) which can help you run QC jobs right out of the box. You can also enhance and modify any of these QC templates or build new ones! And we are happy to build new QC templates for your specific needs.

HDR Insights Article 3: Understanding HDR Tone Mapping


In the previous article, HDR Transfer Functions, we discussed transfer functions and how digital images are converted to light levels for display. This article discusses how the same HDR image can be displayed differently by different HDR devices.

What is HDR Tone Mapping?

Tone mapping is the process of adapting digital signals to appropriate light levels based on the HDR metadata. It is not simply applying the EOTF (Electro-Optical Transfer Function) to the image data; rather, it maps the image data to the display device’s capabilities using the metadata. Since a broad range of HDR display devices is available in the market, each with its own Nits (i.e. ‘brightness’) range, correct tone mapping is necessary for a good user experience. And since tone mapping is driven by the metadata in the video stream, the presence of correct metadata is essential.

Source footage can be shot in HDR with the best of cameras and mastered on high-end HDR mastering systems, but it still needs to be displayed optimally on the range of HDR televisions available in the market. Tone mapping maps the brightness of the content to the device appropriately, without significant degradation.

Need for HDR Tone Mapping

Let’s say an image is shot with a peak brightness of 2000 Nits. If it is displayed on a television with a 0-2000 Nits range, the brightness range will be exactly as shot in the raw footage. However, the results will differ on other devices:

High Dynamic Range Tone Mapping

 

Since tone mapping is a necessary operation to display PQ-based HDR content on HDR display devices, the television needs to know the native properties of the content, namely the brightness range used, along with the mastering system parameters. This information is conveyed in the form of HDR metadata. After reading the HDR metadata, the display device can choose tone mapping parameters so that the transformed video lies optimally within its display range.
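As a rough illustration of the idea (not any standardized curve), a display could pass dark and mid tones through unchanged and compress only the highlights above a knee point:

```python
def tone_map(nits, src_peak=2000.0, dst_peak=600.0, knee=0.75):
    """Map a mastered light level (Nits) into a less capable display's range.

    Dark and mid tones below the knee pass through unchanged; the span
    [knee, src_peak] is softly compressed so that src_peak lands exactly
    on dst_peak. Purely illustrative, not a standardized curve."""
    k = knee * dst_peak              # knee point, in Nits
    if nits <= k:
        return nits                  # pass-through region
    t = (nits - k) / (src_peak - k)  # 0 at the knee, 1 at the source peak
    return k + (dst_peak - k) * 2.0 * t / (1.0 + t)

print(tone_map(100.0))   # 100.0 -> dark tones unchanged
print(tone_map(2000.0))  # 600.0 -> source peak mapped to display peak
```

The 2000-Nit and 600-Nit peaks are hypothetical values chosen to match the example above; a real television derives them from the stream’s HDR metadata and its own panel capabilities.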

The next article will discuss the specific metadata for HDR-10 and HDR-10+, two different implementations of HDR. Stay tuned for that.


Definitions

cd/m2 – The candela (cd) is the base unit of luminous intensity in the International System of Units (SI); that is, luminous power per unit solid angle emitted by a point light source in a particular direction. A common wax candle emits light with a luminous intensity of roughly one candela.

Nits – A non-SI unit used to describe the luminance. 1 Nit = 1 cd/m2.

HDR – High Dynamic range. It is a technology that improves the brightness & contrast range in an image (up to 10,000 cd/m2)

SDR – Standard Dynamic range. It refers to the brightness/contrast range that is usually available in regular, non-HDR televisions, usually with a range of up to 100 cd/m2. This term came into existence after HDR was introduced

WCG – Wide Color Gamut. Color gamut that offer a wider range of colors than BT.709. DCI-P3 and BT.2020 are examples of WCG offering more realistic representation of images on display devices.

EOTF – electro-optical transfer function. A mathematical transfer function that describes how digital values will be converted to light on a display device.

OETF – optical-electro transfer function. A mathematical transfer function that describes how the light values will be converted to digital values typically within cameras.

OOTF – opto-optical transfer function. This transfer function compensates for the difference in tonal perception between the environment of the camera and that of the display.

PQ – PQ (or Perceptual Quantizer) is a transfer function devised to represent the wide brightness range (up to 10,000 Nits) in HDR devices.

HLG – HLG (or Hybrid Log Gamma) is a transfer function devised to represent the wide brightness range in HDR devices. HLG is quite compatible with existing SDR devices in the SDR range.

HDR Insights Article 2 : PQ and HLG transfer functions for HDR


In the previous article, HDR Introduction, we discussed the benefits HDR (High Dynamic Range) brings in terms of video quality. This article talks about how that is achieved.

To display digital images on screen, display devices need to convert pixel values to corresponding light values. This conversion is usually non-linear and is described by the EOTF (Electro-Optical Transfer Function). Different display devices support different types of “transfer functions”.

Regular HDTV display devices (SDR, or Standard Dynamic Range, monitors) normally use the BT.709 gamma transfer function to convert the video signal into light. These monitors are primarily designed to display images with a brightness range of up to 100 Nits (cd/m2).

 

High Dynamic Range – Transfer Functions (PQ & HLG)

 

HDR defines two additional transfer functions to handle this issue: Perceptual Quantizer (PQ) and Hybrid Log-Gamma (HLG). PQ is an absolute, display-referred signal, while HLG is a relative, scene-referred signal. This means that HLG-enabled display devices automatically adapt the light levels based on the content and their own display capabilities, while PQ-enabled display devices need to implement tone mapping to adapt the light levels. Display devices use content metadata to display PQ-coded images; this metadata can come once for the entire video stream (static) or for each individual shot (dynamic).
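For reference, both transfer functions are defined by closed-form equations. The sketch below implements the ST 2084 PQ EOTF (signal to Nits) and the BT.2100 HLG OETF (scene light to signal), using the constants from the published specifications:

```python
import math

def pq_eotf(e):
    """ST 2084 PQ EOTF: non-linear signal value e in [0, 1] -> light level in Nits."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    p = e ** (1 / m2)
    return 10000.0 * (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1 / m1)

def hlg_oetf(e):
    """BT.2100 HLG OETF: normalized scene light e in [0, 1] -> signal value in [0, 1]."""
    a, b, c = 0.17883277, 0.28466892, 0.55991073
    if e <= 1 / 12:
        return math.sqrt(3 * e)
    return a * math.log(12 * e - b) + c

print(round(pq_eotf(1.0)))         # 10000 Nits at full signal
print(round(hlg_oetf(1 / 12), 3))  # 0.5: half the signal range sits below ~8% scene light
```

Notice how the HLG square-root segment devotes half of the signal range to the darkest twelfth of scene light, which is exactly the dark-region granularity discussed below.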

It is expected that, under ideal conditions, dynamic PQ-based transformation will achieve the best quality results, at the cost of compatibility with existing display systems. Please see the examples below:

HDR PQ Transformation

HDR – Signal to light mapping

The graph below describes the mapping of light levels for various transfer functions. The vertical axis shows signal values on a scale of 0-1, with 0 being black and 1 being white; this normalization makes the signal range bit-depth agnostic. The horizontal axis shows the display device’s light level in Nits.

HDR - Signal to Light Mapping

Human beings are more sensitive to changes in darker regions than in brighter regions. HDR systems exploit this property by providing more granularity in darker regions. The graph above shows that the light-level range in darker regions is represented by a larger signal-value range than in brighter regions, meaning a more granular representation of dark tones. While this distribution is fairly even for BT.709-based displays, it becomes less granular for HDR displays in the brighter regions. In the case of HLG, more than half of the signal values represent light levels between 0-60 Nits, with the remaining signal values covering the 60-1000 Nits range. Similarly, for PQ (ST 2084) based displays, approximately half of the signal values represent light levels between 0-40 Nits, with the remaining half covering the 40-1000 Nits range.

According to the graph, HLG is similar to BT.709 in lower-brightness regions, therefore offering better compatibility with existing SDR display devices. PQ, however, is quite different from BT.709. If we try to display a PQ HDR image on an SDR display, the darker regions represented by PQ will invariably become brighter, reducing the contrast of the image; the result is a washed-out image (see below).

HDR PQ Images

An HLG-based image looks much better on an SDR monitor:

HDR HLG Image

While PQ-based transforms promise the best quality results on HDR-enabled monitors in comparison to HLG, they require proper tone mapping by display devices.

This topic will be discussed in our next blog article – Tone mapping.


HDR Insights Article 1: High Dynamic Range – Introduction & Benefits


An Introduction to High Dynamic Range (HDR)

The industry has been constantly working to improve the user experience of video content consumption. The efforts have been multi-pronged, and one key area so far has been higher resolutions. While ten years ago HD was a big thing, 4K has now become a common resolution in many production workflows. 4K is also delivered to consumers in many scenarios, though its penetration, especially on the broadcast side, still needs to improve.

The industry is now at a stage where it needs to decide what will bring the next significant improvement in user experience. Some of the key contenders are:

  • Higher resolution. Going to 8K?
  • Higher frame rate. Use more of 50 fps or 100 fps?
  • Wider color range
  • HDR (High Dynamic Range)

Tests conducted by many organizations, including IRT and the EBU, conclude that HDR likely offers a greater improvement in quality of experience than the other candidate technologies.

Read more at https://tech.ebu.ch/docs/events/webinar061_uhd/presentations/EBU_UHD_Features_and_Tests_Driesnack.pdf

Benefits of High Dynamic Range (HDR)

The holy grail of quality is reproducing a video experience on the user’s display device as close as possible to what a human being perceives when watching the same scene in nature. While this remains an ambitious goal, HDR brings us closer by offering a wider brightness range (very close to the human perception range) and thereby a more realistic experience.

It is known that the human visual system (HVS) is more sensitive to brightness than to color. For the same reason, we have chroma formats like YUV 4:2:0 and YUV 4:2:2 that subsample color but retain the brightness information for every pixel.

Regular SDR (Standard Dynamic Range) monitors available in the market have a range of 1-400 Nits (cd/m2), while HDR allows a representation range of 0-10,000 Nits, a significantly wider range of brightness than SDR devices offer. Currently available HDR TVs have a range of up to approximately 2,000 Nits. The wider brightness range in HDR simply means that the brightness of each pixel can be represented more accurately, rather than being transformed with a higher quantization factor that results in inaccurate pixel representation (poor quality). The quality improvement with HDR is usually most visible in plain areas with gradients, where minor degradation is easily perceived by the human eye.

In essence – HDR means more accurate pixels in terms of their brightness!

Read the next article on transfer functions and how they help in representing a wider brightness range for HDR.
