rav1e Chroma Subsampling: 4:2:2 and 4:4:4 Support

This article explores how rav1e, the Rust AV1 encoder exposed to C callers as librav1e, handles the high-fidelity chroma formats 4:2:2 and 4:4:4. It covers how the encoder is configured for these formats, their impact on video quality and encoding performance, and how developers can use them in professional video workflows.

Chroma Subsampling in AV1 and rav1e

Chroma subsampling reduces the amount of color data in a video by storing the chrominance (color) channels at a lower resolution than the luminance (brightness) channel, relying on the eye being less sensitive to fine color detail than to fine brightness detail. Consumer video typically uses 4:2:0 subsampling, while professional editing, screen recording, and high-end streaming often require 4:2:2 or 4:4:4 to preserve fine color detail and sharp colored text.

As an encoder for the AV1 video format, librav1e can produce streams in each of the chroma formats defined by the AV1 specification’s three profiles:

Main Profile (Profile 0): Supports 8-bit and 10-bit YUV 4:2:0, plus monochrome.
High Profile (Profile 1): Adds 8-bit and 10-bit YUV 4:4:4 alongside the Main Profile's 4:2:0.
Professional Profile (Profile 2): Adds 8-bit, 10-bit, and 12-bit YUV 4:2:2, as well as 12-bit 4:2:0 and 4:4:4.
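The profile rules above can be expressed as a small lookup. The following is a minimal sketch, not code taken from rav1e: the `Chroma` enum and the `minimal_av1_profile` function are hypothetical names introduced here, and placing monochrome in the Main profile follows the AV1 specification rather than the list above.

```rust
/// Local stand-in for the chroma formats discussed above (not imported from rav1e).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Chroma {
    Cs400, // monochrome
    Cs420,
    Cs422,
    Cs444,
}

/// Returns the lowest AV1 profile number (0, 1, or 2) that can carry the
/// given format, or None if the bit depth is not 8, 10, or 12.
/// Hypothetical helper for illustration only.
fn minimal_av1_profile(chroma: Chroma, bit_depth: u8) -> Option<u8> {
    match (chroma, bit_depth) {
        // 4:2:2 exists only in the Professional profile, at every depth.
        (Chroma::Cs422, 8 | 10 | 12) => Some(2),
        // Any 12-bit stream requires the Professional profile.
        (_, 12) => Some(2),
        // 8-bit and 10-bit 4:4:4 need the High profile.
        (Chroma::Cs444, 8 | 10) => Some(1),
        // 8-bit and 10-bit 4:2:0 (and monochrome) fit in the Main profile.
        (Chroma::Cs420 | Chroma::Cs400, 8 | 10) => Some(0),
        // Anything else is not a bit depth AV1 defines.
        _ => None,
    }
}
```

For example, `minimal_av1_profile(Chroma::Cs444, 10)` yields `Some(1)`, while the same format at 12 bits yields `Some(2)`.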

How librav1e Processes 4:2:2 and 4:4:4

The librav1e library takes the chroma format as part of its encoder configuration and carries it through the rest of the pipeline in three stages:

1. Configuration via the API

Developers configure the desired subsampling format using the ChromaSampling enum in the rav1e API. This enum specifies how the chroma channels are sampled relative to the luma channel:

* Cs420 (Half horizontal and vertical resolution)
* Cs422 (Half horizontal, full vertical resolution)
* Cs444 (Full horizontal and vertical resolution)
* Cs400 (Monochrome, no chroma)

The library also allows setting the ChromaSamplePosition to define where the chroma samples sit relative to the luma grid (Unknown, Vertical, or Colocated). In the AV1 bitstream this position is only signaled for 4:2:0 content, so it has no effect on 4:2:2 or 4:4:4 streams.
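To make the relationship between these settings concrete, here is a minimal, std-only sketch. The two enums are local stand-ins that mirror the variant names mentioned above and are not imported from rav1e; the `FormatSettings` struct, its `validate` method, and the `decimation` helper are hypothetical names for this illustration. Rejecting a specific sample position for non-4:2:0 content is a choice of this sketch, based on AV1 only signaling the position for 4:2:0.

```rust
/// Local stand-in mirroring the four variants named above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChromaSampling {
    Cs420,
    Cs422,
    Cs444,
    Cs400,
}

/// Local stand-in for the chroma sample position setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChromaSamplePosition {
    Unknown,
    Vertical,
    Colocated,
}

impl ChromaSampling {
    /// (horizontal, vertical) right-shifts applied to the luma size to get
    /// the chroma size, or None when there are no chroma planes.
    fn decimation(self) -> Option<(u32, u32)> {
        match self {
            ChromaSampling::Cs420 => Some((1, 1)),
            ChromaSampling::Cs422 => Some((1, 0)),
            ChromaSampling::Cs444 => Some((0, 0)),
            ChromaSampling::Cs400 => None,
        }
    }
}

/// Hypothetical bundle of the format-related settings discussed above.
#[derive(Clone, Copy, Debug)]
struct FormatSettings {
    bit_depth: u8,
    chroma_sampling: ChromaSampling,
    chroma_sample_position: ChromaSamplePosition,
}

impl FormatSettings {
    /// Sanity-checks the combination before it would be handed to an encoder.
    fn validate(&self) -> Result<(), String> {
        if !matches!(self.bit_depth, 8 | 10 | 12) {
            return Err(format!("unsupported bit depth: {}", self.bit_depth));
        }
        // Sketch-level rule: a specific position is only meaningful for 4:2:0.
        if self.chroma_sampling != ChromaSampling::Cs420
            && self.chroma_sample_position != ChromaSamplePosition::Unknown
        {
            return Err("chroma sample position only applies to 4:2:0".to_string());
        }
        Ok(())
    }
}
```

A 10-bit 4:4:4 configuration with the position left at `Unknown` passes `validate`, whereas pairing `Cs444` with `Colocated` is reported as an error.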

2. Internal Pixel Representation

Inside librav1e, each video frame is stored as separate Y, U, and V planes. For a 4:4:4 input, the U and V chroma planes are allocated at the same resolution as the Y (luma) plane. For 4:2:2, the chroma planes have full vertical resolution but half horizontal resolution. The encoder processes these planes without downsampling them to 4:2:0, so chroma keeps its native resolution through the motion estimation, transform, and quantization steps.
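The plane layout described above can be sketched in a few lines. This is an illustrative, std-only model and not rav1e's actual frame type: `Plane`, `chroma_dims`, and `alloc_planes` are hypothetical names, samples are stored as `u16` for simplicity, and rounding odd dimensions up is an assumption of this sketch.

```rust
/// One plane of samples, stored row by row (hypothetical type for this sketch).
struct Plane {
    width: usize,
    height: usize,
    data: Vec<u16>,
}

impl Plane {
    fn new(width: usize, height: usize) -> Self {
        Plane { width, height, data: vec![0; width * height] }
    }
}

/// Chroma plane size for a luma size and (xdec, ydec) subsampling shifts,
/// rounding up so odd luma dimensions still get a covering chroma sample.
fn chroma_dims(luma_w: usize, luma_h: usize, xdec: u32, ydec: u32) -> (usize, usize) {
    let w = (luma_w + (1usize << xdec) - 1) >> xdec;
    let h = (luma_h + (1usize << ydec) - 1) >> ydec;
    (w, h)
}

/// Allocates Y, U, and V planes. Shifts: 4:4:4 = (0, 0), 4:2:2 = (1, 0), 4:2:0 = (1, 1).
fn alloc_planes(luma_w: usize, luma_h: usize, xdec: u32, ydec: u32) -> [Plane; 3] {
    let (cw, ch) = chroma_dims(luma_w, luma_h, xdec, ydec);
    [Plane::new(luma_w, luma_h), Plane::new(cw, ch), Plane::new(cw, ch)]
}
```

For a 1920x1080 frame, shifts of (0, 0) give 1920x1080 chroma planes, (1, 0) gives 960x1080, and (1, 1) gives 960x540.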

3. Encoding Optimization and Bit Depth

librav1e supports 8-bit, 10-bit, and 12-bit depths. Samples deeper than 8 bits are stored in 16-bit words, so a 10-bit or 12-bit 4:2:2 or 4:4:4 frame moves considerably more data than an 8-bit 4:2:0 frame. To help offset that load, librav1e uses optimized SIMD (Single Instruction, Multiple Data) assembly paths where they are available for the target CPU.
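To put the "increased data load" into numbers, the following sketch estimates the uncompressed size of one frame. It assumes 8-bit samples occupy one byte and 10-bit or 12-bit samples occupy two bytes (unpacked), and that chroma dimensions round up; `bytes_per_sample` and `raw_frame_bytes` are hypothetical helpers, not rav1e functions.

```rust
/// Storage bytes per sample, assuming unpacked storage: one byte for 8-bit,
/// two bytes for 10-bit and 12-bit. Returns None for other depths.
fn bytes_per_sample(bit_depth: u8) -> Option<usize> {
    match bit_depth {
        8 => Some(1),
        10 | 12 => Some(2),
        _ => None,
    }
}

/// Uncompressed size in bytes of one Y/U/V frame for the given luma size,
/// (xdec, ydec) chroma subsampling shifts, and bit depth.
fn raw_frame_bytes(
    width: usize,
    height: usize,
    xdec: u32,
    ydec: u32,
    bit_depth: u8,
) -> Option<usize> {
    let bps = bytes_per_sample(bit_depth)?;
    // Round chroma dimensions up for odd luma sizes (sketch assumption).
    let cw = (width + (1usize << xdec) - 1) >> xdec;
    let ch = (height + (1usize << ydec) - 1) >> ydec;
    let luma_samples = width * height;
    let chroma_samples = 2 * cw * ch; // U and V planes together
    Some((luma_samples + chroma_samples) * bps)
}
```

Under these assumptions a 1920x1080 frame takes 3,110,400 bytes as 8-bit 4:2:0, 8,294,400 bytes as 10-bit 4:2:2, and 12,441,600 bytes as 10-bit 4:4:4, which is four times the 8-bit 4:2:0 figure.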

Performance and Quality Impact

Using 4:2:2 or 4:4:4 in librav1e has distinct trade-offs:

Color Fidelity: 4:4:4 removes the color bleeding and “fuzzy” red text caused by chroma subsampling itself, which is highly beneficial for screen-casting, remote desktop applications, and gaming content. Artifacts from heavy quantization at low bitrates can still occur.
Compression Efficiency: A 4:2:2 frame carries one third more samples than a 4:2:0 frame, and a 4:4:4 frame carries twice as many, so these formats generally need a higher bitrate to reach the same perceived quality as 4:2:0. The output still remains efficient thanks to AV1’s advanced coding tools.
Encoding Speed: Processing 4:2:2 and 4:4:4 formats increases the computational workload. Compared to 4:2:0, the encoder must run intra-prediction, motion compensation, and transform loops over twice as many chroma samples for 4:2:2 and four times as many for 4:4:4.
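The sample-count side of these trade-offs can be checked with simple arithmetic over a 2x2 block of luma pixels. The sketch below counts samples only; it says nothing about actual bitrate or encode time, which depend on content and encoder settings. The `Chroma` enum and helper functions are hypothetical names for this illustration.

```rust
/// Local stand-in for the three color formats compared above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Chroma {
    Cs420,
    Cs422,
    Cs444,
}

/// (luma samples, chroma samples across U and V) covering a 2x2 luma block.
fn samples_per_2x2(chroma: Chroma) -> (u32, u32) {
    match chroma {
        Chroma::Cs420 => (4, 2), // one U and one V sample per 2x2 block
        Chroma::Cs422 => (4, 4), // two U and two V samples
        Chroma::Cs444 => (4, 8), // four U and four V samples
    }
}

/// How many times more chroma samples than 4:2:0.
fn chroma_ratio_vs_420(chroma: Chroma) -> u32 {
    let (_, c) = samples_per_2x2(chroma);
    let (_, c420) = samples_per_2x2(Chroma::Cs420);
    c / c420
}

/// How many times more total samples (luma plus chroma) than 4:2:0.
fn total_ratio_vs_420(chroma: Chroma) -> f64 {
    let (y, c) = samples_per_2x2(chroma);
    let (y420, c420) = samples_per_2x2(Chroma::Cs420);
    f64::from(y + c) / f64::from(y420 + c420)
}
```

By this count 4:2:2 has twice the chroma samples and 4/3 the total samples of 4:2:0, while 4:4:4 has four times the chroma samples and twice the total.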