How librav1e Psycho-Visual Tuning Affects Image Quality
This article explains how psycho-visual tuning in the
librav1e AV1 encoder optimizes perceived video and image
quality. It covers the difference between metric-driven and
perception-driven compression, the key mechanisms used by the encoder
(such as adaptive quantization and variance-based masking), and how these
techniques balance file size with visual fidelity.
Understanding Psycho-Visual Tuning
Traditional video encoders rely heavily on objective metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) to measure quality loss. These metrics are easy to calculate, but they do not always align with how humans actually see. PSNR in particular weights every pixel error equally, no matter where in the picture it falls. SSIM takes local structure into account, but it is still only an approximation of human perception.
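To make the "every pixel counts equally" point concrete, here is a minimal sketch of the standard PSNR formula in plain Python. The pixel values and the helper name `psnr` are made up for illustration and are not taken from librav1e. The same error pattern is placed once in a flat region and once in a textured region, and PSNR cannot tell the two apart.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# Illustrative data: 16 flat "sky" pixels followed by 16 textured "grass" pixels.
sky = [200] * 16
grass = [90, 140, 60, 170, 110, 30, 150, 80, 130, 50, 160, 100, 70, 180, 120, 40]
reference = sky + grass

# The same error pattern (alternating +4 / -4), placed in one region or the other.
error = [4, -4] * 8
error_in_sky = [p + e for p, e in zip(sky, error)] + grass
error_in_grass = sky + [p + e for p, e in zip(grass, error)]

psnr_sky = psnr(reference, error_in_sky)
psnr_grass = psnr(reference, error_in_grass)

print(f"error in sky:   {psnr_sky:.2f} dB")
print(f"error in grass: {psnr_grass:.2f} dB")
```

Both cases score about 39.1 dB, even though a viewer would be far more likely to notice the disturbance in the flat sky than in the grass.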
Psycho-visual tuning in librav1e (the library form of
the rav1e AV1 encoder) alters the encoding process to
prioritize human visual perception. By understanding the limitations and
biases of the human visual system, the encoder discards data that the
eye cannot easily detect and preserves details that the eye is highly
sensitive to.
Key Mechanisms of Psycho-Visual Tuning in librav1e
To improve perceived image quality without ballooning the file size,
librav1e employs several targeted psycho-visual
algorithms:
1. Variance Adaptive Quantization (VAQ)
The human eye is highly sensitive to compression artifacts in flat, smooth areas (like a clear blue sky or a dark gradient wall) where blockiness and banding are immediately obvious. Conversely, the eye struggles to notice compression noise in highly textured, complex areas (like grass, gravel, or foliage).
librav1e uses Variance Adaptive Quantization to analyze
the spatial complexity of a frame. It allocates more bits (a finer
quantizer) to flat areas to prevent banding, and fewer bits (a coarser
quantizer) to complex, textured areas.
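The idea can be sketched in a few lines of Python. This is an illustrative toy, not librav1e's actual algorithm: the function `qp_offset`, its `pivot`, `strength`, and `max_offset` parameters, and the logarithmic mapping are all assumptions chosen to show the principle of turning block variance into a quantizer adjustment.

```python
import math

def block_variance(block):
    """Population variance of the pixel values in one block."""
    mean = sum(block) / len(block)
    return sum((p - mean) ** 2 for p in block) / len(block)

def qp_offset(variance, strength=1.0, pivot=64.0, max_offset=6):
    """Hypothetical mapping from variance to a quantizer offset.

    Blocks flatter than `pivot` get a negative offset (finer quantizer,
    more bits); busier blocks get a positive offset (coarser quantizer,
    fewer bits). The result is clamped to [-max_offset, +max_offset].
    """
    raw = strength * math.log2((variance + 1.0) / (pivot + 1.0))
    return max(-max_offset, min(max_offset, round(raw)))

# Illustrative blocks: a nearly flat patch and a highly textured patch.
flat_block = [200, 201, 200, 199, 200, 200, 201, 200]
busy_block = [90, 140, 60, 170, 110, 30, 150, 80]

flat_offset = qp_offset(block_variance(flat_block))
busy_offset = qp_offset(block_variance(busy_block))

print("flat block offset:", flat_offset)  # negative: spend more bits here
print("busy block offset:", busy_offset)  # positive: spend fewer bits here
```

A logarithmic mapping is used here only because variance spans several orders of magnitude between flat and textured content. A real encoder would tune this curve against its own rate control.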
2. Temporal and Spatial Masking
Human vision is subject to “masking” effects. Temporal masking occurs when a sudden scene change or high-speed motion leaves the brain needing a fraction of a second to adjust, making the viewer temporarily blind to fine details. Spatial masking occurs when strong visual patterns hide nearby errors.
librav1e utilizes these phenomena to aggressively
compress fast-moving objects or transient frames, saving bits that are
redirected to static, high-detail scenes where the viewer is more likely
to linger and notice quality degradation.
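A minimal sketch of the temporal side of this trade-off follows. It assumes a very crude motion measure (mean absolute difference between co-located pixels of consecutive frames) and a hypothetical `temporal_qp_offset` mapping. Neither is taken from librav1e, and a real encoder would use motion estimation rather than raw frame differences.

```python
def mean_abs_diff(prev_frame, cur_frame):
    """Average absolute difference between co-located pixels of two frames."""
    return sum(abs(a - b) for a, b in zip(prev_frame, cur_frame)) / len(cur_frame)

def temporal_qp_offset(motion, threshold=8.0, max_offset=4):
    """Hypothetical mapping from a motion score to a quantizer offset.

    Near-static content gets no extra quantization. Above `threshold`,
    the offset grows by one step per `threshold` units of motion and is
    capped at `max_offset`.
    """
    if motion <= threshold:
        return 0
    return min(max_offset, int((motion - threshold) // threshold) + 1)

# Illustrative 4-pixel "frames": one nearly static pair, one fast-changing pair.
static_prev, static_cur = [100, 102, 98, 101], [101, 102, 97, 101]
moving_prev, moving_cur = [100, 102, 98, 101], [160, 40, 150, 30]

static_offset = temporal_qp_offset(mean_abs_diff(static_prev, static_cur))
moving_offset = temporal_qp_offset(mean_abs_diff(moving_prev, moving_cur))

print("static scene offset:", static_offset)  # 0: keep full detail
print("moving scene offset:", moving_offset)  # 4: quantize more coarsely
```

The bits saved on the fast-changing frames are what a rate controller could then spend on the static, high-detail ones.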
3. Chroma Tuning and Color Sensitivity
The human eye has significantly lower spatial resolution for color
(chrominance) than for brightness (luminance). Psycho-visual tuning in
librav1e optimizes the quantization of chroma channels. By
allowing more compression on the color data while keeping the luma
(brightness) sharp, the encoder achieves lower bitrates with little
visible loss of color richness.
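The effect of quantizing chroma more coarsely than luma can be shown with a plain uniform quantizer. This is a sketch under stated assumptions: the step sizes, the 3x ratio between them, and the sample values are illustrative and do not reflect librav1e's actual quantizer tables.

```python
def quantize(samples, step):
    """Uniform quantizer: snap each sample to the nearest multiple of `step`."""
    return [step * round(s / step) for s in samples]

def max_abs_error(original, quantized):
    """Largest absolute difference between original and quantized samples."""
    return max(abs(o - q) for o, q in zip(original, quantized))

# Illustrative chroma samples from a gently varying color region.
chroma = [120, 123, 126, 129, 131, 134, 137, 140]

luma_step = 4                # hypothetical step used for brightness
chroma_step = luma_step * 3  # hypothetical coarser step used for color

fine = quantize(chroma, luma_step)
coarse = quantize(chroma, chroma_step)

print("distinct values at luma step:  ", len(set(fine)))
print("distinct values at chroma step:", len(set(coarse)))
print("max error at chroma step:      ", max_abs_error(chroma, coarse))
```

Fewer distinct values means fewer symbols for the entropy coder to spend bits on. The quantization error grows in exchange, but it stays bounded by half the step size, and it lands in the channels where the eye resolves the least detail.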
The Impact on Perceived Quality vs. Objective Metrics
When psycho-visual tuning is enabled in librav1e, the
objective mathematical scores (like PSNR) often decrease. This happens
because the encoder is intentionally introducing mathematical “errors”
in areas where humans cannot see them.
However, the subjective, perceived image quality typically improves:
* Reduced Banding: Smooth gradients remain clean and fluid.
* Preserved Textures: Fine details like hair, skin pores, and fabric maintain their natural look instead of being smudged into a blurry mush.
* Sharper Edges: High-contrast boundaries remain crisp, reducing the “ringing” artifacts common in heavy compression.
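The gap between objective scores and perceived quality can be illustrated with a toy comparison. The `masked_error` function below is a hypothetical visibility-weighted error invented for this example. It simply down-weights errors in high-variance blocks and is not a metric used by librav1e or a validated perceptual model.

```python
def mse(ref, test):
    """Plain mean squared error: every pixel counts equally."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)

def variance(block):
    """Population variance of the pixel values in one block."""
    mean = sum(block) / len(block)
    return sum((p - mean) ** 2 for p in block) / len(block)

def masked_error(ref, test, block_size=8, k=100.0):
    """Toy visibility-weighted error (an assumption, not a real metric).

    Each block's squared error is scaled by 1 / (1 + variance / k), so
    errors hidden in busy texture count for less than errors in flat areas.
    """
    total = 0.0
    for start in range(0, len(ref), block_size):
        r = ref[start:start + block_size]
        t = test[start:start + block_size]
        weight = 1.0 / (1.0 + variance(r) / k)
        total += weight * sum((a - b) ** 2 for a, b in zip(r, t))
    return total / len(ref)

flat = [200] * 8
textured = [90, 140, 60, 170, 110, 30, 150, 80]
ref = flat + textured
signs = [1, -1] * 4

# Encoding A: small error (+/-3) in the flat block, texture untouched.
enc_a = [p + 3 * s for p, s in zip(flat, signs)] + textured
# Encoding B: flat block untouched, larger error (+/-6) in the texture.
enc_b = flat + [p + 6 * s for p, s in zip(textured, signs)]

print("MSE    A, B:", mse(ref, enc_a), mse(ref, enc_b))
print("masked A, B:", masked_error(ref, enc_a), masked_error(ref, enc_b))
```

Encoding B has four times the MSE of encoding A, so its PSNR is lower, yet the toy weighted error ranks B as the less visible distortion. This is the pattern the section describes: a worse objective score alongside a better-looking result.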
Ultimately, librav1e’s psycho-visual tuning allows
content creators to deliver AV1 video streams that look better to
viewers at lower bitrates than purely metric-optimized encoding
would require for the same perceived quality.