How librav1e Encodes Static Backgrounds and Moving Objects

The librav1e encoder, a prominent Rust-based AV1 video encoder, achieves high compression efficiency in part by treating static backgrounds and moving objects differently. Through block partitioning, motion estimation, and reference frame selection, librav1e minimizes the data spent on unchanging background elements and concentrates bits and search effort on preserving detail in moving subjects. This article breaks down the mechanisms librav1e uses to handle these two kinds of visual data.

Block Partitioning and Size Selection

AV1 divides each video frame into superblocks of 64x64 or 128x128 pixels; rav1e, the encoder behind librav1e, works with 64x64 superblocks. Each superblock is recursively split into smaller partitions where finer detail needs to be captured, and the encoder picks the split by comparing the rate-distortion cost of the candidates.

Static Backgrounds: For large, flat, or unchanging areas of a frame, librav1e favors large partitions (up to 64x64). Because there is no motion or change in texture, splitting these areas further would spend bits on extra partition and mode signaling without improving the prediction. A large block describes the entire static area with minimal syntax overhead.
Moving Objects: When an object moves, its boundaries and textures change relative to the background. Librav1e splits these areas into much smaller partitions (down to 4x4 pixels). This fine-grained partitioning lets the encoder follow the edges of the moving object and keep its motion separate from the surrounding static background.
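
The split-or-keep decision can be illustrated with a small quadtree sketch. This is not rav1e's code: a real encoder compares rate-distortion costs, whereas this sketch substitutes a plain pixel-variance threshold to keep the example short. The names `Block`, `variance`, and `partition`, and the `threshold` parameter, are all hypothetical.

```rust
/// A square region of a luma plane: top-left corner plus edge length.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Block {
    pub x: usize,
    pub y: usize,
    pub size: usize,
}

/// Pixel variance inside `b`. `plane` is row-major with `stride` samples per row.
fn variance(plane: &[u8], stride: usize, b: Block) -> f64 {
    let n = (b.size * b.size) as f64;
    let mut sum = 0.0;
    let mut sum_sq = 0.0;
    for row in b.y..b.y + b.size {
        for col in b.x..b.x + b.size {
            let p = plane[row * stride + col] as f64;
            sum += p;
            sum_sq += p * p;
        }
    }
    let mean = sum / n;
    sum_sq / n - mean * mean
}

/// Recursively split `b` into four quadrants while it is "busy" (variance above
/// `threshold`) and still larger than `min_size`. Leaf blocks are pushed to `out`.
/// ASSUMPTION: variance stands in for the rate-distortion cost a real encoder uses.
pub fn partition(
    plane: &[u8],
    stride: usize,
    b: Block,
    min_size: usize,
    threshold: f64,
    out: &mut Vec<Block>,
) {
    if b.size <= min_size || variance(plane, stride, b) <= threshold {
        out.push(b);
        return;
    }
    let h = b.size / 2;
    for (dx, dy) in [(0, 0), (h, 0), (0, h), (h, h)] {
        let child = Block { x: b.x + dx, y: b.y + dy, size: h };
        partition(plane, stride, child, min_size, threshold, out);
    }
}
```

Called on a 64x64 root block, a flat plane comes back as a single leaf, while a plane with one small textured patch is split only along the path that leads to that patch, so the flat remainder stays in large blocks.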

Motion Estimation and Vector Coding

The core of temporal compression relies on predicting where pixels move from one frame to the next. Librav1e treats motion estimation very differently depending on whether the pixels are stationary or in motion.

Static Backgrounds: For static areas, librav1e frequently uses zero motion vectors and AV1's skip signaling (“Skip Mode”), which lets a block be coded without a residual. Instead of encoding movement data and pixel differences, the encoder tells the decoder to copy the co-located pixels from a reference frame. If the camera is static, librav1e can encode large portions of the background with very few bits.
Moving Objects: For moving elements, librav1e runs a motion search to find where the block of pixels sits in a previous or future reference frame, then encodes a motion vector that describes the displacement. Instead of encoding the moving object again in full, librav1e encodes only the motion vector and the “residual”: the difference between the motion-compensated prediction and the actual pixels of the block.
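
The two cases above can be sketched as a whole-pixel block search. This is a minimal illustration under stated assumptions, not rav1e's motion estimation, which also uses predicted candidates, sub-pixel refinement, and rate-distortion costs. The `Frame` and `Mv` types and the functions `residual`, `sad`, and `motion_search` are hypothetical, and the block is assumed to lie fully inside the current frame.

```rust
pub struct Frame {
    pub width: usize,
    pub height: usize,
    pub data: Vec<u8>, // row-major luma samples
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Mv {
    pub x: i32,
    pub y: i32,
}

/// Top-left corner of the reference block that `mv` points at,
/// or None if the displaced block leaves the frame.
fn ref_origin(f: &Frame, bx: usize, by: usize, size: usize, mv: Mv) -> Option<(usize, usize)> {
    let rx = bx as i32 + mv.x;
    let ry = by as i32 + mv.y;
    if rx < 0 || ry < 0 {
        return None;
    }
    let (rx, ry) = (rx as usize, ry as usize);
    if rx + size > f.width || ry + size > f.height {
        return None;
    }
    Some((rx, ry))
}

/// Residual: current block minus the motion-compensated prediction.
pub fn residual(cur: &Frame, reference: &Frame, bx: usize, by: usize, size: usize, mv: Mv) -> Option<Vec<i16>> {
    let (rx, ry) = ref_origin(reference, bx, by, size, mv)?;
    let mut out = Vec::with_capacity(size * size);
    for row in 0..size {
        for col in 0..size {
            let c = cur.data[(by + row) * cur.width + bx + col] as i16;
            let p = reference.data[(ry + row) * reference.width + rx + col] as i16;
            out.push(c - p);
        }
    }
    Some(out)
}

/// Sum of absolute differences for one candidate vector.
pub fn sad(cur: &Frame, reference: &Frame, bx: usize, by: usize, size: usize, mv: Mv) -> Option<u32> {
    let r = residual(cur, reference, bx, by, size, mv)?;
    Some(r.iter().map(|d| d.unsigned_abs() as u32).sum())
}

/// Exhaustive search within +/- `range` pixels. The zero vector is tried first
/// and returned at once on an exact match, the cheap path for static background.
pub fn motion_search(cur: &Frame, reference: &Frame, bx: usize, by: usize, size: usize, range: i32) -> Option<(Mv, u32)> {
    let zero = Mv { x: 0, y: 0 };
    let mut best = (zero, sad(cur, reference, bx, by, size, zero)?);
    if best.1 == 0 {
        return Some(best);
    }
    for y in -range..=range {
        for x in -range..=range {
            let mv = Mv { x, y };
            if let Some(cost) = sad(cur, reference, bx, by, size, mv) {
                if cost < best.1 {
                    best = (mv, cost);
                }
            }
        }
    }
    Some(best)
}
```

For a background block the search ends after a single comparison, while a block covering a displaced object pays for the full window and yields a non-zero vector. When the match is exact the residual is all zeros, which is the situation in which a block can be coded with almost no data.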

Reference Frame Usage and Temporal Filtering

AV1 maintains eight reference frame buffers, and each inter frame can predict from up to seven of them, covering past (forward) references, future (backward) references, and “Alt-Ref” (alternative reference) frames. Librav1e leverages this architecture to optimize both static and moving elements.

Static Backgrounds: Librav1e relies heavily on Alt-Ref frames for static areas. Alt-Ref frames are constructed by temporally filtering (overlaying and averaging) multiple frames to remove camera noise while keeping static details sharp. Because the background does not move, this filtering creates a highly detailed, noise-free reference that the encoder can reuse over multiple frames.
Moving Objects: Temporal filtering is trickier for moving objects, because blending moving pixels across frames smears them and produces ghosting. Librav1e must limit temporal filtering in moving areas and rely on short-term reference frames (the immediately preceding or succeeding frames) to predict the object’s path accurately.
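
The difference between filtering static and moving pixels can be sketched with a generic motion-adaptive temporal average. This is an illustration of the idea only and is not taken from the rav1e source. The function `temporal_filter` and its `max_diff` parameter are hypothetical, and every neighbor frame is assumed to have the same length as the center frame.

```rust
/// Blend co-located pixels from `neighbors` into `center`, but only where a
/// neighbor pixel is within `max_diff` of the center pixel. Pixels that differ
/// more are treated as "moving" and left out, so they cannot cause ghosting.
/// ASSUMPTION: a hard threshold replaces the weighted blend a production filter would use.
pub fn temporal_filter(center: &[u8], neighbors: &[&[u8]], max_diff: u8) -> Vec<u8> {
    center
        .iter()
        .enumerate()
        .map(|(i, &c)| {
            let mut sum = c as u32;
            let mut count = 1u32;
            for n in neighbors {
                let p = n[i];
                if (p as i16 - c as i16).abs() <= max_diff as i16 {
                    sum += p as u32;
                    count += 1;
                }
            }
            // Rounded integer mean of the accepted samples.
            ((sum + count / 2) / count) as u8
        })
        .collect()
}
```

Where all frames agree to within the threshold, the output is the noise-averaged value. Where one neighbor frame shows a very different pixel, as when an object passes through, that sample is ignored and the center value is preserved.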

Bitrate Allocation and Rate Control

Librav1e uses rate control algorithms to decide how to distribute the available bit budget across a frame.

Static Backgrounds: Because static backgrounds require very little data to maintain visual quality once they are initially encoded, librav1e drastically reduces the bit allocation for these regions in subsequent frames.
Moving Objects: Human eyes are naturally drawn to motion, and moving regions are also the most prone to compression artifacts such as blockiness. Because these regions need motion vectors and larger residuals, they typically receive the larger share of the frame’s bit budget, which helps the active parts of the video stay sharp.
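
As a rough illustration of how a frame budget skews toward active regions, the sketch below splits a bit budget across blocks in proportion to an activity measure such as the zero-motion SAD of each block. This is a simplified model, not rav1e's rate control, which works through quantizer selection rather than explicit per-block bit targets. The function `allocate_bits` and its `floor` parameter are hypothetical.

```rust
/// Split `budget` bits across blocks in proportion to `activity + floor`.
/// `floor` gives static blocks a small baseline share for their signaling.
/// ASSUMPTION: proportional sharing is a toy model; integer division means
/// the shares may add up to slightly less than `budget`.
pub fn allocate_bits(budget: u64, activity: &[u64], floor: u64) -> Vec<u64> {
    let total: u64 = activity.iter().map(|a| a + floor).sum();
    if total == 0 {
        return vec![0; activity.len()];
    }
    activity
        .iter()
        .map(|a| budget * (a + floor) / total)
        .collect()
}
```

With three static blocks and one moving block, the moving block ends up with most of the budget while each static block keeps only its baseline share.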