How librav1e Handles Variable Frame Rate Video

This article explains how the Rust-based AV1 encoder, librav1e, processes variable frame rate (VFR) video inputs. It covers how the encoder manages presentation timestamps (PTS), its internal timebase configurations, and how to ensure smooth, synchronized AV1 encodes when dealing with source videos that do not have a constant frame rate.

Timestamp-Based Encoding

rav1e (exposed to C callers as librav1e) does not enforce a Constant Frame Rate (CFR): it never duplicates or drops frames to hit a fixed interval. It encodes the frames it is given, in order, and the Presentation Timestamp (PTS) that the caller associates with each input frame determines when that frame is displayed.

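Two pieces of timing information are involved. The timebase (the fractional unit of time per tick, such as 1/90000 or 1/1000) is set once in the encoder configuration. The PTS of each frame is counted in those ticks and travels with the frame: the librav1e API lets the caller attach opaque per-frame data to an input frame and returns it with the matching output packet, which is how a wrapper can carry timestamps through the encoder. The duration of a frame is the difference between its PTS and the PTS of the next frame, so VFR input needs nothing more than timestamps whose deltas vary from frame N to frame N+1.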
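As a minimal sketch of the arithmetic described above, the following Python converts tick-based timestamps into per-frame durations and checks whether a stream is VFR. The function names and the sample timestamps are hypothetical and are not part of librav1e.

```python
from fractions import Fraction


def frame_durations(pts_ticks, timebase):
    """Return the display duration in seconds of each frame except the last.

    pts_ticks: presentation timestamps in ticks, in presentation order.
    timebase:  seconds per tick as a Fraction, e.g. Fraction(1, 1000).
    """
    return [(b - a) * timebase for a, b in zip(pts_ticks, pts_ticks[1:])]


def is_variable_frame_rate(pts_ticks, timebase):
    """True if the gaps between consecutive timestamps are not all equal."""
    return len(set(frame_durations(pts_ticks, timebase))) > 1


# Hypothetical screen recording in a 1/1000 timebase (1 tick = 1 ms):
# a few quick frames, then one frame held for half a second.
pts = [0, 33, 66, 100, 600]
print([float(d) for d in frame_durations(pts, Fraction(1, 1000))])
print(is_variable_frame_rate(pts, Fraction(1, 1000)))
```

Exact fractions are used instead of floats so that equal durations compare equal regardless of the timebase.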

Rate Control and Lookahead Mechanics

Variable frame rates complicate rate control, which typically budgets bits on the assumption of a steady frame interval. Two parts of librav1e are relevant here, its lookahead and its frame-type decisions:

Temporal Metric Calculations: How long a frame is displayed affects how many bits it deserves. A frame that remains on screen longer (common in VFR content like screen recordings) justifies more bits, while a burst of short-lived frames justifies fewer. Because the librav1e API takes a single timebase in the encoder configuration rather than a duration per frame, a bitrate target on strongly VFR content may not be met exactly; a quantizer-based (constant quality) mode does not depend on frame timing in this way.
Lookahead Buffer: The lookahead queue analyzes upcoming frames before they are encoded, which informs scene-cut detection and keyframe placement. This analysis works on frame content, so a sudden change in frame duration does not by itself disturb those decisions.
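To illustrate why frame duration matters for a bitrate target, the sketch below compares a budget that assumes a nominal frame rate with one weighted by each frame's display time. This is only the arithmetic of the problem, not rav1e's rate control algorithm; the function names and numbers are hypothetical.

```python
from fractions import Fraction


def achieved_bitrate(pts_ticks, timebase, end_ticks, target_bps, nominal_fps):
    """Bitrate that results if every frame gets target_bps / nominal_fps bits.

    end_ticks is the timestamp at which the last frame stops being shown.
    """
    bits_per_frame = Fraction(target_bps) / nominal_fps
    total_bits = bits_per_frame * len(pts_ticks)
    total_seconds = (end_ticks - pts_ticks[0]) * timebase
    return total_bits / total_seconds


def duration_weighted_budgets(pts_ticks, timebase, end_ticks, target_bps):
    """Per-frame bit budgets proportional to each frame's display time."""
    edges = list(pts_ticks) + [end_ticks]
    return [target_bps * (b - a) * timebase for a, b in zip(edges, edges[1:])]


# Four frames spread over two seconds in a 1/1000 timebase.
tb = Fraction(1, 1000)
pts = [0, 100, 200, 1000]

# A fixed per-frame budget at a nominal 10 fps undershoots a 1 Mbit/s target.
print(achieved_bitrate(pts, tb, 2000, 1_000_000, 10))

# Weighting by duration spends the whole two-second budget.
print(sum(duration_weighted_budgets(pts, tb, 2000, 1_000_000)))
```

With these numbers the fixed budget yields 200,000 bit/s against a 1,000,000 bit/s target, while the duration-weighted budgets add up to the full 2,000,000 bits available over two seconds.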

Integration via FFmpeg and External APIs

In most practical workflows, users do not call librav1e directly; instead, they use a media framework such as FFmpeg, which can be built with librav1e support (the --enable-librav1e configure option).

When using FFmpeg, the handling of VFR depends on the input demuxer and the flags passed to the command line:

Passing Timestamps: FFmpeg’s librav1e wrapper attaches the PTS of each input AVFrame to the corresponding rav1e frame as opaque data and reads it back when the encoded packet is returned. If the source file is VFR (such as a smartphone recording or an OBS capture), these varying timestamps are preserved through the encoder.
Container Muxing: Once librav1e has compressed a frame, the wrapper emits a packet carrying the original PTS value. The container format (typically MKV or MP4) stores these timestamps, rescaled to its own timebase. The media player can then display the frames at the correct variable intervals, which keeps the video in sync with the audio.
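The rescaling step mentioned for container muxing can be sketched as follows. It converts timestamps from one timebase to another with round-to-nearest, using integer arithmetic. The function name is hypothetical; FFmpeg performs the equivalent conversion internally, and the 1/1000 target is only an example of a millisecond-based container timebase.

```python
from fractions import Fraction


def rescale_pts(pts, src_timebase, dst_timebase):
    """Convert a non-negative timestamp between timebases.

    Rounds to the nearest destination tick, with halves rounded up.
    """
    exact = pts * src_timebase / dst_timebase
    # Integer round-half-up avoids floating-point error on large timestamps.
    return (2 * exact.numerator + exact.denominator) // (2 * exact.denominator)


# VFR timestamps in a 1/90000 timebase, rescaled to millisecond ticks.
src = Fraction(1, 90000)
dst = Fraction(1, 1000)
print([rescale_pts(p, src, dst) for p in [0, 3003, 6006, 9009, 54054]])
```

Rounding means a coarse destination timebase can shift individual frames by up to half a tick, but the variable spacing of the frames is retained.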

To preserve VFR during an FFmpeg encode with librav1e, avoid anything that forces a constant frame rate, such as the -r option or the fps filter, so that the source timestamps pass through to the encoder and muxer unchanged.
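A possible way to apply this advice from a script is sketched below: one helper builds an FFmpeg argument list that requests no frame-rate conversion, and another flags arguments that would force CFR. Both helper names and the file names are hypothetical. The -fps_mode option is assumed to be available (recent FFmpeg releases; older builds use -vsync instead), and the check is a simple heuristic rather than a full parser of FFmpeg options.

```python
def build_vfr_encode_command(src, dst):
    """Return an ffmpeg argument list that encodes video with librav1e
    without requesting any frame-rate conversion."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "librav1e",   # needs an FFmpeg build with librav1e enabled
        "-fps_mode", "vfr",   # keep source timestamps, do not pad to CFR
        "-c:a", "copy",       # leave the audio stream untouched
        dst,
    ]


def forces_constant_rate(args):
    """Heuristic: True if the arguments would force a constant frame rate."""
    for i, arg in enumerate(args):
        nxt = args[i + 1] if i + 1 < len(args) else ""
        if arg == "-r":
            return True
        if arg in ("-fps_mode", "-vsync") and nxt == "cfr":
            return True
        if arg in ("-vf", "-filter:v") and "fps=" in nxt:
            return True
    return False


cmd = build_vfr_encode_command("input.mp4", "output.mkv")
print(" ".join(cmd))
print(forces_constant_rate(cmd))
```

The resulting list can be passed to subprocess.run; building it as a list rather than a single string avoids shell quoting issues with file names.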