How librav1e Handles Variable Frame Rate Video
This article explains how the Rust-based AV1 encoder,
librav1e, processes variable frame rate (VFR) video inputs.
It covers how the encoder manages presentation timestamps (PTS), its
internal timebase configurations, and how to ensure smooth, synchronized
AV1 encodes when dealing with source videos that do not have a constant
frame rate.
Timestamp-Based Encoding
Unlike traditional encoders that might assume a fixed interval
between frames, rav1e (and its library interface
librav1e) is designed around a timestamp-based model. It
does not inherently enforce a Constant Frame Rate (CFR). Instead, it
relies on the Presentation Timestamps (PTS) provided with each input
video frame.
Each frame passed to librav1e is associated with a PTS,
expressed in ticks of a timebase (the fractional unit of time per tick,
such as 1/90000 or 1/1000). The timebase is set once in the encoder
configuration rather than per frame. The duration of a frame is the
difference between its PTS and the PTS of the next frame, multiplied by
the timebase. Because the delta between frame N and frame N+1 is free to
vary throughout the stream, VFR input can be represented without
resampling it to a fixed rate.
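The relationship between PTS, timebase, and frame duration can be made concrete with a little arithmetic. The following is a minimal sketch in Python rather than librav1e code; the function name `frame_durations` and the sample PTS values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def frame_durations(pts_ticks, timebase):
    """Return the time in seconds between consecutive frames.

    pts_ticks: strictly increasing integer PTS values, in timebase ticks.
    timebase:  Fraction giving seconds per tick, e.g. Fraction(1, 90000).
    """
    pairs = list(zip(pts_ticks, pts_ticks[1:]))
    if any(b <= a for a, b in pairs):
        raise ValueError("PTS values must be strictly increasing")
    # Exact rational arithmetic avoids floating-point drift on long streams.
    return [(b - a) * timebase for a, b in pairs]

# Hypothetical VFR input on a 1/90000 timebase: frames 1/30 s apart,
# then one frame held for half a second, then 1/30 s spacing again.
pts = [0, 3000, 6000, 51000, 54000]
durations = frame_durations(pts, Fraction(1, 90000))
print([str(d) for d in durations])  # ['1/30', '1/30', '1/2', '1/30']
```

The last frame has no successor, so its duration cannot be derived from PTS alone; a real pipeline has to take it from the container or from an explicit duration field.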
Rate Control and Lookahead Mechanics
Variable frame rates present a challenge for video rate control
algorithms, which often assume a steady stream of data.
librav1e addresses this through its lookahead and
frame-type decision mechanisms:
- Temporal Metric Calculations: The encoder analyzes the actual duration of each frame, derived from its timestamps, when distributing bits. A frame that remains on screen longer (common in VFR content such as screen recordings) may be allocated more bits or designated as a keyframe, while frames that are shown only briefly receive a correspondingly smaller share.
- Lookahead Buffer: The lookahead queue observes the timestamps of upcoming frames. This ensures that the rate control algorithm does not miscalculate the bitrate when encountering sudden changes in frame duration.
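To see why frame duration matters for rate control, compare a budget derived from real durations with one that assumes a fixed frame rate. This is a deliberately simplified sketch and not rav1e's actual rate-control algorithm; `allocate_bits`, `allocate_bits_cfr`, and the numbers below are hypothetical.

```python
from fractions import Fraction

def allocate_bits(durations, target_bitrate):
    """Give each frame a budget proportional to its display duration.

    durations:      per-frame display durations in seconds (Fractions).
    target_bitrate: target bits per second.
    """
    return [int(target_bitrate * d) for d in durations]

def allocate_bits_cfr(frame_count, fps, target_bitrate):
    """Naive budget that assumes every frame lasts exactly 1/fps seconds."""
    return [target_bitrate // fps] * frame_count

# Hypothetical lookahead window: three frames at 1/30 s and one held for 0.5 s.
durations = [Fraction(1, 30), Fraction(1, 30), Fraction(1, 2), Fraction(1, 30)]
vfr_budget = allocate_bits(durations, 3_000_000)
cfr_budget = allocate_bits_cfr(len(durations), 30, 3_000_000)
print(vfr_budget)  # [100000, 100000, 1500000, 100000]
print(cfr_budget)  # [100000, 100000, 100000, 100000]
```

The four frames cover 0.6 s of video. The duration-aware budget spends 1,800,000 bits over that span, which matches the 3 Mbit/s target, while the fixed-rate assumption spends only 400,000 bits, or about 0.67 Mbit/s.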
Integration via FFmpeg and External APIs
In most practical workflows, users do not interact with
librav1e directly; instead, they use a media framework such as
FFmpeg, which can be built with librav1e support (the --enable-librav1e configure option).
When using FFmpeg, the handling of VFR depends on the input demuxer and the flags passed to the command line:
- Passing Timestamps: FFmpeg’s librav1e wrapper maps the input AVFrame timestamps directly to the rav1e frame configuration. If the source file is VFR (such as a smartphone recording or an OBS capture), FFmpeg preserves these dynamic timestamps.
- Container Muxing: Once librav1e compresses the frames, it outputs packets containing the original PTS values. The container format (typically MKV or MP4) then stores these timestamps. This ensures that the media player displays the frames at the correct variable intervals, preventing audio desynchronization.
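A container usually stores timestamps in its own timebase, so the PTS values are rescaled when the stream is muxed. The sketch below illustrates that conversion in Python; it is not FFmpeg's implementation, the helper name `rescale_pts` is hypothetical, and the 1/1000 target timebase is an assumption based on the millisecond scale that Matroska uses by default.

```python
from fractions import Fraction

def rescale_pts(pts, src_tb, dst_tb):
    """Convert a non-negative PTS from src_tb ticks to dst_tb ticks.

    Rounds to the nearest tick, with exact halves rounded up.
    """
    exact = pts * src_tb / dst_tb  # exact position, in destination ticks
    return (2 * exact.numerator + exact.denominator) // (2 * exact.denominator)

# Hypothetical encoder timestamps on a 1/90000 timebase,
# rescaled to a 1/1000 (millisecond) container timebase.
src_tb = Fraction(1, 90000)
dst_tb = Fraction(1, 1000)
pts = [0, 3000, 6000, 51000, 54000]
container_pts = [rescale_pts(p, src_tb, dst_tb) for p in pts]
print(container_pts)  # [0, 33, 67, 567, 600]
```

Each timestamp is rescaled from its original value instead of accumulating rounded durations, so the rounding error stays below one tick per frame and does not build up over the length of the stream.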
To preserve VFR during an FFmpeg encode with
librav1e, avoid forcing a constant frame rate with the
-r output option or the fps filter, so that the source
timestamps pass through to the encoder unchanged.
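As a starting point, an invocation along these lines can be assembled from a script. This is a sketch under assumptions: the file names are hypothetical, the FFmpeg build must include librav1e, and `-fps_mode passthrough` requires a reasonably recent FFmpeg (older releases used `-vsync` for the same purpose).

```python
import shlex

def build_vfr_command(src, dst):
    """Build an FFmpeg argument list that encodes with librav1e
    without resampling the source timestamps to a constant rate."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "librav1e",            # select the librav1e encoder wrapper
        "-fps_mode", "passthrough",    # do not duplicate or drop frames
        "-c:a", "copy",                # leave the audio stream untouched
        dst,
    ]

cmd = build_vfr_command("input.mkv", "output.mkv")
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run(cmd, check=True)` without shell quoting. Note that neither `-r` nor an `fps` filter appears anywhere in the arguments.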