Librav1e Performance: ARM vs x86_64 Processors
This article compares the performance of librav1e, the C library interface to the rav1e AV1 video encoder, on ARM and x86_64 processors. We examine how assembly optimizations, architectural differences, and power efficiency affect encoding speed and the overall viability of each platform.
Assembly Optimizations and SIMD Support
AV1 encoder performance relies heavily on Single Instruction, Multiple Data (SIMD) optimizations. The rav1e encoder, which powers librav1e, is written in Rust and supplements it with hand-written assembly and SIMD intrinsics for performance-critical tasks such as motion estimation, quantization, and transforms.
- x86_64 Architecture: On x86_64, librav1e uses AVX2 and, on CPUs that support it, AVX-512. Because the x86 SIMD code paths have been in the rav1e codebase longer, they cover more of the encoder’s hot paths and make good use of the wide vector registers on modern Intel and AMD processors.
- ARM Architecture: On ARM64 (AArch64), librav1e uses NEON (Advanced SIMD) instructions. NEON coverage of the encoder’s hot paths historically trailed the x86 AVX2 implementations, but recent releases have narrowed the gap and deliver competitive performance on modern ARM hardware.
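Which of these instruction sets a given machine exposes can be checked at runtime. The following is a minimal Rust sketch that uses the standard library's feature-detection macros. It is not rav1e's own dispatch code, and the function name `detected_simd_features` is hypothetical, chosen only for illustration.

```rust
/// Returns the names of the SIMD extensions discussed above that the
/// current CPU reports at runtime. Hypothetical helper, not part of rav1e.
pub fn detected_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();

    // x86_64: probe AVX2 and the AVX-512 foundation subset.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            found.push("avx2");
        }
        if std::arch::is_x86_feature_detected!("avx512f") {
            found.push("avx512f");
        }
    }

    // AArch64: probe NEON (Advanced SIMD).
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("neon");
        }
    }

    found
}
```

Printing the returned list on each benchmark machine is a quick way to confirm which code paths an encoder could select there. On architectures other than x86_64 and AArch64 the list is simply empty.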
Raw Encoding Speed
When comparing raw encoding speed (frames per second), x86_64 processors generally hold an advantage in absolute throughput, particularly in multi-threaded server environments.
- x86_64 Performance: High-end x86_64 desktop and server CPUs benefit from high clock speeds and wide AVX2 or (on models that support it) AVX-512 execution units. In heavily multi-threaded encoding workloads, these processors can typically encode frames faster than current ARM counterparts, especially at the faster speed presets, where fast SIMD execution matters most.
- ARM Performance: ARM processors such as Apple’s M-series (M1/M2/M3) and AWS Graviton server chips deliver strong per-core performance. On Apple Silicon, librav1e performs well, helped by wide execution pipelines and the high bandwidth of the unified memory architecture, and it often rivals mid-range x86_64 desktop processors in real-world use.
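Comparisons like these only hold up if both platforms are measured the same way. As a rough sketch, throughput can be computed as frames encoded divided by wall-clock time. The helper names below (`frames_per_second`, `measure_fps`) are hypothetical, and the `encode` closure stands in for whatever encoder call is being benchmarked; no librav1e API is shown here.

```rust
use std::time::{Duration, Instant};

/// Throughput in frames per second for `frames` encoded in `elapsed`.
pub fn frames_per_second(frames: u64, elapsed: Duration) -> f64 {
    frames as f64 / elapsed.as_secs_f64()
}

/// Runs `encode` once and reports its throughput.
/// `encode` is a hypothetical stand-in for the encoding job under test
/// and must return the number of frames it processed.
pub fn measure_fps<F: FnOnce() -> u64>(encode: F) -> f64 {
    let start = Instant::now();
    let frames = encode();
    frames_per_second(frames, start.elapsed())
}
```

For a fair comparison, the same source clip, speed preset, and thread count should be used on both machines, and the measurement repeated several times to smooth out run-to-run variation.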
Energy Efficiency and Cost-Effectiveness
While x86_64 often wins in raw speed, ARM architectures frequently outperform x86_64 in performance-per-watt and cost efficiency.
- Performance-per-Watt: ARM chips typically draw less power than comparable x86_64 processors during intensive video encoding. For mobile devices, laptops, and edge computing, running librav1e on ARM can mean longer battery life and less thermal throttling.
- Cloud Encoding Costs: In cloud computing environments (such as AWS), running librav1e workloads on ARM-based instances (like Graviton) is often more cost-effective. Even if an ARM instance takes slightly longer to complete an encode than a high-end x86_64 instance, the lower hourly cost of ARM hardware frequently results in a lower overall cost per encoded video.
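Both metrics above reduce to simple arithmetic, sketched here in Rust. The function names are hypothetical, and any prices or power figures passed in are the caller's own measurements; none are taken from a real price list or benchmark.

```rust
/// Cost in US dollars of a single encode:
/// hourly instance price multiplied by the wall-clock time in hours.
pub fn cost_per_encode(hourly_price_usd: f64, encode_seconds: f64) -> f64 {
    hourly_price_usd * encode_seconds / 3600.0
}

/// Energy efficiency as frames encoded per joule:
/// frames per second divided by average power draw in watts
/// (one watt is one joule per second).
pub fn frames_per_joule(fps: f64, avg_watts: f64) -> f64 {
    fps / avg_watts
}
```

As an illustration with made-up numbers: an x86_64 instance at $0.50 per hour that finishes an encode in 1,800 seconds costs $0.25 per encode, while an ARM instance at $0.375 per hour that needs 2,000 seconds costs about $0.21. The slower machine is the cheaper one in this hypothetical case.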
Verdict
For maximum encoding throughput where power consumption is not a constraint, x86_64 processors using AVX2 and AVX-512 remain the stronger choice for librav1e. However, for cloud deployment, mobile devices, and scenarios where power efficiency and cost-per-encode are the primary metrics, ARM architectures offer a highly competitive and often more economical alternative.