Every modern ARM SoC ships with a dedicated video decoder block sitting next to the CPU cores. On the Rockchip side: rkvdec (RK3399 — H.264, HEVC, VP9), plus hantro-vpu (RK3399 / RK3568 / RK3588 — MPEG-2, VP8, plus a separate RK3588 instance for AV1). On the Broadcom side: a small HEVC decoder block behind rpi-hevc-decon the Pi 5 / CM5. None of these are exciting silicon. They’re 2018-vintage fixed-function pipelines mostly bought for the box checking. But they’re there, they cost a tenth of a watt to wake up, and they decode video.
For about six months our fleet shipped with them mostly idle. The ARM cores chewed software decode and overheated for the privilege.
This is a status note on getting them to do their job.
The current state, eight hours into a session
| Device | SoC | What the silicon decodes today | How |
|---|---|---|---|
| Pinebook Pro | RK3399 | H.264, HEVC, VP8, VP9, MPEG-2 — including H.264 Hi10P + HEVC Main10 | rkvdec + hantro via libva-v4l2-request-fourier |
| CoolPi CM5 GenBook | RK3588 | H.264, HEVC, VP8, VP9, MPEG-2, AV1 | rkvdec (VDPU381 H.264/HEVC/VP9) + hantro (legacy + dedicated AV1 instance) |
| PineTab2 | RK3566 | H.264, MPEG-2, VP8 | hantro |
| Pi 5 CM5 portable | BCM2712 | HEVC | rpi-hevc-dec (kernel side at v4 since July 2025, still not in mainline as of May 2026) |
Each cell required either a kernel patch, a userspace plumbing fix, or both. Some still don’t work and probably never will. Two stories from today’s iteration:
Story one: H.264 Hi10P on the RK3399 silicon „didn’t decode“
The Pinebook Pro’s RK3399 advertises Hi10P (10-bit H.264) in its kernel ctrl table. Decode-empirically, it produced uniform-black frames. The phase-7 close note said „kernel advertisement is aspirational“ — possibly the silicon doesn’t actually do this, possibly Rockchip’s marketing got ahead of the hardware.
Today’s empirical retest: the first five frames of the test fixture are a fade-in. The decode was correct; it was decoding correctly to black, because the source was black. Seek past frame six and the silicon produces bit-exact-correct decoded frames.
Lesson learned: Do not use Big Buck Bunny intro frames for bit-exactness verification. Future contributors are encouraged to verify they are not testing the title card.
The actual bug was elsewhere. FFmpeg’s Kwiboo n8.1 V4L2 hwaccel maps the RK3399 10-bit NV15 capture format to AV_PIX_FMT_YUV420P10 but then deliberately blanks the transfer-format list — the author expected consumers to call av_hwframe_map and chain a swscale unpack, but hwdownload doesn’t do that chaining. Result: kernel decodes fine, userspace never sees the bytes.
The fix is sixty lines: implement NV15→P010 unpack inside v4l2request_transfer_data_from, advertise P010 in transfer_get_formats.
Shipped as patch 0002 on ffmpeg-v4l2-request-fourier pkgrel=5. Twenty-frame mid-fixture decode is now byte-identical to libavcodec software reference. The Phase 5 reviewer suggested a hardening guard for malformed callers, which is now also in.
If you have an anime fansub collection from 2014, the Pinebook Pro now decodes it in hardware. We assume this is a niche use case.
Story two: VP9 on RK3588 took eight months of upstream review
VP9 hardware decode on the RK3588’s VDPU381 block was implemented by Detlev Casanova and the Collabora team; the kernel side merged into Linux 7.0 mainline in February 2026. AV1 support followed in the same window. The actual RK3588 H.264 + HEVC + AV1 path is mainline as of this writing.
VP9 — different story. Same hardware, same block; AV1 had to land first apparently for political reasons we won’t pretend to understand. As of May 2026 the VP9-on-VDPU381 work is on an out-of-tree fork maintained by D.V.A.B. Sarma. Collabora’s own blog post says „we hope to send a v1 to linux-media soon.“ It is currently v8. v1 has not been sent.
We grabbed the three patches as kernel-agentscope-tagged additions to
fleet/ampere.yaml, rebuilt the kernel package, and verified: twenty-frame VP9 decode via the rkvdec block is byte-identical to the software reference.
The new linux-ampere-fourier-7.0rc3.kafr2-1 kernel package ships with these patches integrated. Available now at packages.reauktion.de/arch/aarch64/ for any RK3588 host on the [marfrit] repo.
What the numbers actually look like
| Workload | CPU on ARM cores (software decode) | CPU on ARM cores (VPU-offloaded) | CPU freed |
|---|---|---|---|
| 1080p30 H.264, RK3399 (2×A72+4×A53) | 60–80% on a single fast core | ~5% (mostly demux + audio) | One A72 core’s worth |
| 1080p30 VP9, RK3588 (4×A76+4×A55) | 25–45% spread across cores | ~3% | ~one A76 core, fanning out |
| 1080p24 AV1, RK3588 | 30–60% (8-core spread, dav1d) | „speed=1.36x“ through hwdownload, fanless |
Several A76 + A55 cores |
| 4K HEVC, RK3399 | basically doesn’t | bit-exact, fanless | All eight cores‘ worth in the limit |
The watt numbers are the same shape. Software VP9 at 1080p draws roughly four to seven watts of CPU on the Pi 5 / RK3588 class of SoC, sustained. Hardware decode of the same content draws under one watt. For a battery-powered ARM laptop that’s a three-to-five-times session-time improvement; for the global Pi 5 install base it’s electrons that were being spent on YouTube reviews of Apple products and are now not.
Whether that matters depends on your stance on YouTube reviews of Apple products.
What’s still software-decoded, on purpose
A few codec/host combinations are scoped out and we recommend they stay scoped out:
- HEVC on RK3588. Fixable in principle — there’s a kernel oops in
rkvdec_hevc_prepare_hw_st_rpsfrom an uninitialized stack variable that has a one-line fix; the libva side needs another small patch to populate the EXT_SPS_RPS controls. We have all three issues filed, all three closedEWONTFIX. Reason: the fleet doesn’t use HEVC for anything we haven’t already covered. YouTube doesn’t serve HEVC. Netflix serves HEVC but only behind Widevine L1, which we don’t have on ARM, and the
silicon for the one HEVC use case we have (archive playback on the RK3399 laptop) already works. - H.264 Hi10P on the libva path (RK3588). Decode-side libva ctrl-submission is byte-different from ffmpeg’s; the kernel rejects with
V4L2_BUF_FLAG_ERROR. Fixable with a two-to-four-hour debug session. The kdirect (ffmpeg -hwaccel v4l2request) path works bit-exact. Anyone who needs Hi10P decode through a browser-side libva consumer on RK3588 can speak up; nobody has. The use case for ten-bit H.264 is currently anime fansubs. The use case for libva-routed ten-bit H.264 anime fansubs is even narrower. - HEVC on the Raspberry Pi 5. Kernel side is ready — Jonas Karlman’s
rpi-hevc-decdriver has been on the linux-media patchwork at v4 since July 2025. It is still not merged. The V4L2 stateless HEVC controls don’t carry enough of the bitstream’s syntax fields for our backend to populate them losslessly; the strict driver rejects what the lenient drivers accept. We forked a userspace path calleddaedalus-fourierthat toys with running codec firmware on the VideoCore VII programmable cores (the QPUs Broadcom marketed as a GPU but never shipped vendor codec firmware for, possibly because the licensing math at scale stopped making sense). Wax-and-feathers research-track. Don’t hold your breath.
Architecturally
A working hardware decode stack on the Rockchip / Pi 5 fleet has the following layers:
┌─────────────────────┐
│ firefox / chromium │ ← VAAPI consumer
│ / mpv vaapi / │ (browser does the demux,
│ ffmpeg-vaapi │ hands a frame at a time)
└──────────┬──────────┘
│ libva ABI
┌──────────▼──────────────────┐
│ libva-v4l2-request-fourier │ ← us
│ (our libva backend) │
└──────────┬──────────────────┘
│ V4L2 Request API
┌──────────▼──────────────────┐
│ rkvdec / hantro-vpu / │ ← Rockchip kernel driver
│ rpi-hevc-dec │
└──────────┬──────────────────┘
│ register pokes
┌──────────▼──────────────────┐
│ Silicon: VDPU381 / VEPU121 │ ← The 2018 fixed
│ / hantro G1 / VideoCore │ function block
└─────────────────────────────┘
Each layer required a separate fork. Each fork is shipped as a package in marfrit-packages.
The kernels are linux-fresnel-fourier(RK3399) / linux-ampere-fourier(RK3588); the userspace side ships as libva-v4l2-request-fourier+ ffmpeg-v4l2-request-fourier+ mpv-fourier+ firefox-fourier+ chromium-fourier. None of these forks would be necessary if upstream landed faster. We did not say upstream is unmotivated. We are saying we wanted hardware decode in 2024 and have hardware decode in 2026.
Install
# Arch / ALARM with the marfrit overlay enabled:
sudo pacman -Syu linux-ampere-fourier # RK3588 (CoolPi GenBook)
sudo pacman -Syu linux-fresnel-fourier # RK3399 (Pinebook Pro)
sudo pacman -Syu ffmpeg-v4l2-request-fourier # any RK
sudo pacman -Syu libva-v4l2-request-fourier # any RK
sudo pacman -Syu firefox-fourier mpv-fourier # consumers
Browser-side, set LIBVA_DRIVER_NAME=v4l2_request in the environment, set the three patched-in media.* prefs in about:config (or just install firefox-fourierwhich has them baked).
Repo setup at packages.reauktion.de; pacman config snippets in the marfrit-packagesREADME.
… done.
The fleet now decodes everything it usefully consumes in hardware. The Cortex-A76 cores can go back to compiling kernels, running language models, or doing nothing — they’re rated for it. The electrons formerly used for software video decode are now available for tasks of marginally greater social value, like playing back nerd reviews of overpriced laptops.
Hardware acceleration is supposed to be a solved problem. It mostly isn’t. But it’s a little more solved than it was this morning.
If you’re running Rockchip silicon on Linux and want hardware decode that actually works, the packages above are public. File issues at the relevant git.reauktion.de/marfrit/* repo.
Patches welcome; preferably for the codecs we said we don’t need.