🚀 The feature, motivation and pitch
Hello, following my meeting with some ExecuTorch maintainers at PyTorch Conference in Paris last week, I want to figure out how to bring support for RISC-V to ExecuTorch as a first-class citizen. cc @GregoryComer @digantdesai @cbilgin @psiddh @AdrianLundell @rascani @mergennachin
I'd like to enable build and test for riscv64 Linux targets, with the goal of running ExecuTorch .pte files on native RISC-V hardware in CI.
The scope for this issue is deliberately narrow: the XNNPACK delegate on RV64GC(V) Linux. Nothing else. Other paths (TOSA-equivalent for RISC-V matrix extensions, Cortex-M analogues, vendor NPU backends) are real but belong in follow-up issues once the XNNPACK path is solid.
XNNPACK is a good starting point because the RISC-V work is already done on its side. RVV microkernels have been upstreamed by SiFive and Imagination. Functionally this is the same position XNNPACK occupies on Cortex-A via KleidiAI, so there's nothing conceptually new to invent on the ExecuTorch side. If the ExecuTorch runtime plus XNNPACK delegate cross-compiles and runs on an RV64GCV board today, we already have a usable CPU path; where it doesn't, the gaps are likely small and concentrated in build glue and low-level deps.
On the CI side, the usual blocker is "we don't have hardware." RISE solves that: we operate free, ephemeral GitHub Actions runners on bare-metal RISC-V hardware, and this runner service reached GA recently. ExecuTorch can plug in the same way any other open-source repo does and validate on real silicon instead of QEMU. I co-chair the RISE TSC and can directly help with onboarding, capacity, and ongoing support.
My goal with this issue is to agree on scope, phasing, and a concrete first milestone. Draft RFC below.
Alternatives
- QEMU-only CI. Unblocks CI cheaply but gives weak signal. QEMU's RVV emulation is not cycle-accurate, catches few hardware-specific bugs, and produces no usable performance numbers. Fine as a fallback, not sufficient as the primary signal for a deployment runtime whose whole point is behaviour on real hardware.
- Downstream fork or vendor-specific build. Creates fragmentation and rots quickly. Several vendor forks exist today in the broader PyTorch ecosystem; we should learn from that and not repeat it here.
Additional context
XNNPACK RISC-V status. RVV-optimized microkernels for f32 GEMM, im2col, packing, and reduction ops are upstream. Known missing pieces include int8 quantized kernels and the `f32-igemm` / `x32-packw` microkernels required to enable RVV F32-GEMM by default in `gemm-config`. (I'm confirming this with the contributors of the RVV kernels in XNNPACK.)
Python ecosystem on riscv64 is still being bootstrapped. This affects the AOT side of the ExecuTorch flow. Exporting a .pte requires PyTorch, the ExecuTorch Python package, and their transitive dependencies. On riscv64, conda-forge is still being brought up (tracked at conda-forge/conda-forge.github.io#1744), and pip wheels for PyTorch on riscv64 are not officially published. The runtime side is unaffected: the ExecuTorch C++ runtime and the XNNPACK delegate don't need a Python environment on the target. This naturally splits the work — we can start with AOT on x86_64 hosts and native runtime execution on riscv64 runners, and shift AOT to native hosts later as the conda-forge riscv64 bootstrap completes. I'm directly involved in the conda-forge work on the RISE side and can keep this issue in sync with that progress.
Low-level dependencies. cpuinfo has partial RISC-V support but has historically been fragile across kernel versions (hwprobe header availability, feature detection through /proc/cpuinfo fallback). pthreadpool and build scripts may need small patches. These are known, bounded unknowns rather than architectural issues.
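To make the `cpuinfo` fragility concrete: when the `hwprobe` syscall or header is unavailable, feature detection falls back to parsing the `isa` line of `/proc/cpuinfo`. A minimal illustrative sketch of that fallback logic (not `cpuinfo`'s actual code):

```shell
# Illustrative only: mirrors the kind of fallback cpuinfo uses when hwprobe
# is unavailable. Parses the single-letter extension part of a RISC-V ISA
# string, as found in the "isa" line of /proc/cpuinfo.
has_rvv() {
  isa="$1"
  # Drop multi-letter extensions (everything from the first underscore on),
  # e.g. "rv64imafdcv_zicsr_zifencei" -> "rv64imafdcv".
  base="${isa%%_*}"
  case "$base" in
    rv32*v*|rv64*v*) return 0 ;;
    *) return 1 ;;
  esac
}

has_rvv "rv64imafdcv_zicsr" && echo "RVV present"
has_rvv "rv64imafdc" || echo "RVV absent"
```

The real detection path is more involved (hwprobe keys, profile strings like `rv64gcv`), but the point stands: the fallback is string parsing against an unstable format, which is exactly where kernel-version skew bites.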
Hardware landscape. RISC-V machines usable for CI today or imminently: Scaleway EM-RV1 (what RISE already runs on), SpacemiT K1 (BananaPi F3), and the upcoming RVA23-compatible SpacemiT K3. Enough diversity to validate a real CPU path.
RISE CI capability. Native GitHub Actions runners on bare-metal riscv64 are GA, free for open-source projects, and integrate via `runs-on` labels like any other runner. Happy to share operational details and the onboarding process.
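For reference, the integration point is just a label set in the workflow; the labels below are placeholders, since the actual labels (and any setup steps) come out of RISE onboarding:

```yaml
# Sketch only — "riscv64" is a placeholder label, and the build script name
# is hypothetical; both would be fixed during onboarding.
jobs:
  build-runtime-riscv64:
    runs-on: [self-hosted, linux, riscv64]
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build_xnnpack_runtime.sh
```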
RFC (Optional)
Scope. XNNPACK delegate on riscv64 Linux only. All other backends and architectures are out of scope for this issue and will be tracked separately once this foundation is in.
Proposed phasing.
Phase 1 — Runtime build and smoke-test
- AOT on x86_64 host: export a small model (e.g. MobileNetV2) via `examples/xnnpack/aot_compiler.py` to produce a `.pte` file.
- Runtime on riscv64: cross-compile or natively build the ExecuTorch runtime with `EXECUTORCH_BUILD_XNNPACK=ON` on a RISE runner. Fix any fallout in `cpuinfo`, `pthreadpool`, and build scripts.
- Execute the pre-built `.pte` on a RISE runner with `executor_runner`. Validate output against an x86_64 reference run.
- Deliverable: a `riscv64-linux-xnnpack-runtime` CI job that builds the runtime and runs a small model on every PR touching core or XNNPACK.
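Concretely, the Phase 1 flow could look like the sketch below. Everything beyond `EXECUTORCH_BUILD_XNNPACK=ON`, the example-script path, and `executor_runner --model_path` is an assumption to verify against the current build docs (in particular the toolchain file, the script's flags, and the output filename):

```shell
# Host (x86_64): export a .pte lowered to the XNNPACK delegate.
# Flag spelling of the example script may differ; check its --help.
python -m examples.xnnpack.aot_compiler --model_name mv2 --delegate

# Target build (riscv64, cross or native): the one flag this issue relies on
# is EXECUTORCH_BUILD_XNNPACK=ON; the toolchain file name is an assumption.
cmake -B build-riscv64 \
  -DCMAKE_TOOLCHAIN_FILE=riscv64-linux-gnu.toolchain.cmake \
  -DEXECUTORCH_BUILD_XNNPACK=ON
cmake --build build-riscv64 -j"$(nproc)"

# On the RISE runner: run the exported model and capture output for the
# x86_64 reference comparison. Output filename assumed.
./build-riscv64/executor_runner --model_path mv2_xnnpack.pte
```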
Phase 2 — End-to-end model matrix
- Extend Phase 1 to a small, representative model set: MobileNetV2 (fp32 and quantized), a transformer block, one LLM smoke test if feasible.
- Correctness gate with numerical tolerances matching existing XNNPACK tests.
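A minimal sketch of such a gate, assuming both runs dump outputs as whitespace-separated floats (the file format and the tolerance value are placeholders, not what the existing XNNPACK tests actually use):

```shell
# Compare two whitespace-separated float dumps element-wise within an
# absolute tolerance. Returns non-zero on the first mismatch.
compare_outputs() {
  ref="$1"; got="$2"; tol="${3:-1e-4}"
  paste "$ref" "$got" | awk -v tol="$tol" '
    {
      # paste glues each reference line to its counterpart, so the first
      # NF/2 fields are reference values and the rest are device values.
      for (i = 1; i <= NF / 2; i++) {
        d = $i - $(i + NF / 2)
        if (d < 0) d = -d
        if (d > tol) {
          printf "mismatch at line %d col %d: %g vs %g\n", NR, i, $i, $(i + NF / 2)
          exit 1
        }
      }
    }'
}

printf '1.0 2.0\n3.0 4.0\n' > ref.txt
printf '1.00001 2.0\n3.0 4.00002\n' > got.txt
compare_outputs ref.txt got.txt 1e-3 && echo "outputs match"
```

In practice the gate would reuse whatever tolerance machinery the existing XNNPACK delegate tests already define, rather than a bespoke script.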
Phase 3 — RVV enablement and performance
- Enable RVV microkernels by default in the XNNPACK build when the Vector extension is present.
- Close remaining XNNPACK RVV gaps upstream (int8 GEMM, `f32-igemm`, `x32-packw`) to unblock the corresponding `gemm-config` entries.
Phase 4 — Native AOT (dependent on conda-forge riscv64)
- Once `linux-riscv64` conda-forge builds of PyTorch and ExecuTorch's Python dependencies are available, move the AOT export step onto a riscv64 host. This is a nice-to-have, not a blocker for Phases 1–3.
Open questions.
- Does ExecuTorch already build on riscv64 today? I can investigate and report back if that's the fastest unblock.
- Preferred runner integration: RISE-provided hosted runners, or self-hosted runners registered against the pytorch org? I'm checking with the PyTorch multi-cloud TAC to figure out which approach is better for the PyTorch project as a whole.
- Who on the ExecuTorch side wants to be the review owner for the RISC-V build path? Happy to drive PRs, but a clear reviewer would help.