🚀 The feature, motivation and pitch
Hello, following my meeting with some ExecuTorch maintainers at PyTorch Conference in Paris last week, I want to figure out how to bring support for RISC-V to ExecuTorch as a first-class citizen. cc @GregoryComer @digantdesai @cbilgin @psiddh @AdrianLundell @rascani @mergennachin
I'd like to enable build and test for riscv64 Linux targets, with the goal of running ExecuTorch .pte files on native RISC-V hardware in CI.
The scope for this issue is deliberately narrow: the XNNPACK delegate on RV64GC(V) Linux. Nothing else. Other paths (TOSA-equivalent for RISC-V matrix extensions, Cortex-M analogues, vendor NPU backends) are real but belong in follow-up issues once the XNNPACK path is solid.
XNNPACK is a good starting point because the RISC-V work is already done on its side. RVV microkernels have been upstreamed by SiFive and Imagination. Functionally this is the same position XNNPACK occupies on Cortex-A via KleidiAI, so there's nothing conceptually new to invent on the ExecuTorch side. If the ExecuTorch runtime plus XNNPACK delegate cross-compiles and runs on an RV64GCV board today, we already have a usable CPU path; where it doesn't, the gaps are likely small and concentrated in build glue and low-level deps.
On the CI side, the usual blocker is "we don't have hardware." RISE solves that: we operate free, ephemeral GitHub Actions runners on bare-metal RISC-V hardware, and this runner service reached GA recently. ExecuTorch can plug in the same way any other open-source repo does and validate on real silicon instead of QEMU. I co-chair the RISE TSC and can directly help with onboarding, capacity, and ongoing support.
My goal with this issue is to agree on scope, phasing, and a concrete first milestone. Draft RFC below.
Alternatives
- QEMU-only CI. Unblocks CI cheaply but gives weak signal. QEMU's RVV emulation is not cycle-accurate, catches few hardware-specific bugs, and produces no usable performance numbers. Fine as a fallback, not sufficient as the primary signal for a deployment runtime whose whole point is behaviour on real hardware.
- Downstream fork or vendor-specific build. Creates fragmentation and rots quickly. Several vendor forks exist today in the broader PyTorch ecosystem; we should learn from that and not repeat it here.
Additional context
XNNPACK RISC-V status. RVV-optimized microkernels for f32 GEMM, im2col, packing, and reduction ops are upstream. Known missing pieces include int8 quantized kernels and the `f32-igemm` / `x32-packw` microkernels required to enable RVV F32-GEMM by default in `gemm-config`. (I'm confirming this with the contributors of the RVV kernels in XNNPACK.)
Python ecosystem on riscv64 is still being bootstrapped. This affects the AOT side of the ExecuTorch flow. Exporting a .pte requires PyTorch, the ExecuTorch Python package, and their transitive dependencies. On riscv64, conda-forge is still being brought up (tracked at conda-forge/conda-forge.github.io#1744), and pip wheels for PyTorch on riscv64 are not officially published. The runtime side is unaffected: the ExecuTorch C++ runtime and the XNNPACK delegate don't need a Python environment on the target. This naturally splits the work — we can start with AOT on x86_64 hosts and native runtime execution on riscv64 runners, and shift AOT to native hosts later as the conda-forge riscv64 bootstrap completes. I'm directly involved in the conda-forge work on the RISE side and can keep this issue in sync with that progress.
Low-level dependencies. cpuinfo has partial RISC-V support but has historically been fragile across kernel versions (hwprobe header availability, feature detection through /proc/cpuinfo fallback). pthreadpool and build scripts may need small patches. These are known, bounded unknowns rather than architectural issues.
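To make the `cpuinfo` fragility concrete: when the `hwprobe` syscall or header is unavailable, feature detection falls back to parsing the `isa` line of `/proc/cpuinfo`. A minimal illustrative sketch of that fallback logic (not `cpuinfo`'s actual code):

```shell
# Illustrative only: mirrors the kind of fallback cpuinfo uses when hwprobe
# is unavailable. Parses the single-letter extension part of a RISC-V ISA
# string, as found in the "isa" line of /proc/cpuinfo.
has_rvv() {
  isa="$1"
  # Drop multi-letter extensions (everything from the first underscore on),
  # e.g. "rv64imafdcv_zicsr_zifencei" -> "rv64imafdcv".
  base="${isa%%_*}"
  case "$base" in
    rv32*v*|rv64*v*) return 0 ;;
    *) return 1 ;;
  esac
}

has_rvv "rv64imafdcv_zicsr" && echo "RVV present"
has_rvv "rv64imafdc" || echo "RVV absent"
```

The real detection path is more involved (hwprobe keys, profile strings like `rv64gcv`), but the point stands: the fallback is string parsing against an unstable format, which is exactly where kernel-version skew bites.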
Hardware landscape. RISC-V machines usable for CI today or imminently: Scaleway EM-RV1 (what RISE already runs on), SpacemiT K1 (BananaPi F3), and the upcoming RVA23-compatible SpacemiT K3. Enough diversity to validate a real CPU path.
RISE CI capability. Native GitHub Actions runners on bare-metal riscv64 are GA, free for open-source projects, and integrate via `runs-on` labels like any other runner. Happy to share operational details and the onboarding process.
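For reference, the integration point is just a label set in the workflow; the labels below are placeholders, since the actual labels (and any setup steps) come out of RISE onboarding:

```yaml
# Sketch only — "riscv64" is a placeholder label, and the build script name
# is hypothetical; both would be fixed during onboarding.
jobs:
  build-runtime-riscv64:
    runs-on: [self-hosted, linux, riscv64]
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build_xnnpack_runtime.sh
```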
RFC (Optional)
Scope. XNNPACK delegate on riscv64 Linux only. All other backends and architectures are out of scope for this issue and will be tracked separately once this foundation is in.
Proposed phasing.
Phase 1 — Runtime build and smoke-test
- AOT on x86_64 host: export a small model (e.g. MobileNetV2) via `examples/xnnpack/aot_compiler.py` to produce a `.pte` file.
- Runtime on riscv64: cross-compile or natively build the ExecuTorch runtime with `EXECUTORCH_BUILD_XNNPACK=ON` on a RISE runner. Fix any fallout in `cpuinfo`, `pthreadpool`, and build scripts.
- Execute the pre-built `.pte` on a RISE runner with `executor_runner`. Validate output against an x86_64 reference run.
- Deliverable: a `riscv64-linux-xnnpack-runtime` CI job that builds the runtime and runs a small model on every PR touching core or XNNPACK.
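Concretely, the Phase 1 flow could look like the sketch below. Everything beyond `EXECUTORCH_BUILD_XNNPACK=ON`, the example-script path, and `executor_runner --model_path` is an assumption to verify against the current build docs (in particular the toolchain file, the script's flags, and the output filename):

```shell
# Host (x86_64): export a .pte lowered to the XNNPACK delegate.
# Flag spelling of the example script may differ; check its --help.
python -m examples.xnnpack.aot_compiler --model_name mv2 --delegate

# Target build (riscv64, cross or native): the one flag this issue relies on
# is EXECUTORCH_BUILD_XNNPACK=ON; the toolchain file name is an assumption.
cmake -B build-riscv64 \
  -DCMAKE_TOOLCHAIN_FILE=riscv64-linux-gnu.toolchain.cmake \
  -DEXECUTORCH_BUILD_XNNPACK=ON
cmake --build build-riscv64 -j"$(nproc)"

# On the RISE runner: run the exported model and capture output for the
# x86_64 reference comparison. Output filename assumed.
./build-riscv64/executor_runner --model_path mv2_xnnpack.pte
```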
Phase 2 — End-to-end model matrix
- Extend Phase 1 to a small, representative model set: MobileNetV2 (fp32 and quantized), a transformer block, one LLM smoke test if feasible.
- Correctness gate with numerical tolerances matching existing XNNPACK tests.
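A minimal sketch of such a gate, assuming both runs dump outputs as whitespace-separated floats (the file format and the tolerance value are placeholders, not what the existing XNNPACK tests actually use):

```shell
# Compare two whitespace-separated float dumps element-wise within an
# absolute tolerance. Returns non-zero on the first mismatch.
compare_outputs() {
  ref="$1"; got="$2"; tol="${3:-1e-4}"
  paste "$ref" "$got" | awk -v tol="$tol" '
    {
      # paste glues each reference line to its counterpart, so the first
      # NF/2 fields are reference values and the rest are device values.
      for (i = 1; i <= NF / 2; i++) {
        d = $i - $(i + NF / 2)
        if (d < 0) d = -d
        if (d > tol) {
          printf "mismatch at line %d col %d: %g vs %g\n", NR, i, $i, $(i + NF / 2)
          exit 1
        }
      }
    }'
}

printf '1.0 2.0\n3.0 4.0\n' > ref.txt
printf '1.00001 2.0\n3.0 4.00002\n' > got.txt
compare_outputs ref.txt got.txt 1e-3 && echo "outputs match"
```

In practice the gate would reuse whatever tolerance machinery the existing XNNPACK delegate tests already define, rather than a bespoke script.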
Phase 3 — RVV enablement and performance
- Enable RVV microkernels by default in the XNNPACK build when the Vector extension is present.
- Close remaining XNNPACK RVV gaps upstream (int8 GEMM, `f32-igemm`, `x32-packw`) to unblock the corresponding `gemm-config` entries.
Phase 4 — Native AOT (dependent on conda-forge riscv64)
- Once `linux-riscv64` conda-forge builds of PyTorch and ExecuTorch's Python dependencies are available, move the AOT export step onto a riscv64 host. This is a nice-to-have, not a blocker for Phases 1–3.
Open questions.
- Does ExecuTorch already build on riscv64 today? I can investigate and report back if that's the fastest unblock.
- Preferred runner integration: RISE-provided hosted runners, or self-hosted runners registered against the pytorch org? I'm checking with the PyTorch multi-cloud TAC to figure out which approach is better for the PyTorch project as a whole.
- Who on the ExecuTorch side wants to be the review owner for the RISC-V build path? Happy to drive PRs, but a clear reviewer would help.