
Arm backend: add VGF PT2E linear quantization modes for LLM export#19029

Open
xingguo01 wants to merge 2 commits into pytorch:main from xingguo01:arm-backend-llm-export

Conversation


@xingguo01 xingguo01 commented Apr 21, 2026

  • add vgf_16a8w/8a8w PT2E quantization modes
  • add backend.vgf.quantize_scope for full vs linear VGF quantization
  • wire the VGF config through the LLM export and quantizer selection path
  • add coverage in export_llama_lib tests for the new VGF PT2E modes

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell
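The mode/scope dispatch described in the bullets above can be sketched as follows. This is an illustrative sketch only: the names `select_vgf_config`, `VgfQuantConfig`, and `QuantizeScope` are assumptions for illustration, not the actual ExecuTorch API introduced by this PR.

```python
# Hedged sketch of mapping a PT2E mode string plus backend.vgf.quantize_scope
# to a quantization config. Names here are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum


class QuantizeScope(Enum):
    FULL = "full"      # quantize the whole model
    LINEAR = "linear"  # quantize only Linear layers


@dataclass
class VgfQuantConfig:
    activation_bits: int
    weight_bits: int
    scope: QuantizeScope
    requires_int16_extension: bool


def select_vgf_config(mode: str, scope: str = "full") -> VgfQuantConfig:
    """Map a pt2e_quantize mode string to a VGF quantization config."""
    modes = {
        "vgf_8a8w": (8, 8),
        # 16-bit activations depend on the INT16 TOSA extension
        "vgf_16a8w": (16, 8),
    }
    if mode not in modes:
        raise ValueError(f"unsupported VGF PT2E mode: {mode}")
    a_bits, w_bits = modes[mode]
    return VgfQuantConfig(
        activation_bits=a_bits,
        weight_bits=w_bits,
        scope=QuantizeScope(scope),
        requires_int16_extension=(a_bits == 16),
    )
```

This mirrors the gating the PR describes: selecting `vgf_16a8w` implies the INT16 compile-spec extension must be present.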

- add vgf_16a8w PT2E quantization modes
- add backend.vgf.quantize_scope for full vs linear VGF quantization
- wire the VGF config through the LLM export and quantizer selection path
- add coverage in export_llama_lib tests for the new VGF PT2E modes

Signed-off-by: Xingguo Li <xingguo.li@arm.com>
Change-Id: Ie8fe849b4856321308d6d526248a7a4760ddc573
Copilot AI review requested due to automatic review settings April 21, 2026 17:02

pytorch-bot Bot commented Apr 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19029

Note: Links to docs will display an error until the docs builds have been completed.

❌ 14 New Failures, 4 Cancelled Jobs, 2 Unrelated Failures

As of commit 93c91b6 with merge base ccaf17e:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 21, 2026
@xingguo01 xingguo01 added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: arm Changes to the ARM backend delegate labels Apr 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds Arm VGF backend PT2E quantization support for LLM export, including a new 16a8w mode gated on INT16 TOSA extension support and a configurable quantization scope (full-model vs Linear-only), plus test coverage for the new behavior.

Changes:

  • Add vgf_16a8w PT2E quantization mode and enforce INT16 compile spec extension when selected.
  • Introduce backend.vgf.quantize_scope (full vs linear) and apply it when constructing the VGF quantizer.
  • Wire new VGF settings through llama export CLI/config and add unit tests for scope + INT16 gating.
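The "full vs linear" scope from the second bullet amounts to filtering which modules the quantizer annotates. A minimal sketch, assuming a simplified `(name, type_name)` module representation rather than the PR's actual quantizer code:

```python
# Hedged sketch of scope-based annotation filtering: "full" selects every
# module, "linear" restricts quantization to Linear layers. The helper name
# and module representation are illustrative assumptions, not the PR's code.
def modules_to_quantize(named_modules, scope):
    """Filter (name, type_name) pairs by the requested quantize scope."""
    if scope == "full":
        return [name for name, _ in named_modules]
    if scope == "linear":
        return [name for name, type_name in named_modules if type_name == "Linear"]
    raise ValueError(f"unknown quantize_scope: {scope}")
```

In the real flow this filtering would be applied when constructing the VGF quantizer, so that only the selected modules receive quantization annotations during PT2E prepare.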

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
extension/llm/export/quantizer_lib.py Extends VGF quantizer selection for vgf_16a8w and adds scope-based application (global vs Linear-only).
extension/llm/export/config/llm_config.py Adds vgf_16a8w enum value and introduces VgfQuantizeScope + config wiring from CLI args.
examples/models/llama/export_llama_lib.py Exposes VGF PT2E modes and VGF scope/compile-spec CLI flags; passes scope into VGF quantizer creation.
examples/models/llama/tests/test_export_llama_lib.py Adds coverage for VGF linear-only scope and INT16 compile spec enforcement for vgf_16a8w.


"vgf_8a8w",
"vgf_16a8w",
],
help="Use PT2E quantization. Comma separated options. e.g. xnnpack_dynamic (for per channel 8 bit weight), xnnpack_dynamic_qc4 (for per channel 4 bit weight), embedding.",

Copilot AI Apr 21, 2026


The --pt2e_quantize argparse option is defined with a fixed set of choices, so it only accepts a single value, but the help text says it supports "Comma separated options" (and even mentions embedding, which is not a valid choice). This is user-facing and likely to confuse; either update the help text to reflect single-choice behavior, or switch the argument parsing to accept a comma-separated list (and adjust LlmConfig/Pt2eQuantize parsing accordingly).

Suggested change
help="Use PT2E quantization. Comma separated options. e.g. xnnpack_dynamic (for per channel 8 bit weight), xnnpack_dynamic_qc4 (for per channel 4 bit weight), embedding.",
help="Use a single PT2E quantization mode, e.g. xnnpack_dynamic (per-channel 8-bit weight) or xnnpack_dynamic_qc4 (per-channel 4-bit weight).",
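If the alternative route were taken instead (actually accepting comma-separated modes), a custom argparse `type` callable would do it. A sketch under the assumption that validation should happen at parse time; the `VALID_MODES` set is illustrative, not the repo's real list:

```python
# Hedged sketch: accept comma-separated PT2E modes via an argparse type
# callable instead of a fixed single-value `choices` list.
import argparse

# Illustrative subset of modes; not the repository's authoritative list.
VALID_MODES = {"xnnpack_dynamic", "xnnpack_dynamic_qc4", "vgf_8a8w", "vgf_16a8w"}


def parse_pt2e_modes(value: str) -> list[str]:
    """Split a comma-separated mode string and validate each entry."""
    modes = [m.strip() for m in value.split(",") if m.strip()]
    bad = [m for m in modes if m not in VALID_MODES]
    if bad:
        raise argparse.ArgumentTypeError(f"invalid pt2e mode(s): {', '.join(bad)}")
    return modes


parser = argparse.ArgumentParser()
parser.add_argument("--pt2e_quantize", type=parse_pt2e_modes)
args = parser.parse_args(["--pt2e_quantize", "xnnpack_dynamic,vgf_8a8w"])
```

As the review notes, downstream config parsing (`LlmConfig`/`Pt2eQuantize`) would also have to accept a list rather than a single enum value for this to work end to end.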

