
Fix Adam subgroup inconsistency #7982

Merged
delock merged 3 commits into deepspeedai:master from st-bang97:master
Apr 23, 2026

@st-bang97
Contributor

Fix CPUAdam same-step subgroup drift in ZeRO-3 (#7819)

This PR ports the fix from #7820 to the latest DeepSpeed version.

It makes Adam_Optimizer::IncrementStep idempotent for repeated calls at the same logical step and avoids unnecessary recomputation when the step has not changed.

ZeRO-3/SuperOffload can invoke multiple subgroup updates within a single logical step on a shared native optimizer object. The previous logic mixed multiply and recompute paths, producing non-bit-identical bias-correction metadata across subgroup calls.
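To illustrate why mixing the two paths matters, here is a minimal sketch in plain Python (not DeepSpeed code). It compares a running product, standing in for the "multiply" path, with a fresh power, standing in for the "recompute" path. The function names are hypothetical, and Python floats are double precision, so the native optimizer's numbers may differ in magnitude.

```python
def count_mismatches(beta: float, steps: int) -> int:
    """Count steps where the running product differs from a fresh power."""
    running = 1.0
    mismatches = 0
    for step in range(1, steps + 1):
        running *= beta            # "multiply" path: one rounding per step
        recomputed = beta ** step  # "recompute" path: a single pow call
        if running != recomputed:
            mismatches += 1
    return mismatches


def max_relative_gap(beta: float, steps: int) -> float:
    """Largest relative difference between the two paths over all steps."""
    running = 1.0
    worst = 0.0
    for step in range(1, steps + 1):
        running *= beta
        recomputed = beta ** step
        worst = max(worst, abs(running - recomputed) / recomputed)
    return worst


if __name__ == "__main__":
    print("mismatching steps:", count_mismatches(0.999, 1000))
    print("max relative gap:", max_relative_gap(0.999, 1000))
```

The gap is tiny in relative terms, but the two values are not guaranteed to be bit-identical. That is enough to make subgroups updated in the same logical step see different bias-correction terms.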

This change aligns the step-transition logic in both the CPU and XPU headers, clarifies first-step and non-sequential-step behavior, and prevents unnecessary work on repeated same-step updates.
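The step-transition rules described above can be sketched roughly as follows. This is a Python illustration under assumptions, not the merged C++ code: the class `BiasCorrectionState` and its attribute names are hypothetical, and the real `Adam_Optimizer::IncrementStep` may differ in detail.

```python
class BiasCorrectionState:
    """Hypothetical stand-in for the step metadata kept by a native Adam object."""

    def __init__(self, beta1: float = 0.9, beta2: float = 0.999):
        self.step = 0
        self.beta1 = beta1
        self.beta2 = beta2
        self.beta1_t = 1.0  # beta1 ** step
        self.beta2_t = 1.0  # beta2 ** step

    def increment_step(self, step: int, beta1: float, beta2: float) -> None:
        betas_changed = beta1 != self.beta1 or beta2 != self.beta2

        # Repeated call for the same logical step: leave everything untouched,
        # so every subgroup sees identical bias-correction values.
        if not betas_changed and step == self.step:
            return

        if not betas_changed and step == self.step + 1:
            # Sequential step (including the first one): extend the running product.
            self.beta1_t *= beta1
            self.beta2_t *= beta2
        else:
            # Betas changed or the step jumped: recompute from scratch.
            self.beta1_t = beta1 ** step
            self.beta2_t = beta2 ** step

        self.step = step
        self.beta1 = beta1
        self.beta2 = beta2


if __name__ == "__main__":
    state = BiasCorrectionState()
    for _ in range(3):  # three subgroup calls within logical step 1
        state.increment_step(1, 0.9, 0.999)
        print(state.step, state.beta1_t, state.beta2_t)
```

In this sketch the early return is what makes the call idempotent: only the first call for a given step changes the state, and later calls in the same step do no work.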

It also adds CPUAdam regression tests covering subgroup-style repeated same-step updates through both step_subgroup() and step() with parameter swapping.
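The property these tests check can be sketched with a toy model. The code below is a hedged illustration only: `ToyAdam` and `run` are hypothetical pure-Python stand-ins and do not use the DeepSpeed `step_subgroup()` or `step()` APIs. It updates the same parameters either as one group or as several subgroups sharing one optimizer object, and expects bit-identical results.

```python
import math


class ToyAdam:
    """Hypothetical optimizer object shared by all subgroups."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.step_id = 0
        self.beta1_t = 1.0
        self.beta2_t = 1.0

    def _increment_step(self, step):
        if step == self.step_id:
            return  # same logical step: keep bias-correction state as is
        if step == self.step_id + 1:
            self.beta1_t *= self.beta1
            self.beta2_t *= self.beta2
        else:
            self.beta1_t = self.beta1 ** step
            self.beta2_t = self.beta2 ** step
        self.step_id = step

    def update(self, step, params, grads, exp_avg, exp_avg_sq):
        """Apply one Adam update in place to a (sub)group of scalars."""
        self._increment_step(step)
        bc1 = 1.0 - self.beta1_t
        bc2 = 1.0 - self.beta2_t
        for i, g in enumerate(grads):
            exp_avg[i] = self.beta1 * exp_avg[i] + (1.0 - self.beta1) * g
            exp_avg_sq[i] = self.beta2 * exp_avg_sq[i] + (1.0 - self.beta2) * g * g
            denom = math.sqrt(exp_avg_sq[i] / bc2) + self.eps
            params[i] -= self.lr * (exp_avg[i] / bc1) / denom


def run(subgroup_sizes, num_steps=5):
    """Train four scalars, split into subgroups of the given sizes."""
    grads = [0.5, -1.25, 2.0, 0.125]
    params = [1.0, 2.0, 3.0, 4.0]
    m = [0.0] * 4
    v = [0.0] * 4
    opt = ToyAdam()
    for step in range(1, num_steps + 1):
        start = 0
        for size in subgroup_sizes:
            sl = slice(start, start + size)
            p, a, b = params[sl], m[sl], v[sl]  # copy the subgroup out
            opt.update(step, p, grads[sl], a, b)
            params[sl], m[sl], v[sl] = p, a, b  # write the subgroup back
            start += size
    return params


if __name__ == "__main__":
    print(run([4]) == run([2, 2]) == run([1, 1, 1, 1]))
```

Exact equality is used deliberately rather than a tolerance, since the issue being guarded against is bit-level drift between subgroups.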

@st-bang97 st-bang97 marked this pull request as draft April 21, 2026 05:46
@st-bang97 st-bang97 marked this pull request as ready for review April 21, 2026 05:51
@delock
Collaborator

delock commented Apr 21, 2026

Hi @st-bang97 can you fix formatting? Thanks!

Signed-off-by: st_bang <st.bang@dgist.ac.kr>
@st-bang97
Contributor Author

@delock I think I have fixed the formatting now. If not, could you tell me which code still needs reformatting?

@delock
Collaborator

delock commented Apr 22, 2026

> @delock I think I have fixed the formatting now. If not, could you tell me which code still needs reformatting?

Hi @st-bang97 you can refer to this link for the formatting error.
You can also use the following method to check for formatting errors; usually yapf and flake8 will fix formatting automatically during this process:

  • Verify changed files pass pre-commit checks before committing: pre-commit run --files <changed_files>.

Signed-off-by: st_bang <st.bang@dgist.ac.kr>
@st-bang97
Contributor Author

@delock Thanks for your help. I fixed the formatting in the PR.

@delock delock merged commit 44c51e3 into deepspeedai:master Apr 23, 2026
9 checks passed