Skip to content

Add a workaround to custom_inclusive_scan_over_group#2275

Merged
ndgrigorian merged 3 commits intomasterfrom
add-w/a-to-custom_inclusive_scan_over_group
Apr 9, 2026
Merged

Add a workaround to custom_inclusive_scan_over_group#2275
ndgrigorian merged 3 commits intomasterfrom
add-w/a-to-custom_inclusive_scan_over_group

Conversation

@antonwolfy
Copy link
Copy Markdown
Collaborator

@antonwolfy antonwolfy commented Apr 9, 2026

There is IGC bug identified when using custom_inclusive_scan_over_group inside accumulator SYCL kernels on ARL GPU.
The issue is that IGC incorrectly optimized the code, cause work-item=0 initialized __scan_val with zero instead of identity value.

This PR proposes to implement a workaround which adds SYCL atomic fence. That works since the explicit memory fence prevents reordering/elimination, but leads to slight overhead.

Note, no new test added, since test_tensor_accumulation.py::test_logcumsumexp_basic and test_tensor_accumulation.py::test_cumulative_logsumexp_closed_form[fpdt0] already cover the issue and failed without the w/a in place.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

@antonwolfy antonwolfy self-assigned this Apr 9, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@coveralls
Copy link
Copy Markdown
Collaborator

coveralls commented Apr 9, 2026

Coverage Status

coverage: 86.248%. remained the same — add-w/a-to-custom_inclusive_scan_over_group into master

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Array API standard conformance tests for dpctl=0.22.0dev1=py310h93fe807_52 ran successfully.
Passed: 1248
Failed: 43
Skipped: 89

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Array API standard conformance tests for dpctl=0.22.0dev1=py310h93fe807_53 ran successfully.
Passed: 1246
Failed: 45
Skipped: 89

@antonwolfy antonwolfy marked this pull request as ready for review April 9, 2026 14:32
ndgrigorian
ndgrigorian previously approved these changes Apr 9, 2026
Copy link
Copy Markdown
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside from possibility of changing which memory_order is used this LGTM

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Array API standard conformance tests for dpctl=0.22.0dev1=py310h93fe807_54 ran successfully.
Passed: 1247
Failed: 44
Skipped: 89

@ndgrigorian ndgrigorian merged commit 8b5e3b6 into master Apr 9, 2026
58 of 72 checks passed
@ndgrigorian ndgrigorian deleted the add-w/a-to-custom_inclusive_scan_over_group branch April 9, 2026 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants