
Fix #1308: Scope dask.config.set() to specific operations instead of module import #1309

Open
khushthecoder wants to merge 3 commits into malariagen:master from khushthecoder:fix/issue-1308-dask-config-scope

@khushthecoder (Contributor)

Summary

Fixes #1308

ag3.py line 10 executes dask.config.set(**{"array.slicing.split_native_chunks": False}) at module import time. Because the call is not used as a context manager, the override is never undone: importing the package silently disables a dask performance optimization for the entire Python process, not just for malariagen_data operations.

This PR scopes the config override to context managers (with dask.config.set(...)) within the specific methods that require it, so the config is restored immediately after each operation completes.
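A minimal sketch of the difference, using only the public dask.config API. The key is the one quoted above; `current_value` and `scoped_operation` are hypothetical names for illustration, not functions in malariagen_data. `dask.config.set(...)` applies its change as soon as it is called; used with `with`, it undoes that change when the block exits.

```python
import dask

# Config key quoted from the PR description.
KEY = "array.slicing.split_native_chunks"

# Before the PR, ag3.py ran this at import time; the change was applied
# immediately and never undone for the rest of the process:
#     dask.config.set(**{KEY: False})


def current_value():
    """Read the key from the global dask config ("unset" if absent)."""
    return dask.config.get(KEY, default="unset")


def scoped_operation():
    """The PR's pattern: override only for the duration of the block."""
    with dask.config.set(**{KEY: False}):
        inside = current_value()  # override is active here
    # On exit the previous state is restored (also if the block raised).
    return inside, current_value()


before = current_value()
inside, after = scoped_operation()
print(f"before={before!r} inside={inside!r} after={after!r}")
```

Whatever the key's value was before the call, it is the same again afterwards; only code running inside the `with` block sees `False`.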

Changes

  • malariagen_data/ag3.py: Removed the module-level dask.config.set() call and the now-unused import dask.
  • malariagen_data/util.py: Added a scoped dask.config.set() context manager around da.compress() in _da_compress().
  • malariagen_data/anoph/snp_data.py: Added scoped dask.config.set() context managers around:
    • da.compress() in snp_genotypes() (sample filtering)
    • da.take() in snp_genotypes() (sample indexing)
    • da.take() in _locate_site_class() (site annotation subsetting)
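A sketch of what the wrapped call sites look like, under stated assumptions: `compress_samples` and `take_samples` are hypothetical stand-ins for the code in `_da_compress()` and `snp_genotypes()`, and the toy array assumes a (variants, samples, ploidy) layout with samples on axis 1. Only `dask.config.set`, `da.compress` and `da.take` are real dask APIs here.

```python
import dask
import dask.array as da
import numpy as np

# Key taken from the PR; the helper names below are hypothetical.
_NO_SPLIT = {"array.slicing.split_native_chunks": False}


def compress_samples(gt, mask):
    """Boolean sample filtering, analogous to the da.compress() path."""
    with dask.config.set(_NO_SPLIT):
        # The selection's task graph is built here, while the override is active.
        return da.compress(mask, gt, axis=1)


def take_samples(gt, indices):
    """Integer sample indexing, analogous to the da.take() path."""
    with dask.config.set(_NO_SPLIT):
        return da.take(gt, indices, axis=1)


# Toy genotype array with an assumed (variants, samples, ploidy) layout.
gt_np = np.arange(6 * 4 * 2).reshape(6, 4, 2)
gt = da.from_array(gt_np, chunks=(3, 2, 2))

mask = np.array([True, False, True, True])
filtered = compress_samples(gt, mask).compute()
picked = take_samples(gt, [3, 0]).compute()
print(filtered.shape, picked.shape)
```

The `with` block covers only the call that builds the task graph; `.compute()` later runs under the restored global config. The pattern therefore assumes, as the PR's approach does, that dask reads this slicing option while constructing the graph rather than at compute time.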

Why This Matters

  • No side effects on import: import malariagen_data no longer modifies global dask configuration.
  • No performance degradation: Third-party dask workloads (xarray, pangeo, etc.) in the same session are unaffected.
  • Easier debugging: Researchers combining malariagen_data with other dask tools won't see unexplained behavior changes.
  • Backward-compatible: No changes to public API or user-facing behavior.

Related: #1305 (same class of issue — global state mutation that leaks beyond malariagen_data's scope)

Test Plan

  • All existing tests pass (1245 passed, 10 skipped)
  • ruff check passes on all changed files
  • ruff format passes on all changed files
  • Verified import malariagen_data no longer modifies dask.config
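One way to automate the last check, sketched as a hypothetical helper (`leaks_config_on_import` is not part of this PR or its test suite). It imports a module in a fresh interpreter, so the module's top-level code is guaranteed to run, and compares the dask key before and after. The demo writes two throwaway modules to a temporary directory: one with the old module-level call, one with the scoped pattern.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional

# Config key quoted from the PR description.
KEY = "array.slicing.split_native_chunks"


def leaks_config_on_import(
    module_name: str, key: str = KEY, search_path: Optional[str] = None
) -> bool:
    """Return True if importing module_name in a fresh interpreter changes key."""
    lines = ["import importlib, sys", "import dask"]
    if search_path is not None:
        lines.append(f"sys.path.insert(0, {search_path!r})")
    lines += [
        f"before = dask.config.get({key!r}, default=None)",
        f"importlib.import_module({module_name!r})",
        f"after = dask.config.get({key!r}, default=None)",
        "print('LEAK' if before != after else 'CLEAN')",
    ]
    # A fresh interpreter guarantees the module's top-level code runs,
    # and any leaked setting dies with the subprocess.
    result = subprocess.run(
        [sys.executable, "-c", "\n".join(lines)],
        capture_output=True,
        text=True,
        check=True,  # surface import errors instead of reporting "CLEAN"
    )
    return result.stdout.strip() == "LEAK"


# Demo with two throwaway modules (hypothetical names).
LEAKY_SRC = """\
import dask
dask.config.set(**{"array.slicing.split_native_chunks": False})
"""

SCOPED_SRC = """\
import dask

def run():
    with dask.config.set(**{"array.slicing.split_native_chunks": False}):
        pass
"""

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "leaky_demo_mod.py").write_text(LEAKY_SRC)
    Path(tmp, "scoped_demo_mod.py").write_text(SCOPED_SRC)
    leaky_result = leaks_config_on_import("leaky_demo_mod", search_path=tmp)
    scoped_result = leaks_config_on_import("scoped_demo_mod", search_path=tmp)

print(leaky_result, scoped_result)
```

Against the real package, with malariagen_data installed, the check would read `assert not leaks_config_on_import("malariagen_data.ag3")`.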

khushthecoder added 3 commits April 17, 2026 22:06
…nstead of module import

Move the `split_native_chunks` config override from module-level in ag3.py
to context managers within the specific methods that require it. This
prevents importing malariagen_data from silently modifying global dask
configuration, which could degrade performance for unrelated dask
workloads in the same Python session.

Affected operations:
- util._da_compress(): wraps da.compress() call
- snp_data.snp_genotypes(): wraps da.compress() and da.take() calls
- snp_data._locate_site_class(): wraps da.take() call
Linked issue: dask.config.set() at Module Import Time in ag3.py Silently Modifies Global Dask Configuration

1 participant