minkiPy is a Python package for differential analysis of gene spatial organisation in spatial transcriptomics data, using Minkowski functionals and tensors.
This repository accompanies the paper "Differential Analysis of Gene Spatial Organisation with Minkowski Functionals and Tensors" and includes:
- the minkiPy package,
- a command-line interface,
- an exploratory notebook to get started quickly on your own data,
- full workflow notebooks used for end-to-end analyses.
- Input format
- Method summary
- Installation
- Quick start (Python)
- Command-line usage
- MPI usage patterns
- Repository layout
minkiPy expects a pandas.DataFrame with transcript-level coordinates and these columns:
- gene
- global_x
- global_y
import pandas as pd
transcripts_df = pd.DataFrame({
    "gene": [...],
    "global_x": [...],
    "global_y": [...],
})
Notes:
- gene is a string identifier.
- global_x and global_y should share the same coordinate system (usually micrometres).
- Converting platform-specific files to this format is done upstream.
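As a minimal sketch of checking a table against the format described above, the helper below verifies that the three expected columns are present and coerces their types. The name check_transcripts and the extra cell_id column are hypothetical and used only for illustration; they are not part of minkiPy.

```python
import pandas as pd

# Column names taken from the input format described above.
REQUIRED_COLUMNS = ("gene", "global_x", "global_y")


def check_transcripts(df):
    """Return a copy holding only the expected columns, or raise ValueError.

    Hypothetical helper for illustration; not part of the minkiPy API.
    """
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = df.loc[:, list(REQUIRED_COLUMNS)].copy()
    out["gene"] = out["gene"].astype(str)             # gene is a string identifier
    out["global_x"] = pd.to_numeric(out["global_x"])  # coordinates must be numeric
    out["global_y"] = pd.to_numeric(out["global_y"])
    return out


# Toy table with one extra, platform-specific column (made up for the example).
example_df = pd.DataFrame({
    "gene": ["ACTB", "ACTB", "GAPDH"],
    "global_x": [10.5, 12.0, 48.2],
    "global_y": [3.1, 4.4, 20.0],
    "cell_id": [1, 1, 2],
})
checked_df = check_transcripts(example_df)
```

Running a check like this before the profile computation surfaces missing or mistyped columns early, before any MPI processes are started.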
For each gene, minkiPy reconstructs a spatial density field and computes a profile across level sets.
Each profile contains:
- W0 (area),
- W1 (boundary length),
- W2 (Euler-characteristic-related term),
- beta (anisotropy index from a Minkowski tensor).
Each gene's profile is an array of shape (4, LS), where LS is the number of level sets.
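To make the idea of a profile across level sets concrete, here is a toy sketch that thresholds a small density grid and measures area and boundary length by counting pixels and pixel edges. This is only an illustration under simplifying assumptions: it covers two of the four quantities, uses a made-up grid, and is not the estimator that minkiPy implements.

```python
import numpy as np


def area_and_boundary(mask):
    """Pixel-count area and edge-count boundary length of a binary mask."""
    padded = np.pad(mask.astype(int), 1)  # zero border so edges on the rim are counted
    area = int(padded.sum())
    # Every 0/1 change between neighbouring pixels is one unit of boundary.
    boundary = int(
        np.abs(np.diff(padded, axis=0)).sum()
        + np.abs(np.diff(padded, axis=1)).sum()
    )
    return area, boundary


# Toy density field (made up) and two threshold levels.
density = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 0.0],
])
thresholds = [0.5, 2.0]

# One column per level set: rows are (area, boundary length).
toy_profile = np.array([area_and_boundary(density >= t) for t in thresholds]).T
```

With all four quantities stacked the same way, the array would have one row per quantity and one column per level set, which matches the (4, LS) layout described above.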
Optional Monte Carlo runs estimate covariance. Distances can then be covariance-aware Gaussian 2-Wasserstein, or Euclidean for fast exploration.
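For reference, the 2-Wasserstein distance between two Gaussians has a closed form: the squared distance is the squared Euclidean distance between the means plus Tr(A + B - 2 (B^(1/2) A B^(1/2))^(1/2)) for covariances A and B. The sketch below implements that textbook formula with NumPy; the function names are illustrative, and how minkiPy flattens profiles and estimates covariances is not shown here.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    sym = (mat + mat.T) / 2.0               # symmetrise against round-off
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)   # remove tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))


mean = np.zeros(2)
# Same mean, covariances 4*I and I in two dimensions.
d_cov = gaussian_w2(mean, 4.0 * np.eye(2), mean, np.eye(2))
# Zero covariances: the distance reduces to the Euclidean distance between means.
d_euclid = gaussian_w2(np.array([3.0, 4.0]), np.zeros((2, 2)), mean, np.zeros((2, 2)))
```

When both covariances are zero the expression reduces to the plain Euclidean distance, which is why the Euclidean option is a natural fast fallback when no Monte Carlo covariance is available.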
These profiles are the starting point for downstream analysis: sample and gene comparisons, condition-level ranking of spatial reorganisation, and embedding/graph analyses.
mpi4py needs an MPI runtime (mpirun/mpiexec) installed on your machine.
Before choosing an option:
- Option A (pip from PyPI) does not require cloning this repository.
- Options B/C (YAML or local development) require a local clone first:
git clone https://github.com/BAUDOTlab/minkiPy.git
cd minkiPy
- Check MPI:
mpirun --version
If missing, install MPI first:
- Ubuntu/Debian
sudo apt update
sudo apt install -y openmpi-bin libopenmpi-dev
- macOS (Homebrew)
brew install open-mpi
- Conda-only
conda install -c conda-forge openmpi mpi4py
- Update pip tooling:
python -m pip install --upgrade pip setuptools wheel
- Install:
pip install minkipy-st
- Verify:
python -c "import minkiPy; print('minkiPy import OK')"
python -m minkiPy --help
Use Option B (Conda environment from YAML) from the repository root (after git clone and cd minkiPy).
- Update Conda first:
conda update -n base -c defaults conda
- Create the environment:
conda env create -f minkiPy_env.yaml
- Activate it:
conda activate minkiPy
- Install the package from source (editable):
pip install -e .
- (Optional) Add a Jupyter kernel:
python -m ipykernel install --user --name minkiPy --display-name "Python (minkiPy)"
Use Option C (local development) from the repository root (after git clone and cd minkiPy).
python -m pip install --upgrade pip setuptools wheel
pip install -e .
If installation fails:
- Retry after updating pip tooling:
python -m pip install --upgrade pip setuptools wheel
- For Conda setups, also update Conda:
conda update -n base -c defaults conda
- Create a clean virtual environment and reinstall:
python -m venv .venv
source .venv/bin/activate # Windows (PowerShell): .venv\Scripts\Activate.ps1
python -m pip install --upgrade pip setuptools wheel
pip install minkipy-st
- If MPI errors persist, re-check mpirun --version and ensure that MPI and mpi4py are compatible.
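The MPI check can also be done from Python before launching a long run. The sketch below uses only the standard library to look for a launcher on PATH; find_mpi_launcher is a hypothetical helper name, not part of minkiPy.

```python
import shutil


def find_mpi_launcher(candidates=("mpirun", "mpiexec")):
    """Return the path of the first MPI launcher found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


launcher = find_mpi_launcher()
print(launcher if launcher else "No MPI launcher found on PATH")
```

Finding a launcher does not guarantee that mpi4py was built against the same MPI installation, so the compatibility check above still applies.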
import minkiPy
h5_path = minkiPy.compute_Minkowski_profiles(
    transcripts_df,
    name="sample_A",
    output_path="results",
    resolution=20.0,
    nbr=25,
    n_cov_samples=None,  # None uses the default number of Monte Carlo realisations; set 0 for faster exploratory runs
    # mpi_procs:
    #   None -> auto-detect
    #   1    -> single process
    #   >1   -> spawn MPI processes
)
Typical output file:
results/minkiPy_merged_resolution_<resolution>_<name>.h5
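When several samples are processed in a loop, it can help to build the expected output paths from the same arguments. The sketch below follows the naming pattern above and assumes the resolution is written the way Python prints the float (20.0 gives 20.0), which is consistent with the example paths in this README but should be checked against the files actually written. expected_profile_path is a hypothetical helper, not part of minkiPy.

```python
from pathlib import PurePosixPath


def expected_profile_path(output_path, name, resolution):
    """Build the merged-profile path from the naming pattern shown above.

    Assumption: the resolution is formatted as Python prints the float.
    """
    filename = f"minkiPy_merged_resolution_{resolution}_{name}.h5"
    return str(PurePosixPath(output_path) / filename)


path_a = expected_profile_path("results", "sample_A", 20.0)
path_b = expected_profile_path("results", "sample_B", 20.0)
```

In practice the path returned by compute_Minkowski_profiles is the authoritative one; a helper like this is only useful for locating files from earlier runs.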
Example downstream loading:
filepaths = [
    "results/minkiPy_merged_resolution_20.0_sample_A.h5",
    "results/minkiPy_merged_resolution_20.0_sample_B.h5",
]
ordered_conditions = ["sample_A", "sample_B"]
data = minkiPy.process_data(
    filepaths,
    ordered_conditions=ordered_conditions,
    verbose=True,
)
After process_data, typical downstream steps include:
- condition-level averaging with add_averaged_condition_datasets,
- sample or gene distances with compute_sample_distances and compute_gene_distances,
- graph and embedding visualisations (plot_dataset_graphs_from_data, plot_gene_graphs_from_data, plot_pca_grid_by_condition),
- differential ranking and trend plots (plot_top_changing_genes, plot_w2_abslog2fc_with_trend),
- profile-level diagnostics (plot_minkowski_profile, plot_w2_diag_vs_euclid_distributions, plot_w2_diag_vs_full_plus_euclid_distributions).
To get started quickly with your own data, begin with minkiPy_exploratory_workflow.ipynb.
Run under MPI:
mpirun -n 8 python -m minkiPy \
  --input transcripts.csv \
  --name sample_A \
  --output-path results \
  --resolution 20 \
  --nbr 25
Custom column names:
mpirun -n 8 python -m minkiPy \
  --input transcripts.tsv \
  --sep '\t' \
  --gene-col gene_symbol \
  --x-col x \
  --y-col y \
  --name sample_A \
  --output-path results
Supported formats: .csv, .txt, .tsv, .parquet.
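The same column mapping can be done in Python when calling the library directly. The sketch below reads a file by extension with pandas and renames custom columns to the expected names. load_transcripts is a hypothetical helper, and the default separators (comma for .csv, tab otherwise) are assumptions for the example rather than a description of what the CLI does internally.

```python
from pathlib import Path

import pandas as pd


def load_transcripts(path, gene_col="gene", x_col="global_x", y_col="global_y", sep=None):
    """Load a transcript table and rename its columns to gene/global_x/global_y.

    Hypothetical helper; separator defaults are assumptions for illustration.
    """
    path = Path(path)
    if path.suffix == ".parquet":
        df = pd.read_parquet(path)  # needs a parquet engine such as pyarrow
    else:
        if sep is None:
            sep = "," if path.suffix == ".csv" else "\t"
        df = pd.read_csv(path, sep=sep)
    renamed = df.rename(columns={gene_col: "gene", x_col: "global_x", y_col: "global_y"})
    return renamed[["gene", "global_x", "global_y"]]
```

For the TSV example above, a call such as load_transcripts("transcripts.tsv", gene_col="gene_symbol", x_col="x", y_col="y") would return a table ready to pass to compute_Minkowski_profiles.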
Launch your script with mpirun/mpiexec. compute_Minkowski_profiles(...) uses the active MPI communicator.
h5_path = minkiPy.compute_Minkowski_profiles(
    transcripts_df,
    name="sample_A",
    output_path="results",
    resolution=20.0,
    nbr=25,
    mpi_procs=60,
    use_hwthreads=True,
)
Useful parameters:
- mpi_procs (int | None, default None)
- use_hwthreads (bool, default False)
- oversubscribe (bool, default False)
- extra_mpirun_args (list[str] | None)
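Because profiles are computed per gene, the work splits naturally across MPI ranks. The sketch below shows one common scheme, round-robin assignment of genes to ranks, in plain Python. It is an illustration only: the scheme minkiPy actually uses in mpi_driver.py is not documented here, and genes_for_rank is a hypothetical name.

```python
def genes_for_rank(genes, rank, size):
    """Round-robin share of the gene list for one rank out of `size` ranks."""
    return genes[rank::size]


genes = ["ACTB", "GAPDH", "MYOD1", "PAX7", "DES"]
size = 2  # under MPI this would come from mpi4py's MPI.COMM_WORLD.Get_size()

# Simulate every rank locally to inspect the split.
assignment = {rank: genes_for_rank(genes, rank, size) for rank in range(size)}
```

Round-robin keeps the number of genes per rank within one of each other, although the actual load can still vary if some genes have many more transcripts than others.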
minkiPy/
├── minkiPy/ # Core package
│ ├── minkowski_core.py # Per-gene Minkowski profile computation
│ ├── mpi_driver.py # MPI distribution + auto-MPI wrapper
│ ├── cli.py # Command-line logic
│ ├── io.py # NPZ/HDF5 output writing and merge
│ └── downstream/ # Post-processing, distances, visualisation
├── minkiPy_env.yaml # Conda environment definition
├── minkiPy_exploratory_workflow.ipynb # Introductory exploratory workflow
├── minkiPy_FSHD_complete_workflow.ipynb # Full FSHD workflow
├── minkiPy_CRC_complete_workflow.ipynb # Full CRC workflow
└── examples/ # Data staging for notebooks
