Conversation
| from typing import Optional | ||
|
|
||
|
|
||
| def run_command_and_stream_stdout(cmd: str) -> Optional[int]: |
There was a problem hiding this comment.
Removed the custom streaming helper — subprocess.run without a stdout argument already streams output to the terminal in real time (it just inherits the parent's stdout). The old approach using stdout=PIPE was actually worse since piped subprocesses buffer their output more aggressively. We also get free failure detection via check=True, which the old function didn't provide (it returned the exit code but never checked it).
- Switch from clang-tidy to clangd --check for ~10x faster lint - Add scripts/run_cpp_lint.py: parallel clangd checks with clean output - Add scripts/generate_compile_commands.py: standalone compile DB generation for IDE setup and as a dependency of fix_lint_cpp - Add scripts/_cpp_config.py: single source of truth for compile flags - Fix clang++-15 hardcode; use clang++ resolved via update-alternatives - Add fix_lint_cpp Makefile target for manual clang-tidy --fix runs - Move check_lint_cpp out of type_check and into lint_test where it belongs - Disable misc-confusable-identifiers (was 70% of lint time) and readability-function-cognitive-complexity in .clang-tidy
kmontemayor2-sc
left a comment
There was a problem hiding this comment.
Thanks Matt! Nice work, did a pass here :)
| ## Formatting (`.clang-format`) | ||
|
|
||
| The style is based on LLVM with the following notable deviations: | ||
|
|
||
| ### Line length | ||
|
|
||
| ``` | ||
| ColumnLimit: 120 | ||
| ``` | ||
|
|
||
| 120 columns rather than the LLVM default of 80. ML and graph code tends to have longer identifiers and nested template | ||
| types; 120 gives enough room without forcing awkward wraps. | ||
|
|
||
| ### Indentation and braces | ||
|
|
||
| ``` | ||
| IndentWidth: 4 | ||
| BreakBeforeBraces: Attach # K&R / "same-line" style | ||
| UseTab: Never | ||
| IndentCaseLabels: true # case labels indented inside switch | ||
| NamespaceIndentation: None # namespace bodies not indented | ||
| ``` | ||
|
|
||
| ### Pointer and reference alignment | ||
|
|
||
| ``` | ||
| PointerAlignment: Left | ||
| ``` | ||
|
|
||
| Pointers bind to the type, not the name: `int* x`, not `int *x`. | ||
|
|
||
| ### Parameter and argument wrapping | ||
|
|
||
| ``` | ||
| BinPackArguments: false | ||
| BinPackParameters: false | ||
| ``` | ||
|
|
||
| When a function call or declaration doesn't fit on one line, every argument/parameter gets its own line. Mixed | ||
| "bin-packing" (some on one line, some wrapped) is not allowed. | ||
|
|
||
| ### Templates | ||
|
|
||
| ``` | ||
| AlwaysBreakTemplateDeclarations: true | ||
| ``` | ||
|
|
||
| `template <...>` always appears on its own line, keeping the declaration signature visually separate from the template | ||
| header. | ||
|
|
||
| ### Include ordering | ||
|
|
||
| Includes are sorted and split into three priority groups: | ||
|
|
||
| | Priority | Pattern | Group | | ||
| | -------- | ------------------------------------ | ------------------------------------- | | ||
| | 1 | `.*` | Local project headers (first) | | ||
| | 2 | `^"(llvm\|llvm-c\|clang\|clang-c)/"` | LLVM/Clang internal headers | | ||
| | 3 | `^(<\|"(gtest\|isl\|json)/)` | System and third-party headers (last) | | ||
|
|
||
| ### Raw string formatting | ||
|
|
||
| Raw string literals with the `pb` delimiter (e.g. `R"pb(...)pb"`) are formatted as TextProto using Google style, | ||
| matching the protobuf idiom used throughout the codebase. | ||
|
|
||
| ______________________________________________________________________ | ||
|
|
||
| ## Static Analysis (`.clang-tidy`) | ||
|
|
||
| ### Check philosophy | ||
|
|
||
| A broad set of check families is enabled to catch bugs, enforce modern C++ idioms, and maintain readability. All | ||
| warnings are errors — there is no "warning-only" category. | ||
|
|
||
| Enabled families: | ||
|
|
||
| | Family | What it covers | | ||
| | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `boost-use-to-string` | Prefer `std::to_string` over `boost::lexical_cast` for numeric conversions | | ||
| | `bugprone-*` | Common programming mistakes: dangling handles, suspicious string construction, assert side effects, etc. | | ||
| | `cert-*` | CERT secure coding rules for error handling (`err34-c`), floating-point loops (`flp30-c`), and RNG seeding (`msc32-c`, `msc50/51-cpp`) | | ||
| | `clang-diagnostic-*` | Compiler diagnostic warnings surfaced as lint checks (e.g. `-Wall`, `-Wextra` violations) | | ||
| | `cppcoreguidelines-*` | C++ Core Guidelines: no raw `malloc`, no union member access, no object slicing, safe downcasts | | ||
| | `google-*` | Google C++ style: explicit constructors, no global names in headers, safe `memset` usage | | ||
| | `hicpp-exception-baseclass` | All thrown exceptions must derive from `std::exception` | | ||
| | `misc-*` | Miscellaneous: header-only definitions, suspicious enum usage, throw-by-value/catch-by-reference, etc. | | ||
| | `modernize-*` | Modernize to C++11/14/17: `nullptr`, range-based for, `make_unique`, `using` aliases, etc. | | ||
| | `performance-*` | Unnecessary copies, inefficient string ops, missed `emplace`, type promotions in math functions | | ||
| | `readability-*` | Naming conventions, braces around statements, boolean simplification, function size limits | | ||
|
|
||
| ### Disabled checks | ||
|
|
||
| Some checks in the above families are disabled where they produce excessive noise or conflict with common patterns in | ||
| this codebase: | ||
|
|
||
| | Disabled check | Reason | | ||
| | ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| | `bugprone-easily-swappable-parameters` | Tensor and sampler APIs legitimately have many adjacent same-typed parameters | | ||
| | `bugprone-implicit-widening-of-multiplication-result` | Crashes clang-tidy 15 on a construct in `ATen/core/dynamic_type.h` (upstream LLVM bug). Re-enable when upgrading past clang-tidy 15. | | ||
| | `bugprone-narrowing-conversions` | Too noisy in ML code mixing `int`/`int64_t`/`size_t` for tensor dimensions | | ||
| | `misc-confusable-identifiers` | Performs an O(n²) comparison of all identifiers in scope to detect Unicode homoglyphs. PyTorch headers introduce thousands of identifiers, making this check account for ~70% of total lint time. All identifiers in this codebase are standard ASCII. | | ||
| | `misc-const-correctness` | Produces false positives with pybind11 types whose mutation happens through `operator[]` (which is non-const). The check incorrectly suggests `const` on variables that are mutated. | | ||
| | `misc-no-recursion` | Recursive graph algorithms are intentional | | ||
| | `modernize-avoid-c-arrays` | C arrays are needed for pybind11 and C-interop code | | ||
| | `modernize-use-trailing-return-type` | Trailing return types (`auto f() -> T`) are only useful when the return type depends on template params. Requiring them everywhere is non-standard and reduces readability. | | ||
| | `readability-avoid-const-params-in-decls` | Incorrectly fires on `const T&` parameters in multi-line declarations (clang-tidy 15 bug). The check is meant for top-level const on by-value params, which is a separate, valid concern. | | ||
| | `readability-container-contains` | `.contains()` requires C++20; the codebase builds with C++17 | | ||
| | `readability-identifier-length` | Short loop variables (`i`, `j`, `k`) are idiomatic | | ||
| | `readability-function-cognitive-complexity` | Algorithmic code often requires nesting that is inherent to the problem structure. Enforcing an arbitrary complexity ceiling discourages clarity and encourages artificial decomposition. | | ||
| | `readability-magic-numbers` | Literal constants are common in ML code (e.g. feature dimensions) | | ||
|
|
||
| ### Naming conventions | ||
|
|
||
| Enforced via `readability-identifier-naming`: | ||
|
|
||
| | Identifier kind | Convention | Example | | ||
| | --------------------------------------------------------- | --------------------------- | ----------------- | | ||
| | Classes, enums, unions | `CamelCase` | `DistDataset` | | ||
| | Type template parameters | `CamelCase` | `NodeType` | | ||
| | Functions, methods | `camelBack` | `sampleNeighbors` | | ||
| | Variables, parameters, members | `camelBack` | `numNodes` | | ||
| | Private/protected members | `camelBack` with `_` prefix | `_nodeFeatures` | | ||
| | Constants (`constexpr`, `const` globals, class constants) | `CamelCase` with `k` prefix | `kMaxBatchSize` | | ||
|
|
||
| ### Key option tuning | ||
|
|
||
| | Option | Value | Effect | | ||
| | ---------------------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `WarningsAsErrors` | `*` | Every check failure is a hard error in CI | | ||
| | `HeaderFilterRegex` | `.*/gigl/csrc/.*` | Scopes checks to our own headers. Using `.*` causes clang-tidy to report warnings from every PyTorch/pybind11 header it parses, flooding output with thousands of third-party issues. | | ||
| | `FormatStyle` | `none` | clang-tidy does not auto-reformat; use clang-format separately | | ||
| | `bugprone-string-constructor.LargeLengthThreshold` | `8388608` (8 MB) | Strings larger than 8 MB from a length argument are flagged | | ||
| | `modernize-loop-convert.NamingStyle` | `CamelCase` | Auto-generated loop variable names use CamelCase | | ||
| | `readability-function-size.LineThreshold` | `1000` | Functions over 1000 lines are flagged | | ||
| | `readability-braces-around-statements.ShortStatementLines` | `0` | Braces required for all control-flow bodies, even single-line | |
There was a problem hiding this comment.
This is good and useful! Do you think it'd be better if we could put it in .clang-format or .clang-tidy and then say "look at those files"?
That way it's easier for these things to stay synced.
| | Type template parameters | `CamelCase` | `NodeType` | | ||
| | Functions, methods | `camelBack` | `sampleNeighbors` | | ||
| | Variables, parameters, members | `camelBack` | `numNodes` | | ||
| | Private/protected members | `camelBack` with `_` prefix | `_nodeFeatures` | |
There was a problem hiding this comment.
fwiw I prefer these to be snake_case (which is also what we do in Python).
Any reason to do otherwise?
There was a problem hiding this comment.
Any reason we have this in csrc instead of where the build script lives?
| def find_cpp_extensions() -> list[Extension]: | ||
| """Auto-discover pybind11 extension modules under ``gigl/csrc/``. | ||
|
|
||
| Following PyTorch's csrc convention, only files named ``python_*.cpp`` are | ||
| compiled as Python extension modules. | ||
|
|
||
| Returns an empty list if ``gigl/csrc/`` does not yet exist. | ||
| """ | ||
| if not _CSRC_DIR.exists(): | ||
| return [] | ||
| extensions = [] | ||
| for cpp_file in sorted(_CSRC_DIR.rglob("python_*.cpp")): | ||
| parts = list(cpp_file.with_suffix("").parts) | ||
| parts[-1] = parts[-1].removeprefix("python_") | ||
| module_name = ".".join(parts) | ||
| impl_file = cpp_file.parent / (parts[-1] + ".cpp") | ||
| sources = [str(cpp_file)] | ||
| if impl_file.exists(): | ||
| sources.append(str(impl_file)) | ||
| extensions.append( | ||
| CppExtension( | ||
| name=module_name, | ||
| sources=sources, | ||
| extra_compile_args=COMPILE_ARGS, | ||
| ) | ||
| ) | ||
| return extensions |
There was a problem hiding this comment.
any reason to do this vs just hard code? Might be nice to be intentional but I can see the appeal of auto-detecting.
| # Suppress PyTorch's CUDA-not-found warning emitted at import time. | ||
| with warnings.catch_warnings(): | ||
| warnings.filterwarnings("ignore") | ||
| from torch.utils.cpp_extension import include_paths as torch_include_paths |
There was a problem hiding this comment.
any reason we do this?
|
|
||
| format: format_py format_scala format_md | ||
| format_cpp: | ||
| $(if $(CPP_SOURCES), clang-format -i --style=file $(CPP_SOURCES)) |
There was a problem hiding this comment.
Should we just always run this?
| uv run python -m scripts.generate_compile_commands | ||
|
|
||
| check_lint_cpp: | ||
| $(if $(CPP_SOURCES), uv run python -m scripts.run_cpp_lint $(CPP_SOURCES)) |
There was a problem hiding this comment.
ditto on always running.
| # changes expressions, adds/removes keywords), not just style. Run manually and | ||
| # review the diff before committing. | ||
| fix_lint_cpp: generate_compile_commands | ||
| $(if $(CPP_SOURCES), clang-tidy --fix -p .cache/compile_commands.json $(CPP_SOURCES)) |
There was a problem hiding this comment.
ditto on always running.
There was a problem hiding this comment.
Do we want to create a new release after this goes in?
| generate_compile_commands: | ||
| uv run python -m scripts.generate_compile_commands | ||
|
|
||
| check_lint_cpp: |
There was a problem hiding this comment.
Should we add a note about this also not being a part of check_format or what not.
Scope of work done
C++ Infrastructure for GiGL
This PR adds the foundational C++ infrastructure to GiGL — everything needed to write, build, lint, and test C++ code alongside the existing Python codebase.
What's included
scripts/build_cpp_extensions.pyauto-discovers and builds pybind11 C++ extensions undergigl/csrc/. The C++ build is wired intomake unit_test_pyandmake install_dev_depsso it happens automatically without a separate step..clang-formatand.clang-tidyconfigs establish C++ style conventions (naming, formatting, static analysis).make format_cppformats in-place.make check_lint_cppruns fast static analysis usingclangd --check(roughly 10× faster thanclang-tidydue to preamble caching).make fix_lint_cppapplies auto-fixable violations viaclang-tidy --fixand is intentionally separate frommake formatsince it rewrites logic, not just style.check_lint_cppis part oflint_test; it is not part oftype_check(which is Python/mypy only).requirements/install_cpp_deps.shinstalls clang-format, clang-tidy, clangd, and cmake on both Mac and Linux, including PATH setup on Mac where Homebrew doesn't override Apple's built-in clang.scripts/generate_compile_commands.pygenerates.cache/compile_commands.jsonso clangd can resolve include paths correctly. Callable directly viamake generate_compile_commands(for IDE/clangd setup after adding new source files) and imported automatically byrun_cpp_lint.pybefore each lint run.scripts/run_cpp_lint.pywrapsclangd --check, runs checks in parallel, and produces clean output: a green pass message or a per-file list of failures. All clangd noise (preamble building, tweak failures, version info) is filtered.docs/cpp_style_guide.mddocuments the formatting and linting rules, how to run them, and rationale for disabled checks (misc-confusable-identifierswas consuming 70% of lint time;readability-function-cognitive-complexitypenalises inherently nested algorithmic code).tests/unit/cpp/with a CMakeLists and a smoke test to verify the infrastructure works end to end.Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO