"Any new TBI prediction model must demonstrate improvement over IMPACT Core and CRASH Basic -- on calibration, not just discrimination."
| Attribute | Value |
|---|---|
| Status | Incubating |
| Maturity | Design Phase |
| License | Apache-2.0 |
| Part of | Evidence Commons |
| Mission Pillar | Pillar 1 (Clinical AI Evaluation & Benchmarking) |
Traumatic brain injury outcome prediction has two established baselines: IMPACT Core (age, GCS motor, pupils; C-statistic ~0.78-0.82 for mortality) and CRASH Basic. Most published models report only discrimination (AUROC), but calibration superiority -- whether predicted probabilities match observed outcomes -- matters more for clinical deployment. TBI-Benchmarks is designed to provide standardized evaluation protocols, synthetic benchmark datasets, and reference implementations that enforce methodologically rigorous model comparison, including net reclassification improvement (NRI) as the preferred comparison metric.
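As an illustration of the NRI comparison the protocols are meant to enforce, here is a minimal categorical-NRI sketch in pure NumPy. The function name and the two risk thresholds are hypothetical placeholders for illustration, not extracted project code or a published TBI risk-category standard:

```python
import numpy as np

def categorical_nri(p_old, p_new, y, thresholds=(0.1, 0.3)):
    """Net reclassification improvement of a candidate model over a baseline.

    p_old, p_new: predicted event probabilities from baseline and candidate.
    y: binary outcomes (1 = event). thresholds: risk-category cut points
    (illustrative values only).
    """
    p_old, p_new, y = map(np.asarray, (p_old, p_new, y))
    cat_old = np.digitize(p_old, thresholds)   # map probabilities to categories
    cat_new = np.digitize(p_new, thresholds)
    up, down = cat_new > cat_old, cat_new < cat_old
    ev, ne = y == 1, y == 0
    # Events should move up in risk category; non-events should move down.
    nri_events = up[ev].mean() - down[ev].mean()
    nri_nonevents = down[ne].mean() - up[ne].mean()
    return nri_events + nri_nonevents
```

Positive values mean the candidate reclassifies cases in the clinically correct direction more often than the baseline; a real comparison would also need category-free (continuous) NRI and confidence intervals.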
This repository does not yet contain extracted code or datasets. The parent codebase (evidenceos-research/evidenceos-bench) contains benchmarking infrastructure that is intended to be extracted and adapted for public use here. The planned scope includes synthetic TBI case sets (no real patient data), evaluation harnesses with TRIPOD+AI compliance checking (Collins et al. 2024), and baseline model implementations for IMPACT Core and CRASH Basic.
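To make the planned synthetic case sets concrete, here is a minimal sketch of what a fully synthetic generator might look like. Every distribution and coefficient below is an illustrative placeholder: this is neither the published IMPACT model nor the parent SyntheticDatasetFactory.

```python
import numpy as np

def generate_synthetic_tbi_cases(n, seed=0):
    """Sketch of a fully synthetic TBI case generator (no real patient data).

    Predictors mirror IMPACT Core (age, GCS motor score, pupil reactivity);
    all ranges and coefficients are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    age = rng.integers(16, 90, n)       # years
    gcs_motor = rng.integers(1, 7, n)   # GCS motor score, 1-6
    pupils = rng.integers(0, 3, n)      # 0 = both reactive ... 2 = both fixed
    # Hypothetical log-odds model for 6-month mortality, for illustration only.
    lp = -3.0 + 0.03 * age - 0.4 * gcs_motor + 0.8 * pupils
    mortality = rng.random(n) < 1.0 / (1.0 + np.exp(-lp))
    return {"age": age, "gcs_motor": gcs_motor,
            "pupils": pupils, "mortality": mortality.astype(int)}
```

A seeded generator like this keeps benchmark runs reproducible while guaranteeing that no record corresponds to a real patient.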
| Component | Description | Parent Code Exists |
|---|---|---|
| `baselines/` | Reference implementations of IMPACT Core and CRASH Basic | Planned |
| `datasets/` | Synthetic TBI benchmark datasets (fully generated, no real patient data) | Not yet |
| `protocols/` | Evaluation protocol definitions (discrimination, calibration, NRI, DCA) | Partial (in parent) |
| `harness/` | Reproducible evaluation runner with TRIPOD+AI compliance checks | Partial (in parent) |
| `leaderboard/` | Model comparison infrastructure and result formatting | Planned |
What exists in the parent codebase:
- Benchmarking engine in `evidenceos-research/evidenceos-bench`
- TRIPOD+AI compliance checklist implementation (Collins et al. 2024)
- Sample size validation using Riley et al. (2019/2020) criteria: EPV >= 10 is necessary but not sufficient; a formal `pmsampsize` calculation is required
- Multiverse analysis infrastructure producing 927 model configurations across 13 configs
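The EPV heuristic in the sample-size criterion is simple to state in code. A sketch (the function name is illustrative, and this deliberately does not replace a formal `pmsampsize`-style calculation):

```python
def events_per_variable(n_events, n_candidate_params):
    """Events-per-variable screening check for prediction model development.

    EPV >= 10 is treated as necessary but not sufficient (per Riley et al.);
    a formal minimum-sample-size calculation should always follow.
    Returns (epv, passes_screen).
    """
    epv = n_events / n_candidate_params
    return epv, epv >= 10.0
```

Usage: `events_per_variable(150, 10)` screens a model with 10 candidate parameters developed on 150 outcome events.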
What does not exist yet:
- Standalone synthetic TBI benchmark datasets for public distribution
- Reference implementations of IMPACT Core and CRASH Basic as comparison baselines
- Standardized evaluation harness separated from the research pipeline
- Public leaderboard infrastructure
- NRI calculation utilities as a standalone module
Planned next steps:
- Define evaluation protocol specifications covering discrimination (AUROC, C-statistic), calibration (calibration-in-the-large, calibration slope, calibration plots), NRI, and decision curve analysis (DCA)
- Generate synthetic TBI benchmark datasets using the existing SyntheticDatasetFactory (no real patient data)
- Implement reference baselines for IMPACT Core and CRASH Basic with documented expected performance ranges
- Extract and adapt the TRIPOD+AI compliance checker as a standalone validation module
- Build an evaluation harness that accepts any model's predictions and produces standardized comparison reports
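The core discrimination and calibration metrics in the protocol list reduce to short formulas. A pure-NumPy sketch (function names are illustrative; a real harness would add calibration slope, calibration plots, DCA, and confidence intervals):

```python
import numpy as np

def auroc(y, p):
    """C-statistic via the rank-sum (Mann-Whitney) identity, with tied
    predictions assigned average ranks."""
    y, p = np.asarray(y), np.asarray(p)
    order = np.argsort(p)
    ranks = np.empty(len(p), dtype=float)
    ranks[order] = np.arange(1, len(p) + 1)
    for v in np.unique(p):           # average ranks over ties (O(n*u) sketch)
        mask = p == v
        ranks[mask] = ranks[mask].mean()
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def calibration_in_the_large(y, p):
    """Mean observed minus mean predicted risk. (Some protocols instead fit
    an intercept on the logit scale with the linear predictor fixed.)"""
    return float(np.mean(y) - np.mean(p))
```

A harness built on functions like these can accept any model's predicted probabilities and report discrimination and calibration side by side against the baselines.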
```mermaid
graph LR
    A[evidenceos-bench<br/>evaluation engine] --> B[TBI-Benchmarks]
    B --> C[Clinical Arena<br/>model leaderboard]
    B --> D[BRIDGE-TBI<br/>validation baseline]
    style B fill:#2A9D8F,stroke:#1E3A8A,color:#fff
```
TBI-Benchmarks is intended to provide the evaluation layer for the Clinical Arena (model leaderboard) and validation infrastructure for BRIDGE-TBI (clinical decision support). Canonical source: evidenceos-research/evidenceos-bench.
This project is in the design phase. The evaluation protocols and benchmark specifications are being defined; no code has been extracted to this repository yet. Contributions to protocol specification and synthetic dataset design are the most immediately useful. See CONTRIBUTING.md.
Apache-2.0 -- see LICENSE for details.