
# TBI-Benchmarks -- Evaluation Protocols for TBI Prediction Models


> "Any new TBI prediction model must demonstrate improvement over IMPACT Core and CRASH Basic -- on calibration, not just discrimination."

| Attribute | Value |
| --- | --- |
| Status | Incubating |
| Maturity | Design Phase |
| License | Apache-2.0 |
| Part of | Evidence Commons |
| Mission Pillar | Pillar 1 (Clinical AI Evaluation & Benchmarking) |

## Overview

Traumatic brain injury outcome prediction has two established baselines: IMPACT Core (age, GCS motor, pupils; C-statistic ~0.78-0.82 for mortality) and CRASH Basic. Most published models report only discrimination (AUROC), but calibration superiority -- whether predicted probabilities match observed outcomes -- matters more for clinical deployment. TBI-Benchmarks is designed to provide standardized evaluation protocols, synthetic benchmark datasets, and reference implementations that enforce methodologically rigorous model comparison, including net reclassification improvement (NRI) as the preferred comparison metric.
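To make the calibration requirement concrete: the two standard summaries are calibration-in-the-large (CITL, the intercept of a logistic model with the prediction's logit as a fixed offset) and the calibration slope (Cox recalibration). A minimal NumPy sketch, not code from this repository (`fit_logistic` and `calibration_metrics` are illustrative names):

```python
import numpy as np

def _logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, offset=0.0, n_iter=50):
    """Newton-Raphson fit of a logistic model y ~ X (+ fixed offset)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = _sigmoid(X @ beta + offset)
        W = mu * (1 - mu)                    # IRLS weights
        grad = X.T @ (y - mu)
        hess = (X * W[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

def calibration_metrics(y_true, y_prob):
    lp = _logit(np.asarray(y_prob, float))
    y = np.asarray(y_true, float)
    # Calibration slope: fit y ~ a + b * logit(p); want b close to 1
    slope = fit_logistic(np.column_stack([np.ones_like(lp), lp]), y)[1]
    # Calibration-in-the-large: intercept with slope fixed at 1 (offset)
    citl = fit_logistic(np.ones((len(lp), 1)), y, offset=lp)[0]
    return citl, slope
```

For a well-calibrated model the slope is near 1 and CITL near 0; a slope below 1 is the typical signature of overfitting (predictions too extreme in both directions).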

This repository does not yet contain extracted code or datasets. The parent codebase (evidenceos-research/evidenceos-bench) contains benchmarking infrastructure that is intended to be extracted and adapted for public use here. The planned scope includes synthetic TBI case sets (no real patient data), evaluation harnesses with TRIPOD+AI compliance checking (Collins et al. 2024), and baseline model implementations for IMPACT Core and CRASH Basic.
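The IMPACT Core baseline is a three-predictor logistic model. As a sketch of the planned baseline interface only -- the coefficients below are placeholders, not the published IMPACT weights (see Steyerberg et al. 2008 for the actual model):

```python
import math

# PLACEHOLDER coefficients: illustrate the model structure only,
# not the published IMPACT Core regression weights.
IMPACT_CORE_COEFS = {
    "intercept": -2.0,    # placeholder
    "age_decade": 0.3,    # placeholder, per decade of age over 40
    "motor_score": -0.4,  # placeholder, GCS motor component (1-6)
    "pupils": 0.7,        # placeholder, per unreactive pupil (0-2)
}

def impact_core_risk(age, gcs_motor, unreactive_pupils,
                     coefs=IMPACT_CORE_COEFS):
    """Predicted mortality risk from the three IMPACT Core predictors
    via a logistic model (structure only; coefficients are dummies)."""
    lp = (coefs["intercept"]
          + coefs["age_decade"] * max(age - 40, 0) / 10
          + coefs["motor_score"] * gcs_motor
          + coefs["pupils"] * unreactive_pupils)
    return 1.0 / (1.0 + math.exp(-lp))
```

A reference implementation in `baselines/` would pin the published coefficients and the documented C-statistic range (~0.78-0.82) as expected-performance checks.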

## Architecture

| Component | Description | Parent Code Exists |
| --- | --- | --- |
| `baselines/` | Reference implementations of IMPACT Core and CRASH Basic | Planned |
| `datasets/` | Synthetic TBI benchmark datasets (fully generated, no real patient data) | Not yet |
| `protocols/` | Evaluation protocol definitions (discrimination, calibration, NRI, DCA) | Partial (in parent) |
| `harness/` | Reproducible evaluation runner with TRIPOD+AI compliance checks | Partial (in parent) |
| `leaderboard/` | Model comparison infrastructure and result formatting | Planned |

## Current State

What exists in the parent codebase:

- Benchmarking engine in `evidenceos-research/evidenceos-bench`
- TRIPOD+AI compliance checklist implementation (Collins et al. 2024)
- Sample size validation using Riley et al. (2019/2020) criteria: EPV >= 10 is necessary but not sufficient; a formal `pmsampsize` calculation is required
- Multiverse analysis infrastructure producing 927 model configurations across 13 configs
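For reference, Riley et al.'s criterion (i) ties the minimum development sample size to the number of candidate parameters, the anticipated Cox-Snell R², and a target shrinkage factor; the R package `pmsampsize` implements the full set of criteria. A simplified sketch of that single criterion (`riley_min_n` is an illustrative name):

```python
import math

def epv(n_events, n_params):
    """Events per candidate parameter: >= 10 is necessary, not sufficient."""
    return n_events / n_params

def riley_min_n(n_params, r2_cs, shrinkage=0.9):
    """Riley et al. (2020) criterion (i): smallest n such that the
    expected uniform shrinkage factor is at least `shrinkage`, given
    `n_params` candidate parameters and anticipated Cox-Snell R^2."""
    return math.ceil(
        n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    )
```

`pmsampsize` additionally bounds the optimism in apparent R² and the precision of the overall risk estimate; this sketch covers the shrinkage criterion only.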

What does not exist yet:

- Standalone synthetic TBI benchmark datasets for public distribution
- Reference implementations of IMPACT Core and CRASH Basic as comparison baselines
- Standardized evaluation harness separated from the research pipeline
- Public leaderboard infrastructure
- NRI calculation utilities as a standalone module
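For illustration, the category-free (continuous) NRI of Pencina et al. compares the direction of risk change between two models, separately in events and non-events. A minimal sketch (the function name is ours):

```python
import numpy as np

def continuous_nri(y, p_old, p_new):
    """Category-free NRI: (event NRI) + (non-event NRI), each in [-1, 1].
    Events should be reclassified upward by a better model; non-events
    downward."""
    y = np.asarray(y, bool)
    up = np.asarray(p_new) > np.asarray(p_old)
    down = np.asarray(p_new) < np.asarray(p_old)
    nri_events = up[y].mean() - down[y].mean()
    nri_nonevents = down[~y].mean() - up[~y].mean()
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```

Categorical NRI instead fixes clinical risk thresholds rather than counting any upward or downward movement; a protocol specification would need to pin those thresholds explicitly.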

## Extraction Plan

  1. Define evaluation protocol specifications covering discrimination (AUROC, C-statistic), calibration (calibration-in-the-large, calibration slope, calibration plots), NRI, and decision curve analysis (DCA)
  2. Generate synthetic TBI benchmark datasets using the existing SyntheticDatasetFactory (no real patient data)
  3. Implement reference baselines for IMPACT Core and CRASH Basic with documented expected performance ranges
  4. Extract and adapt the TRIPOD+AI compliance checker as a standalone validation module
  5. Build evaluation harness that accepts any model's predictions and produces standardized comparison reports
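Step 1 includes decision curve analysis; the core quantity there is net benefit at a risk threshold p_t (Vickers & Elkin 2006), which trades true positives against false positives weighted by the threshold odds. A minimal sketch:

```python
import numpy as np

def net_benefit(y, p, threshold):
    """Decision-curve net benefit at risk threshold pt:
    TP/n - FP/n * pt / (1 - pt)."""
    y = np.asarray(y, bool)
    treat = np.asarray(p) >= threshold   # treat everyone at/above threshold
    n = len(y)
    tp = np.sum(treat & y) / n
    fp = np.sum(treat & ~y) / n
    return tp - fp * threshold / (1 - threshold)
```

A decision curve plots net benefit across a range of thresholds against the "treat all" and "treat none" strategies; a useful model dominates both over clinically relevant thresholds.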

## Ecosystem Context

```mermaid
graph LR
    A[evidenceos-bench<br/>evaluation engine] --> B[TBI-Benchmarks]
    B --> C[Clinical Arena<br/>model leaderboard]
    B --> D[BRIDGE-TBI<br/>validation baseline]
    style B fill:#2A9D8F,stroke:#1E3A8A,color:#fff
```

TBI-Benchmarks is intended to provide the evaluation layer for the Clinical Arena (model leaderboard) and validation infrastructure for BRIDGE-TBI (clinical decision support). Canonical source: evidenceos-research/evidenceos-bench.

## Contributing

This project is in the design phase. The evaluation protocols and benchmark specifications are being defined; no code has been extracted to this repository yet. Contributions to protocol specification and synthetic dataset design are the most immediately useful. See CONTRIBUTING.md.

## License

Apache-2.0 -- see LICENSE for details.
