Training for hit cluster classification FlagBkgHits by NamithaChitrazee · Pull Request #4 · Mu2e/MLTrain

NamithaChitrazee · 2026-03-13T22:13:19Z

MLP and BDT based training for hit cluster classification.
Input dataset for the training can be created by running ArtAnalysis/fcl/BkgDiag.fcl on a digitized data sample.
CreateInference.C converts the .h5 output of the training to a .hxx file that can be used in Mu2e Offline.

oksuzian · 2026-04-21T13:27:09Z

Took a look through this PR. A few things worth addressing before merging:

Bugs

Input file list doesn't match what was actually trained on. KKTrainBkgFiles.txt contains /TrainBkg/RootFiles/nts.owner.BD.version.sequence.root, but the committed notebook outputs show the real paths are under /Users/namithachithirasree/Documents/LBL_projects/KKTrain/RootFiles/.... Anyone re-running this will fail immediately.
Pre-selection stats only reflect the last file processed. The cell that computes kQpresel = kQ[kQ==-1] / kQpass = kQ[kQ>-1] uses the loop-local kQ (the last file, pbar). It should use the concatenated kQ_all.
Hardcoded absolute user paths leaked via notebook outputs (/Users/namithachithirasree/...). Worth stripping outputs before commit.

Redundancy

output_model.save("model/TrainBkgDiag.h5", ...) and model_ce.save("TrainBkgDiag.h5") save two copies; the cwd one looks unused.
fpr_ce, tpr_ce, th_ce = roc_curve(...) and auc_ce = roc_auc_score(...) are computed twice in back-to-back cells.

Code quality

Eight for i in range(arr.shape[0]): list.append(arr[i][k]) blocks could each be arr[:, k].
Train/test/valid ends up as 50/25/25 (two successive test_size=0.5 splits). Was that intentional? 70/15/15 is more typical.
Several unused imports: os, datetime, Sequential, Dropout, Activation, ReLU, SGD, duplicate Adam.
output_model (batch=1) duplicates model_ce (batch=bsize) — any architecture change has to be made in both places to keep SOFIE export working.

Missing

No TrainBkg/README.md. Both TrkQual/ and the in-flight TrkPID/ PR have one — would be good to document datasets used, the feature list, SOFIE/ROOT version, and rerun instructions.
CreateInference.C has an inline "In 6.30…" note but there's no README pointer to required ROOT version.

NamithaChitrazee added 4 commits March 13, 2026 16:56

Create KKTrainBkgFiles.txt

ae09b47

Add files via upload

99349b7

Create TrainBkgDiag.hxx

fb686ab

Add files via upload

19a91cb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Training for hit cluster classification FlagBkgHits#4

Training for hit cluster classification FlagBkgHits#4
NamithaChitrazee wants to merge 4 commits intoMu2e:mainfrom
NamithaChitrazee:main

NamithaChitrazee commented Mar 13, 2026

Uh oh!

oksuzian commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

NamithaChitrazee commented Mar 13, 2026

Uh oh!

oksuzian commented Apr 21, 2026

Bugs

Redundancy

Code quality

Missing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants