Skip to content

Training for hit cluster classification FlagBkgHits#4

Open
NamithaChitrazee wants to merge 4 commits intoMu2e:mainfrom
NamithaChitrazee:main
Open

Training for hit cluster classification FlagBkgHits#4
NamithaChitrazee wants to merge 4 commits intoMu2e:mainfrom
NamithaChitrazee:main

Conversation

@NamithaChitrazee
Copy link
Copy Markdown

MLP and BDT based training for hit cluster classification.
Input dataset for the training can be created by running ArtAnalysis/fcl/BkgDiag.fcl on a digitized data sample.
CreateInference.C converts the .h5 output of the training to a .hxx file that can be used in Mu2e Offline.

@oksuzian
Copy link
Copy Markdown

Took a look through this PR. A few things worth addressing before merging:

Bugs

  • Input file list doesn't match what was actually trained on. KKTrainBkgFiles.txt contains /TrainBkg/RootFiles/nts.owner.BD.version.sequence.root, but the committed notebook outputs show the real paths are under /Users/namithachithirasree/Documents/LBL_projects/KKTrain/RootFiles/.... Anyone re-running this will fail immediately.
  • Pre-selection stats only reflect the last file processed. The cell that computes kQpresel = kQ[kQ==-1] / kQpass = kQ[kQ>-1] uses the loop-local kQ (the last file, pbar). It should use the concatenated kQ_all.
  • Hardcoded absolute user paths leaked via notebook outputs (/Users/namithachithirasree/...). Worth stripping outputs before commit.

Redundancy

  • output_model.save("model/TrainBkgDiag.h5", ...) and model_ce.save("TrainBkgDiag.h5") save two copies; the cwd one looks unused.
  • fpr_ce, tpr_ce, th_ce = roc_curve(...) and auc_ce = roc_auc_score(...) are computed twice in back-to-back cells.

Code quality

  • Eight for i in range(arr.shape[0]): list.append(arr[i][k]) blocks could each be arr[:, k].
  • Train/test/valid ends up as 50/25/25 (two successive test_size=0.5 splits). Was that intentional? 70/15/15 is more typical.
  • Several unused imports: os, datetime, Sequential, Dropout, Activation, ReLU, SGD, duplicate Adam.
  • output_model (batch=1) duplicates model_ce (batch=bsize) — any architecture change has to be made in both places to keep SOFIE export working.

Missing

  • No TrainBkg/README.md. Both TrkQual/ and the in-flight TrkPID/ PR have one — would be good to document datasets used, the feature list, SOFIE/ROOT version, and rerun instructions.
  • CreateInference.C has an inline "In 6.30…" note but there's no README pointer to required ROOT version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants