Cross-Validation

SpikeSEG includes a k-fold cross-validation script for robust performance estimation.

Quick Start

```bash
python scripts/train_cv.py \
  --config configs/config.yaml \
  --data-root /path/to/EBSSA \
  --n-folds 10 \
  --output-dir runs/cv
```

How It Works

  1. Hold out 10% of recordings as a test set (never touched during training).
  2. Split the remaining 90% into k folds.
  3. For each fold: train on k − 1 folds, validate on the held-out fold.
  4. After all folds: evaluate each fold's model on the test set.
  5. Report mean ± std for validation and test metrics.
```mermaid
flowchart LR
    DATA[All Recordings] --> SPLIT
    SPLIT -->|10%| TEST[Test Set]
    SPLIT -->|90%| FOLDS[k Folds]
    FOLDS --> F1[Fold 1: Train k-1, Val 1]
    FOLDS --> F2[Fold 2: Train k-1, Val 1]
    FOLDS --> FK[Fold k: Train k-1, Val 1]
    F1 --> EVAL[Evaluate on Test]
    F2 --> EVAL
    FK --> EVAL
    EVAL --> REPORT["Mean +/- Std Informedness"]
```
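The split procedure above can be sketched as follows. This is a minimal illustration, not SpikeSEG's actual implementation: the `make_folds` helper and its signature are hypothetical, chosen to mirror the documented defaults (`n_folds=10`, `test_ratio=0.1`, `seed=42`).

```python
import random

def make_folds(recordings, n_folds=10, test_ratio=0.1, seed=42):
    """Split recordings into a held-out test set and k train/val folds."""
    rng = random.Random(seed)
    recs = list(recordings)
    rng.shuffle(recs)

    n_test = max(1, int(len(recs) * test_ratio))
    test_set = recs[:n_test]   # never touched during training
    pool = recs[n_test:]       # remaining recordings, split into k folds

    # Round-robin assignment keeps fold sizes balanced.
    folds = [pool[i::n_folds] for i in range(n_folds)]
    splits = []
    for i in range(n_folds):
        val = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        splits.append({"train": train, "val": val})
    return test_set, splits
```

Splitting at the recording level (rather than the frame or event level) avoids leakage between correlated samples from the same recording.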

CLI Arguments

| Flag | Default | Description |
|------|---------|-------------|
| `--config, -c` | `configs/config.yaml` | Config file |
| `--data-root, -d` | from config | EBSSA data root |
| `--n-folds, -k` | `10` | Number of folds |
| `--test-ratio` | `0.1` | Fraction held out for test |
| `--seed` | `42` | Random seed for splits |
| `--output-dir, -o` | `runs/cv` | Output directory |
| `--threshold` | `0.05` | Inference threshold |
| `--eval-only` | `false` | Only evaluate existing folds |
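An `argparse` parser matching the flags above might look like this. This is a sketch under the assumption that `train_cv.py` uses standard argparse; the actual parser may differ.

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(
        description="k-fold cross-validation for SpikeSEG")
    p.add_argument("--config", "-c", default="configs/config.yaml",
                   help="Config file")
    p.add_argument("--data-root", "-d", default=None,
                   help="EBSSA data root (defaults to the config value)")
    p.add_argument("--n-folds", "-k", type=int, default=10,
                   help="Number of folds")
    p.add_argument("--test-ratio", type=float, default=0.1,
                   help="Fraction held out for test")
    p.add_argument("--seed", type=int, default=42,
                   help="Random seed for splits")
    p.add_argument("--output-dir", "-o", default="runs/cv",
                   help="Output directory")
    p.add_argument("--threshold", type=float, default=0.05,
                   help="Inference threshold")
    p.add_argument("--eval-only", action="store_true",
                   help="Only evaluate existing folds")
    return p
```

`--eval-only` is modeled as a store-true flag, which matches its `false` default in the table.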

Outputs

| Path | Content |
|------|---------|
| `fold_info.json` | Fold configuration and recording assignments |
| `test_recordings.txt` | Test set recording paths |
| `fold_{i:02d}/` | Per-fold training outputs (checkpoints, logs) |
| `cv_results.json` | Summary: val/test informedness mean, std, min, max |
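The summary statistics in `cv_results.json` can be reproduced from per-fold scores like this. A minimal sketch: the `summarize` helper and the example scores are illustrative, and the exact JSON schema written by the script is an assumption.

```python
import json
import statistics

def summarize(per_fold_scores):
    """Aggregate per-fold informedness into mean/std/min/max,
    the statistics reported in cv_results.json."""
    return {
        "mean": statistics.mean(per_fold_scores),
        "std": statistics.stdev(per_fold_scores),
        "min": min(per_fold_scores),
        "max": max(per_fold_scores),
    }

# Hypothetical per-fold test informedness values:
scores = [0.81, 0.78, 0.84, 0.80, 0.79]
print(json.dumps({"test_informedness": summarize(scores)}, indent=2))
```

`statistics.stdev` is the sample standard deviation (n − 1 denominator), the usual choice when the folds are treated as samples of model performance.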