Evaluation Guide

Overview

SpikeYoloV8-Tracker provides comprehensive evaluation tools to assess model performance on detection and tracking tasks.

Evaluation Script

Main Evaluation Script

Location: scripts/evaluation/targeted_model_evaluation.py

Run evaluation on a trained model:

python scripts/evaluation/targeted_model_evaluation.py \
    --checkpoint_path /path/to/checkpoint.pt

Command-Line Arguments

  • --checkpoint_path: Path to model checkpoint file (required)
  • --device: Device to use for evaluation (default: "cuda")
  • --batch_size: Batch size for evaluation (default: from config)
  • --output_dir: Directory to save evaluation results (optional)
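
For example, a full invocation using the optional arguments might look like this (the paths and values are illustrative):

python scripts/evaluation/targeted_model_evaluation.py \
    --checkpoint_path /path/to/checkpoint.pt \
    --device cuda \
    --batch_size 8 \
    --output_dir /path/to/eval_results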

Evaluation Metrics

Detection Metrics

The evaluation script computes standard object detection metrics:

  • mAP (mean Average Precision): Overall detection accuracy
  • mAP@0.5: mAP at IoU threshold 0.5
  • mAP@0.5:0.95: mAP averaged over IoU thresholds 0.5 to 0.95
  • Per-Class AP: Average precision for each class
  • Precision: Ratio of true positives to all detections
  • Recall: Ratio of true positives to all ground truth objects
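
mAP@0.5:0.95 is simply the mean of the per-threshold AP values. A minimal sketch of the arithmetic, assuming the usual 0.05 step between IoU thresholds and using purely hypothetical AP values for illustration:

# Hypothetical per-threshold AP values, one per IoU threshold 0.50, 0.55, ..., 0.95
ap_per_threshold = {0.50: 0.45, 0.55: 0.43, 0.60: 0.41, 0.65: 0.38, 0.70: 0.35,
                    0.75: 0.31, 0.80: 0.26, 0.85: 0.20, 0.90: 0.12, 0.95: 0.04}

map_50 = ap_per_threshold[0.50]                                       # mAP@0.5
map_50_95 = sum(ap_per_threshold.values()) / len(ap_per_threshold)    # mAP@0.5:0.95
print(f"mAP@0.5: {map_50:.3f}, mAP@0.5:0.95: {map_50_95:.3f}")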

Tracking Metrics

For tracking evaluation:

  • MOTA (Multiple Object Tracking Accuracy): Overall tracking accuracy
  • MOTP (Multiple Object Tracking Precision): Tracking precision
  • ID Switches: Number of identity switches
  • Track Fragmentation: Number of times a ground-truth track is interrupted and later resumed
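
MOTA folds missed detections, false positives, and identity switches into a single score. A minimal sketch of the standard definition (not the project's exact implementation):

def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects, summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

print(mota(false_negatives=120, false_positives=80, id_switches=15, num_gt=1000))  # 0.785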

Evaluation Process

1. Model Loading

The evaluation script loads the trained model from checkpoint:

# Restore the trained weights from the checkpoint dictionary
checkpoint = torch.load(checkpoint_path)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # evaluation mode: disables dropout and batch-norm updates

2. Data Loading

Evaluation uses the same data loading pipeline as training:

  • Loads test/validation sequences
  • Processes events in temporal windows
  • Matches annotations temporally
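
As a rough illustration of the windowing step, the sketch below slices a sorted array of event timestamps into fixed-length temporal windows. The 50 ms window length, function name, and data layout are assumptions for illustration, not the pipeline's actual values:

import numpy as np

def split_into_windows(timestamps, window_us=50_000):
    """Return (t_start, t_end, first_idx, last_idx) for each fixed-length window."""
    edges = np.arange(timestamps[0], timestamps[-1] + window_us, window_us)
    idx = np.searchsorted(timestamps, edges)  # event index range at each window edge
    return [(edges[i], edges[i + 1], idx[i], idx[i + 1]) for i in range(len(edges) - 1)]

ts = np.sort(np.random.randint(0, 200_000, size=1_000))  # fake microsecond timestamps
windows = split_into_windows(ts)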

3. Inference

Model runs in evaluation mode:

  • No gradient computation
  • Batch processing
  • Temporal-aware predictions
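
A minimal sketch of such an evaluation loop is shown below. It assumes `model`, `test_loader`, and `device` exist from the previous steps and that a standard PyTorch DataLoader yields (events, targets) batches; the script's real loop may differ:

import torch

model.eval()
all_predictions = []
with torch.no_grad():                            # no gradient computation
    for events, targets in test_loader:          # batch processing
        predictions = model(events.to(device))   # temporal-aware predictions per window
        all_predictions.append(predictions)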

4. Metric Computation

Metrics are computed by comparing predictions against the ground-truth annotations:

  • Bounding box IoU calculation
  • Class prediction accuracy
  • Tracking association accuracy
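
The IoU calculation is the standard one for axis-aligned boxes. A minimal sketch for a single pair of boxes in (x1, y1, x2, y2) format (the actual evaluator may operate on batched tensors):

def box_iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143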

Logging

Evaluation logs are saved to:

{model.logs_dir}/evaluation_{checkpoint_name}.log

Logs include:

  • Evaluation progress
  • Per-sequence metrics
  • Overall metrics
  • Detailed statistics

Example Evaluation Output

INFO: Loading checkpoint: /path/to/checkpoint.pt
INFO: Evaluating on test set...
INFO: Processing sequence: test_day_0001_td.h5
INFO: Sequence test_day_0001_td.h5:
  mAP@0.5: 0.45
  mAP@0.5:0.95: 0.32
  Precision: 0.67
  Recall: 0.58
INFO: Overall Results:
  mAP@0.5: 0.42
  mAP@0.5:0.95: 0.30
  Precision: 0.65
  Recall: 0.55

Best Practices

1. Use Validation Set During Training

Monitor validation metrics during training to avoid overfitting.

2. Evaluate on Test Set

Evaluate on the test set only after final model selection, so the reported results are not biased by repeated tuning.

3. Per-Class Analysis

Analyze per-class metrics to identify weak classes:

# Check per-class AP
for class_name, ap in per_class_ap.items():
    print(f"{class_name}: {ap:.3f}")

4. Temporal Analysis

Evaluate temporal consistency:

  • Check tracking continuity
  • Analyze ID switches
  • Monitor track fragmentation
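
A rough sketch of how ID switches could be counted per ground-truth object; the assignment format here is hypothetical, not the tracker's actual output:

def count_id_switches(assignments):
    """`assignments` maps gt_object_id -> list of tracker IDs per frame (None = missed)."""
    switches = 0
    for track_ids in assignments.values():
        seen = [t for t in track_ids if t is not None]
        switches += sum(1 for a, b in zip(seen, seen[1:]) if a != b)
    return switches

print(count_id_switches({"car_1": [3, 3, None, 3, 7, 7]}))  # 1 switch (3 -> 7)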

Troubleshooting

Common Issues

  1. Low mAP: Check data quality, verify annotations
  2. High False Positives: Adjust confidence threshold
  3. Tracking Failures: Check tracking features, adjust ByteTracker parameters
  4. Memory Issues: Reduce batch size or sequence length
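
For the false-positive case (item 2 above), raising the confidence threshold is usually the first knob to try. A minimal sketch with a hypothetical detection record format:

def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose score is at or above the threshold."""
    return [d for d in detections if d["score"] >= threshold]

dets = [{"box": (0, 0, 10, 10), "score": 0.9}, {"box": (5, 5, 8, 8), "score": 0.3}]
print(filter_by_confidence(dets, threshold=0.5))  # drops the 0.3-score detection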

Next Steps