Architecture
Overview
SpikeYoloV8-Tracker is built on the BICLab SpikeYOLO (ECCV 2024) architecture, which uses spiking neural networks (SNNs) for energy-efficient object detection on event camera data.
BICLab SpikeYOLO (ECCV 2024)
Neuron Type
- I-LIF (Integer-valued LIF): Spiking neurons that emit integer-valued activations during training
- Training: Integer-valued training with spike-driven inference, where the integer activations are expanded into binary spikes
- Architecture: Simplified YOLOv8 with meta SNN blocks
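A minimal sketch of an integer-valued LIF update may help make the neuron type concrete. It assumes a clip-and-round firing rule with a soft reset; the names `ilif_step`, `to_binary_substeps`, `d_max`, and `decay`, and the default values, are hypothetical and are not taken from this repository.

```python
def ilif_step(potential, inp, d_max=4, decay=0.25):
    """One integer-LIF update (illustrative assumption, not the repo's code).

    Returns (integer_spike, new_potential).
    """
    u = potential + inp                  # integrate the incoming current
    s = min(max(round(u), 0), d_max)     # integer activation clipped to [0, D]
    return s, decay * (u - s)            # subtract what was emitted, then leak


def to_binary_substeps(s, d_max=4):
    """Expand an integer activation into D binary spikes that sum to s."""
    return [1 if i < s else 0 for i in range(d_max)]


spike, potential = ilif_step(0.0, 2.7)
binary = to_binary_substeps(spike)
```

The expansion keeps the total spike count equal to the integer activation, which is the property that lets integer-valued training and binary, spike-driven inference produce the same result in this simplified setting.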
Key Components
1. MS_DownSampling
Spiking downsampling layers that reduce spatial dimensions while preserving temporal information.
2. MS_ConvBlock
Spiking convolution blocks that process event-based features through time.
3. SpikeSPPF
Spiking spatial pyramid pooling for multi-scale feature extraction.
4. SpikeDetect
Spiking detection head that outputs bounding boxes, class predictions, and tracking features.
Project Structure
Object_Detection&Tracking/
├── ultralytics/                              # Modified BICLab SpikeYOLO implementation
│   └── nn/
│       └── modules/
│           ├── yolo_spikformer.py            # Training layers (with tracking; uses multispike)
│           └── yolo_spikformer_bin.py        # Inference layers (with tracking; uses D-substep binary spikes)
├── config/                                   # Configuration files
│   └── config.yaml                           # Main configuration file
├── src/                                      # Core source code
│   ├── __init__.py
│   ├── config_loader.py                      # Configuration management
│   ├── data_loader.py                        # Data loading and preprocessing
│   ├── logging_utils.py                      # Unified logging setup
│   └── etram_spikeyolo_tracking.py           # High-level model architecture
├── scripts/                                  # Executable scripts
│   ├── training/                             # Training scripts
│   │   ├── __init__.py
│   │   ├── comprehensive_training.py         # Main training script
│   │   └── hyperparameter_search.py          # Hyperparameter search
│   ├── evaluation/                           # Evaluation scripts
│   │   ├── __init__.py
│   │   └── targeted_model_evaluation.py      # Targeted model evaluation
│   └── utils/                                # Utility scripts
│       ├── __init__.py
│       └── calculate_class_weights.py        # Class weight calculation
├── HDF5/                                     # Event data files
├── class annotations/                        # Training annotations for classes
├── yolo_loss.py                              # Loss functions used for training
└── requirements.txt                          # Dependencies
Data Flow
Event Processing Pipeline
- Event Loading: Read events from HDF5 files
- Temporal Windowing: Convert continuous events to discrete time windows
- Spike Encoding: Convert events to spike trains
- SNN Processing: Process spikes through spiking neural network layers
- Detection: Generate bounding boxes and class predictions
- Tracking: Associate detections across time using ByteTracker
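The temporal windowing and spike encoding steps above can be sketched as a simple binning routine. This is an illustration under stated assumptions: events are taken to be `(t, x, y, p)` tuples with polarity `p` in {0, 1}, and each window becomes a two-channel count frame. The function name `events_to_frames` and the tuple layout are hypothetical; the actual HDF5 schema and the loader in `src/data_loader.py` are not described here.

```python
def events_to_frames(events, t_start, t_end, num_steps, height, width):
    """Bin (t, x, y, p) events into num_steps frames of shape [2, H, W].

    Channel 0 counts negative-polarity events, channel 1 positive ones.
    """
    frames = [
        [[[0] * width for _ in range(height)] for _ in range(2)]
        for _ in range(num_steps)
    ]
    span = (t_end - t_start) / num_steps      # duration of one time window
    for t, x, y, p in events:
        if not (t_start <= t < t_end):
            continue                          # drop events outside the window range
        step = min(int((t - t_start) / span), num_steps - 1)
        frames[step][1 if p > 0 else 0][y][x] += 1
    return frames


# Four events over a 10-unit interval, split into T=2 windows on a 2x2 sensor.
events = [(0, 0, 0, 1), (5, 1, 0, 0), (9, 1, 1, 1), (10, 0, 0, 1)]
frames = events_to_frames(events, 0, 10, num_steps=2, height=2, width=2)
```

Stacking the resulting frames along a leading axis gives a `[T, C, H, W]` sequence per sample, which matches the input layout used by the network once a batch dimension is added.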
Temporal Processing
The architecture preserves temporal information throughout the network:
- Input: [T, B, C, H, W] - Temporal sequence of spike frames
- Processing: Each temporal step is processed independently
- Output: [T, B, H*W, features] - Temporal predictions preserved
- Loss: Computed separately for each temporal step
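As a rough sketch of the per-step loss, the snippet below applies a loss function to each time step along the leading T axis and then averages the results. Averaging is an assumption, since the text only states that the loss is computed per step; `temporal_loss` and `l1` are hypothetical stand-ins and are unrelated to the functions in `yolo_loss.py`.

```python
def l1(pred, target):
    """Mean absolute error for one time step (placeholder loss)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def temporal_loss(predictions, targets, step_loss):
    """Evaluate step_loss at every time step, then average over T.

    Returns (mean_loss, per_step_losses).
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must share the same T")
    per_step = [step_loss(p, t) for p, t in zip(predictions, targets)]
    return sum(per_step) / len(per_step), per_step


# T=2 time steps with two predicted values each.
mean_loss, per_step = temporal_loss(
    [[1.0, 2.0], [3.0, 5.0]],
    [[1.0, 1.0], [3.0, 3.0]],
    l1,
)
```

Keeping the per-step values alongside the mean makes it easy to see which time steps contribute most to the total.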
Tracking Integration
The model outputs dual predictions:
- Detection Features: Bounding boxes, classes, confidence scores
- Tracking Features: Feature embeddings for object association
ByteTracker uses these features to:
- Associate detections across frames
- Handle occlusions and re-identifications
- Maintain consistent track IDs
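To illustrate how feature embeddings can support association, here is a simplified greedy matcher based on cosine similarity. It is not the ByteTracker algorithm: the published ByteTrack method matches primarily on box overlap and detection confidence, and how the embeddings are combined with it in this project is not described here. The names `cosine`, `associate`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks, detections, threshold=0.5):
    """Greedily match detections to tracks by embedding similarity.

    tracks: {track_id: embedding}; detections: list of embeddings.
    Returns ({detection_index: track_id}, [unmatched detection indices]).
    """
    pairs = sorted(
        ((cosine(emb, det), tid, di)
         for tid, emb in tracks.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for sim, tid, di in pairs:
        if sim < threshold:
            break                      # remaining pairs are even less similar
        if tid in used_tracks or di in matches:
            continue                   # each track and detection is used once
        matches[di] = tid
        used_tracks.add(tid)
    unmatched = [i for i in range(len(detections)) if i not in matches]
    return matches, unmatched


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches, unmatched = associate(tracks, detections)
```

Unmatched detections would typically start new tracks, while tracks left without a match can be kept alive for a few frames to bridge short occlusions.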
Memory Efficiency
The architecture is designed for low memory usage:
- Streaming Processing: Events loaded on-demand
- Dynamic Batching: Adapts to available memory
- Integer Operations: Reduced precision for efficiency
- Temporal-Aware Processing: No unnecessary temporal aggregation
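Streaming processing can be pictured as a generator that groups items lazily, so only one batch is held in memory at a time. The sketch below is generic and assumes nothing about the real loader; `stream_batches` is a hypothetical helper, and a dynamic-batching variant would adjust `batch_size` between batches based on available memory.

```python
def stream_batches(source, batch_size):
    """Yield lists of up to batch_size items from any iterable, lazily."""
    batch = []
    for item in source:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []                 # start a fresh list for the next batch
    if batch:
        yield batch                    # emit the final, possibly shorter batch


# The source is consumed on demand; nothing is loaded ahead of time.
batches = list(stream_batches(iter(range(5)), 2))
```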
Research Reference
BICLab SpikeYOLO (ECCV 2024)
- Paper: "Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection"
- Authors: Xinhao Luo, Man Yao, Yuhong Chou, Bo Xu, Guoqi Li
- Institution: BICLab, Institute of Automation, Chinese Academy of Sciences