Architecture
This document describes the architecture and general flow of the Spectrum Analyzer system.
Repository: GitHub
System Overview
The Spectrum Analyzer uses a modified YOLO (You Only Look Once) architecture based on the EMS-YOLO framework. The system processes spectrogram images to detect and classify various signal types.
General Flow
Input Spectrogram Image (512×512)
↓
Preprocessing & Normalization
↓
Time Window Processing (5 frames)
↓
ResNet Backbone (Feature Extraction)
↓
Multi-scale Feature Maps
↓
YOLO Detection Head
↓
Bounding Box Predictions + Class Scores
↓
Non-Maximum Suppression (NMS)
↓
Output: Detected Signals with Bounding Boxes
Architecture Components
1. Input Processing
Image Input:
- Default resolution: 512×512 pixels
- Channels: 3 (RGB spectrogram representation)
- Configurable via the --imgsz parameter
Time Window Integration:
- Processes 5-frame temporal windows
- The input image is duplicated across the window to create temporal context
- Helps capture temporal signal characteristics
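The window expansion above can be sketched as follows. This is an illustration in NumPy (the actual implementation operates on tensors inside the model code, and the function name expand_time_window is hypothetical):

```python
import numpy as np

def expand_time_window(frame: np.ndarray, time_window: int = 5) -> np.ndarray:
    """Duplicate a single (C, H, W) frame into a (T, C, H, W) temporal window."""
    # np.repeat along a new leading time axis copies the frame T times
    return np.repeat(frame[np.newaxis, ...], time_window, axis=0)

frame = np.zeros((3, 512, 512), dtype=np.float32)  # one RGB spectrogram frame
window = expand_time_window(frame)
print(window.shape)  # (5, 3, 512, 512)
```

Because all five frames are copies of one image, the temporal context comes from how the backbone and aggregation treat the repeated frames, not from new data.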
2. Backbone Network
The backbone extracts features from input images. Supported architectures:
ResNet-18 (Default):
- BasicBlock layers for feature extraction
- Multi-scale feature maps at different resolutions
- Configurable via model YAML files
Other Options:
- ResNet-10: Lighter model
- ResNet-34: Deeper model
- Custom architectures via YAML configuration
Model architecture details are available in the training output directory.
3. Detection Head
YOLO Detection Layers:
- Two detection scales:
- P4/16: Medium-scale detection (stride 16)
- P5/32: Large-scale detection (stride 32)
- Each layer outputs bounding boxes, confidence, and class predictions
Anchor Boxes:
- Predefined anchor boxes optimized for signal detection
- P4/16 anchors: [10,14, 23,27, 37,58]
- P5/32 anchors: [81,82, 135,169, 344,319]
- Configurable in model YAML files
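To make the anchor values concrete, the sketch below decodes one raw grid-cell prediction against an anchor the way standard YOLO heads do: sigmoid offsets for the box center, exponential scaling of the anchor for width and height. This is the generic YOLO formulation, not code from this repository, and all names are illustrative:

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """Decode raw YOLO outputs (tx, ty, tw, th) at grid cell (cx, cy)
    into an absolute (x, y, w, h) box in pixels."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    x = (cx + sigmoid(tx)) * stride   # box center in pixels
    y = (cy + sigmoid(ty)) * stride
    w = anchor_w * math.exp(tw)       # anchor scaled by the prediction
    h = anchor_h * math.exp(th)
    return x, y, w, h

# A zero prediction at cell (10, 10) on the P4/16 scale with anchor (23, 27)
# lands at the cell center with exactly the anchor's size:
print(decode_box(0, 0, 0, 0, 10, 10, 23, 27, 16))  # (168.0, 168.0, 23.0, 27.0)
```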
4. Output Processing
Detection Output:
- Bounding boxes (x, y, width, height)
- Confidence scores (0.0-1.0)
- Class predictions (11 classes for RadDet)
Post-processing:
- Non-Maximum Suppression (NMS) to remove duplicate detections
- Confidence threshold filtering
- Class-specific filtering (optional)
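The post-processing steps above can be sketched as a minimal greedy NMS with confidence filtering, using NumPy for illustration. The thresholds 0.25 and 0.45 are placeholder defaults, not the project's configured values:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes; boxes are (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Greedy NMS: drop low-confidence boxes, then suppress overlaps."""
    mask = scores >= conf_thresh              # confidence threshold filtering
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(-scores)               # highest confidence first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]  # drop duplicates
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = nms(boxes, scores)
print(len(kept_boxes))  # 2: the two overlapping boxes collapse to one
```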
Model Configuration
Models are configured via YAML files in the models/ directory:
```yaml
# Example: models/resnet18.yaml
nc: 11  # number of classes
backbone:
  [[-1, 1, Conv_1, [64, 7, 2]],
   [-1, 1, BasicBlock_2, [64, 3, 2]],
   # ... more layers
  ]
head:
  [[-1, 1, Detect, [nc, anchors]],]
```
Time Window Processing
The system uses a 5-frame time window for temporal context:
- Input image is duplicated 5 times
- Each frame is processed through the backbone
- Features are aggregated across time dimension
- Final detection uses temporal information
This helps capture:
- Signal continuity over time
- Temporal patterns in spectrograms
- Better detection of time-varying signals
Feature Extraction Flow
Input Image (1, 3, 512, 512)
↓
Time Window Expansion (5, 3, 512, 512)
↓
ResNet Backbone
├─→ Conv + BasicBlock layers
├─→ Multi-scale features
└─→ Feature maps at different resolutions
↓
Detection Layers
├─→ P4/16 features → Medium detections
└─→ P5/32 features → Large detections
↓
Temporal Aggregation (sum over time dimension)
↓
Final Predictions
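The temporal aggregation step in the flow above can be illustrated as a sum over the time dimension. The tensor shapes are schematic (4 placeholder predictions, each with 4 box coordinates, 1 confidence, and 11 class scores); the real aggregation lives in the model code:

```python
import numpy as np

# Hypothetical per-frame head outputs: (T, num_preds, 5 + num_classes)
T, num_preds, num_outputs = 5, 4, 16
per_frame = np.ones((T, num_preds, num_outputs))

# Collapse the time dimension; with five identical frames the
# summed outputs are simply the single-frame outputs scaled by T.
aggregated = per_frame.sum(axis=0)
print(aggregated.shape)  # (4, 16)
```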
Detection Process
1. Feature Extraction: Backbone extracts multi-scale features
2. Detection Prediction: YOLO head predicts boxes, confidence, classes
3. Temporal Aggregation: Features aggregated across the time window
4. NMS: Non-maximum suppression removes duplicates
5. Filtering: Confidence and class filtering
6. Output: Final detections with bounding boxes
Performance Characteristics
Inference Speed:
- Depends on image size and model architecture
- ResNet-18: ~XX ms per image (512×512)
- Configurable via batch size and device selection
Memory Usage:
- Model size: ~XX MB (ResNet-18)
- GPU memory: Depends on batch size
- Can be reduced via quantization
Customization
The architecture can be customized:
- Model Backbone: Change in model YAML file
- Image Size: Configure via the --imgsz parameter
- Number of Classes: Set in the dataset YAML file
- Anchors: Modify in model YAML file
- Time Window: Adjust the time_window variable in models/yolo.py
For detailed configuration options, see Configuration Guide.
Related Documentation
- Quick Start: Get started with the system
- Configuration: All configuration options
- Training: How to train models
- Detection: Running inference
- Model Conversion: Converting models for deployment
- RadDet Use Case: Example use case demonstration