Architecture

This document describes the architecture and general flow of the Spectrum Analyzer system.

Repository: GitHub

System Overview

The Spectrum Analyzer uses a modified YOLO (You Only Look Once) architecture based on the EMS-YOLO framework. The system processes spectrogram images to detect and classify various signal types.

General Flow

Input Spectrogram Image (512×512)
    ↓
Preprocessing & Normalization
    ↓
Time Window Processing (5 frames)
    ↓
ResNet Backbone (Feature Extraction)
    ↓
Multi-scale Feature Maps
    ↓
YOLO Detection Head
    ↓
Bounding Box Predictions + Class Scores
    ↓
Non-Maximum Suppression (NMS)
    ↓
Output: Detected Signals with Bounding Boxes

Architecture Components

1. Input Processing

Image Input:

Default resolution: 512×512 pixels
Channels: 3 (RGB spectrogram representation)
Configurable via --imgsz parameter
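The Preprocessing & Normalization step is not spelled out here. The sketch below assumes the common YOLO approach rather than confirming it for this repository: a letterbox resize (scale the spectrogram to fit while keeping its aspect ratio, then pad to the square input size), followed by scaling 8-bit pixel values to [0, 1]. `letterbox_params` and `normalize_pixel` are hypothetical helpers.

```python
def letterbox_params(width: int, height: int, imgsz: int = 512):
    """Return (scale, new_w, new_h, pad_x, pad_y) for fitting an image into imgsz x imgsz.

    Hypothetical helper: the aspect ratio is preserved and the leftover space
    is split equally between both sides (padding values may be fractional).
    """
    scale = min(imgsz / width, imgsz / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (imgsz - new_w) / 2
    pad_y = (imgsz - new_h) / 2
    return scale, new_w, new_h, pad_x, pad_y


def normalize_pixel(value: int) -> float:
    """Map an 8-bit pixel value (0-255) to the range [0.0, 1.0]."""
    return value / 255.0


# A 1024x256 spectrogram fitted into the default 512x512 input.
print(letterbox_params(1024, 256))  # (0.5, 512, 128, 0.0, 192.0)
print(normalize_pixel(255))         # 1.0
```

To map a predicted box back to the original spectrogram, subtract the padding from its coordinates and divide by `scale`.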

Time Window Integration:

Processes 5-frame temporal windows
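The input image is repeated to fill all 5 frames of the window
The frames are identical, so the window gives the network 5 time steps over which to process the same image rather than adding new signal information; the signal's evolution over time is carried by the spectrogram's time axis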

2. Backbone Network

The backbone extracts features from input images. Supported architectures:

ResNet-18 (Default):

BasicBlock layers for feature extraction
Multi-scale feature maps at different resolutions
Configurable via model YAML files

Other Options:

ResNet-10: Lighter model
ResNet-34: Deeper model
Custom architectures via YAML configuration

Model architecture details are available in the training output directory.
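The ResNet names count weighted layers: one stem convolution, two convolutions per BasicBlock, and one final fully connected layer (the FC layer is dropped in a detector, but the name keeps the classification count; shortcut projections are not counted). The per-stage block counts for ResNet-18 and ResNet-34 below are the standard ones; the ResNet-10 layout is a common convention and, like the assumption that this repository's YAML files follow these counts, is not confirmed here.

```python
# BasicBlocks per stage. ResNet-18/34 are standard; the ResNet-10 layout is an assumed convention.
BLOCKS_PER_STAGE = {
    "resnet10": [1, 1, 1, 1],
    "resnet18": [2, 2, 2, 2],
    "resnet34": [3, 4, 6, 3],
}


def resnet_depth(blocks_per_stage: list[int]) -> int:
    """Weighted layers in a BasicBlock ResNet: stem conv + 2 convs per block + final FC."""
    return 1 + 2 * sum(blocks_per_stage) + 1


for name, blocks in BLOCKS_PER_STAGE.items():
    print(name, resnet_depth(blocks))
```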

3. Detection Head

YOLO Detection Layers:

Two detection scales:
- P4/16: Medium-scale detection (stride 16)
- P5/32: Large-scale detection (stride 32)
Each layer outputs bounding boxes, confidence, and class predictions
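For the default 512×512 input, each stride maps to a grid of imgsz / stride cells. The sketch below assumes the usual YOLO layout of 3 anchors per scale (matching the three width,height pairs per scale listed under Anchor Boxes) and 5 + nc values per anchor: x, y, w, h, objectness, then one score per class. `head_layout` is a hypothetical helper for the arithmetic only.

```python
def head_layout(imgsz: int, strides: list[int], anchors_per_scale: int, nc: int):
    """Return (stride, grid_size, output_channels, candidate_boxes) for each detection scale."""
    values_per_anchor = 5 + nc  # x, y, w, h, objectness + one score per class
    layout = []
    for stride in strides:
        grid = imgsz // stride
        channels = anchors_per_scale * values_per_anchor
        candidates = grid * grid * anchors_per_scale
        layout.append((stride, grid, channels, candidates))
    return layout


layout = head_layout(imgsz=512, strides=[16, 32], anchors_per_scale=3, nc=11)
for stride, grid, channels, candidates in layout:
    print(f"stride {stride}: {grid}x{grid} grid, {channels} channels, {candidates} boxes")
print("total candidates before NMS:", sum(c for *_, c in layout))
```

With 11 classes this gives 3072 candidate boxes at stride 16 and 768 at stride 32, i.e. 3840 candidates per image before filtering and NMS.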

Anchor Boxes:

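Predefined anchor boxes, listed as width,height pairs in input-image pixels (three anchors per scale); these values are identical to the YOLOv3-tiny defaults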
P4/16 anchors: [10,14, 23,27, 37,58]
P5/32 anchors: [81,82, 135,169, 344,319]
Configurable in model YAML files
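Each anchor list is a flat sequence of width, height pairs in input-image pixels. YOLOv5-style code divides the anchors by the stride of their scale so they are expressed in grid cells; the sketch below performs that conversion, assuming this repository follows the same convention. `anchors_in_grid_units` is a hypothetical helper.

```python
ANCHORS = {
    16: [10, 14, 23, 27, 37, 58],      # P4/16
    32: [81, 82, 135, 169, 344, 319],  # P5/32
}


def anchors_in_grid_units(flat: list[int], stride: int) -> list[tuple[float, float]]:
    """Group a flat [w0, h0, w1, h1, ...] list into (w, h) pairs and divide by the stride."""
    pairs = zip(flat[0::2], flat[1::2])
    return [(w / stride, h / stride) for w, h in pairs]


for stride, flat in ANCHORS.items():
    print(stride, anchors_in_grid_units(flat, stride))
```

For example, the largest P5/32 anchor (344×319 pixels) spans about 10.75 × 9.97 cells of the 16×16 grid.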

4. Output Processing

Detection Output:

Bounding boxes (x, y, width, height)
Confidence scores (0.0-1.0)
Class predictions (11 classes for RadDet)
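The raw head outputs become boxes and scores through a decoding step. The sketch below shows YOLOv5-style decoding of one prediction: a sigmoid is applied to every value, the center is offset within its grid cell and scaled by the stride, and width and height scale the cell's anchor (in pixels). Whether this repository uses exactly these formulas is an assumption (older YOLOv3-style heads use an exponential for width and height); `decode_box` is a hypothetical helper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, cell_x: int, cell_y: int, stride: int, anchor_wh):
    """Decode raw (tx, ty, tw, th, obj, *class_logits) for one anchor in one grid cell.

    Returns (x_center, y_center, width, height) in input pixels, the objectness
    score, and the per-class scores, using YOLOv5-style formulas (assumed).
    """
    tx, ty, tw, th, tobj, *tcls = raw
    x = (sigmoid(tx) * 2 - 0.5 + cell_x) * stride
    y = (sigmoid(ty) * 2 - 0.5 + cell_y) * stride
    w = (sigmoid(tw) * 2) ** 2 * anchor_wh[0]
    h = (sigmoid(th) * 2) ** 2 * anchor_wh[1]
    return (x, y, w, h), sigmoid(tobj), [sigmoid(c) for c in tcls]


# All-zero logits (5 box/objectness values + 11 class logits): sigmoid(0) = 0.5,
# so the box sits at the cell center and takes the anchor's size (23x27, a P4/16 anchor).
box, obj, cls = decode_box([0.0] * 16, cell_x=3, cell_y=5, stride=16, anchor_wh=(23, 27))
print(box, obj, cls[0])  # (56.0, 88.0, 23.0, 27.0) 0.5 0.5
```

Because every score passes through a sigmoid, confidences fall in the 0.0-1.0 range stated above.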

Post-processing:

Non-Maximum Suppression (NMS) to remove duplicate detections
Confidence threshold filtering
Class-specific filtering (optional)
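A minimal pure-Python sketch of these post-processing steps: drop detections below a confidence threshold (and, optionally, outside a chosen set of classes), then apply greedy NMS, which keeps the highest-scoring box and suppresses any remaining box whose IoU with a kept box exceeds a threshold. Boxes are (x_center, y_center, width, height). The defaults 0.25 and 0.45 are the YOLOv5 `detect.py` defaults and are only assumed for this repository; `postprocess` is a hypothetical helper, and production code would use a vectorized NMS instead.

```python
def iou(a, b):
    """Intersection over union of two (x_center, y_center, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, conf_thres=0.25, iou_thres=0.45, classes=None, class_aware=True):
    """Filter and de-duplicate detections given as (box, confidence, class_id) tuples."""
    kept = [d for d in detections
            if d[1] >= conf_thres and (classes is None or d[2] in classes)]
    kept.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    result = []
    for det in kept:
        box, _, cls = det
        # With class_aware=True, a box is only compared against kept boxes of its own class.
        if all(iou(box, r[0]) <= iou_thres for r in result
               if not class_aware or r[2] == cls):
            result.append(det)
    return result


dets = [
    ((100, 100, 40, 40), 0.90, 0),  # strong detection
    ((104, 100, 40, 40), 0.80, 0),  # near-duplicate of the first, same class
    ((104, 100, 40, 40), 0.70, 3),  # same location, different class
    ((300, 300, 20, 20), 0.10, 1),  # below the confidence threshold
]
for box, conf, cls in postprocess(dets):
    print(box, conf, cls)
```

With per-class NMS the class-3 box survives next to the class-0 box; with `class_aware=False` only the single highest-scoring box remains.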

Model Configuration

Models are configured via YAML files in the models/ directory:

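# Example: models/resnet18.yaml (excerpt)
nc: 11  # number of classes
anchors:
  - [10,14, 23,27, 37,58]      # P4/16
  - [81,82, 135,169, 344,319]  # P5/32
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv_1, [64, 7, 2]],
   [-1, 1, BasicBlock_2, [64, 3, 2]],
   # ... more layers
  ]
head:
  # in a full file, 'from' lists the P4/16 and P5/32 layer indices
  [[-1, 1, Detect, [nc, anchors]],]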
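Each backbone and head row follows the YOLOv5 `[from, number, module, args]` convention: `from` is the input layer index (-1 means the previous layer), `number` is how many times the module is repeated, and `args` are passed to the module. The sketch below walks the two backbone rows of the example as plain Python data and tracks the cumulative stride. Reading `args` as `[out_channels, kernel_size, stride]` for `Conv_1` and `BasicBlock_2` is an assumption based on the values shown; `trace_strides` is a hypothetical helper.

```python
# The two backbone rows from the example YAML, as Python data.
# Assumed args layout for these modules: [out_channels, kernel_size, stride].
BACKBONE = [
    [-1, 1, "Conv_1", [64, 7, 2]],
    [-1, 1, "BasicBlock_2", [64, 3, 2]],
]


def trace_strides(rows):
    """Return (index, module, out_channels, cumulative_stride) for each row."""
    stride = 1
    trace = []
    for i, (_src, _number, module, args) in enumerate(rows):
        out_channels, _kernel, layer_stride = args
        stride *= layer_stride  # each stride-2 layer halves the feature-map resolution
        trace.append((i, module, out_channels, stride))
    return trace


for row in trace_strides(BACKBONE):
    print(row)
```

Continuing the same doubling through later stages is what produces the stride-16 (P4/16) and stride-32 (P5/32) feature maps used by the detection head.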

Time Window Processing

The system processes each input over a 5-frame time window:

The input image is repeated 5 times, one copy per frame
Each frame is processed through the backbone and detection layers
The detection-layer outputs are summed over the time dimension
The summed output is used for the final detections

The time window is intended to help capture:

Signal continuity over time
Temporal patterns in spectrograms
Better detection of time-varying signals
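Repeating an identical frame only changes the result when the network has state that carries over between time steps. EMS-YOLO is a spiking neural network, whose neurons keep a membrane potential across time steps. The sketch below uses a toy leaky integrate-and-fire neuron to show the effect; the decay, threshold, and hard-reset values are arbitrary illustrative assumptions, not this repository's settings, and `lif_spike_train` is a hypothetical helper. The same input is fed at every step and the spikes are summed over the time dimension.

```python
def lif_spike_train(current: float, time_window: int = 5,
                    threshold: float = 1.0, decay: float = 0.5) -> list[int]:
    """Toy leaky integrate-and-fire neuron driven by the same input at every time step.

    The membrane potential decays, accumulates the input, and emits a spike
    (then resets to 0) when it reaches the threshold.
    """
    potential = 0.0
    spikes = []
    for _ in range(time_window):
        potential = potential * decay + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


for current in (0.4, 0.6, 1.2):
    spikes = lif_spike_train(current, time_window=5)
    print(current, spikes, "summed over time:", sum(spikes))
```

With a single time step, inputs of 0.4 and 0.6 both produce no spike; over five steps, 0.6 accumulates enough potential to fire once while 0.4 never does, so the time-summed output becomes a graded response to input strength.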

Feature Extraction Flow

Input Image (1, 3, 512, 512)
    ↓
Time Window Expansion (5, 1, 3, 512, 512) = (time, batch, channels, height, width)
    ↓
ResNet Backbone
    ├─→ Conv + BasicBlock layers
    ├─→ Multi-scale features
    └─→ Feature maps at different resolutions
    ↓
Detection Layers
    ├─→ P4/16 features → Medium detections
    └─→ P5/32 features → Large detections
    ↓
Temporal Aggregation (sum over time dimension)
    ↓
Final Predictions
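The shape bookkeeping in this flow can be checked with plain arithmetic. The sketch assumes a (time, batch, channels, height, width) layout, 3 anchors per scale with 11 classes (3 × (5 + 11) = 48 output channels per scale), and that the head output is summed over the time axis; the exact tensor layout in this repository's code may differ. All three functions are hypothetical helpers.

```python
def expand_time(shape: tuple, time_window: int = 5) -> tuple:
    """Add a leading time axis: (B, C, H, W) -> (T, B, C, H, W)."""
    return (time_window, *shape)


def head_output_shape(shape_tbchw: tuple, stride: int, channels: int) -> tuple:
    """Per-scale head output with the time axis kept: (T, B, channels, H/stride, W/stride)."""
    t, b, _, h, w = shape_tbchw
    return (t, b, channels, h // stride, w // stride)


def sum_over_time(shape_tbchw: tuple) -> tuple:
    """Summing over the time axis removes it: (T, B, ...) -> (B, ...)."""
    return shape_tbchw[1:]


x = expand_time((1, 3, 512, 512))
print("input:", x)
for stride in (16, 32):
    per_step = head_output_shape(x, stride, channels=3 * (5 + 11))
    print(f"stride {stride}:", per_step, "->", sum_over_time(per_step))
```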

Detection Process

Feature Extraction: Backbone extracts multi-scale features
Detection Prediction: YOLO head predicts boxes, confidence, classes
Temporal Aggregation: Detection-layer outputs summed over the time window
NMS: Non-maximum suppression removes duplicates
Filtering: Confidence and class filtering
Output: Final detections with bounding boxes

Performance Characteristics

Inference Speed:

Depends on image size and model architecture
ResNet-18: ~XX ms per image (512×512)
Throughput can be tuned via batch size and device selection
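To measure per-image latency on your own hardware, time repeated calls after a few warm-up runs (on GPUs the first calls also pay for initialization). The sketch times any callable with `time.perf_counter` and reports the median; `run_model` is a hypothetical stand-in for a forward pass. For GPU inference the device must also be synchronized before reading the clock, which this standard-library-only sketch does not do, and when timing a batch, divide by the batch size to get a per-image figure.

```python
import statistics
import time


def benchmark_ms(fn, warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of fn() in milliseconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def run_model():
    """Hypothetical stand-in for one forward pass on a 512x512 image."""
    return sum(i * i for i in range(10_000))


print(f"{benchmark_ms(run_model):.3f} ms per call")
```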

Memory Usage:

Model size: ~XX MB (ResNet-18)
GPU memory: Depends on batch size
Model size can be reduced via quantization
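Weight storage is the parameter count times the bytes per parameter, which is why quantization shrinks the model: FP32 uses 4 bytes per weight, FP16 uses 2, and INT8 uses 1. The parameter count below is a made-up example, not a measurement of this model, and activation memory (which grows with batch size) is not included. `weights_size_mb` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_size_mb(num_params: int, dtype: str = "fp32") -> float:
    """Approximate weight storage in MiB (1 MiB = 2**20 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


example_params = 3 * 2**20  # hypothetical parameter count, for illustration only
for dtype in BYTES_PER_PARAM:
    print(dtype, weights_size_mb(example_params, dtype), "MiB")
```

For this example count the weights take 12.0 MiB in FP32, 6.0 MiB in FP16, and 3.0 MiB in INT8.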

Customization

The architecture can be customized:

Model Backbone: Change in model YAML file
Image Size: Configure via --imgsz parameter
Number of Classes: Set in dataset YAML file
Anchors: Modify in model YAML file
Time Window: Adjust in models/yolo.py (time_window variable)
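Because the coarsest detection scale has stride 32, the `--imgsz` value should be a multiple of 32 so that every feature map divides evenly. YOLOv5-derived code rounds a requested size up to the nearest multiple of the largest stride (and prints a warning); the sketch below shows that rule, assuming this repository keeps the same behavior. `round_imgsz` is a hypothetical helper.

```python
import math


def round_imgsz(imgsz: int, max_stride: int = 32) -> int:
    """Round imgsz up to the nearest multiple of the largest detection stride."""
    return math.ceil(imgsz / max_stride) * max_stride


for requested in (512, 500, 640, 650):
    print(requested, "->", round_imgsz(requested))
```

For example, a requested size of 650 becomes 672, giving 42×42 and 21×21 grids at strides 16 and 32.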

For detailed configuration options, see Configuration Guide.

Related Documentation

Quick Start: Get started with the system
Configuration: All configuration options
Training: How to train models
Detection: Running inference
Model Conversion: Converting models for deployment
RadDet Use Case: Example use case demonstration
