Dataset Format

eTraM Dataset Structure

The project is designed to work with the eTraM (Event-based Traffic Monitoring) dataset format.

eTraM/
├── HDF5/
│   ├── train_h5_6/
│   │   ├── train_night_0040_td.h5      # Event data
│   │   ├── train_night_0040_bbox.npy   # Grouped annotations (3 classes)
│   │   └── ...
│   ├── val_h5_1/
│   └── test_h5_1/
└── class annotations/
    ├── eight_class_annotations_train/
    │   ├── train_night_0040_bbox.npy   # Fine-grained annotations (8 classes)
    │   └── ...
    ├── eight_class_annotations_val/
    └── eight_class_annotations_test/
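
The layout above implies that one sequence name resolves to three files: the event file, the grouped annotations beside it, and the fine-grained annotations under "class annotations". A minimal sketch of that lookup follows; the helper name sequence_paths and its parameters are hypothetical and not part of the project.

```python
from pathlib import Path


def sequence_paths(root, split, split_dir, name):
    """Return the event file and both annotation files for one sequence.

    root      -- dataset root (the eTraM/ directory)
    split     -- "train", "val", or "test"
    split_dir -- folder under HDF5/, e.g. "train_h5_6"
    name      -- sequence name, e.g. "train_night_0040"
    """
    root = Path(root)
    h5_dir = root / "HDF5" / split_dir
    # The directory name really contains a space in the tree shown above.
    fine_dir = root / "class annotations" / f"eight_class_annotations_{split}"
    return {
        "events": h5_dir / f"{name}_td.h5",
        "bbox_3class": h5_dir / f"{name}_bbox.npy",
        "bbox_8class": fine_dir / f"{name}_bbox.npy",
    }


paths = sequence_paths("eTraM", "train", "train_h5_6", "train_night_0040")
for key, path in paths.items():
    print(key, path.as_posix())
```

The split folder is passed explicitly because its numeric suffix (train_h5_6, val_h5_1) differs between splits.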

Data Splits

Train: ~112 sequences (mix of day/night)
Val: ~23 sequences (mix of day/night)
Test: ~30 sequences (mix of day/night)

Event Data Format (.h5 files)

HDF5 Structure

Event data is stored in HDF5 format with the following structure:

events/
├── x: uint16 array        # X coordinates (0-1279)
├── y: uint16 array        # Y coordinates (0-719)
├── p: int16 array         # Polarity (0=negative, 1=positive)
├── t: int64 array         # Timestamps in microseconds
├── width: int64 scalar    # Image width (1280)
└── height: int64 scalar   # Image height (720)

Event Data Characteristics

Resolution: 1280×720 pixels
Event Count: ~17M events per sequence (e.g., 17,428,542 in train_night_0040)
Temporal Resolution: Microsecond precision
Polarity Distribution: ~50/50 positive/negative events
Duration: 5-6 seconds per sequence

Example: Loading Event Data

import h5py
import numpy as np

with h5py.File('train_night_0040_td.h5', 'r') as f:
    events = {
        'x': f['events/x'][:],
        'y': f['events/y'][:],
        'p': f['events/p'][:],
        't': f['events/t'][:],
        'width': f['events/width'][()],
        'height': f['events/height'][()]
    }
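
Once the arrays are in memory, a quick sanity check can confirm the characteristics listed earlier (duration, event rate, polarity balance, coordinate bounds). The sketch below runs on synthetic data so it works without a dataset file; summarize_events is a hypothetical helper, and it assumes timestamps are in microseconds as documented.

```python
import numpy as np


def summarize_events(events):
    """Compute basic statistics from an events dict of NumPy arrays."""
    t = events["t"]
    # Timestamps are in microseconds; min/max avoids assuming sorted input.
    duration_s = (int(t.max()) - int(t.min())) / 1e6
    n = len(t)
    return {
        "num_events": n,
        "duration_s": duration_s,
        "events_per_s": n / duration_s if duration_s > 0 else float("nan"),
        "positive_fraction": float(np.mean(events["p"] == 1)),
        "in_bounds": bool(
            (events["x"] < events["width"]).all()
            and (events["y"] < events["height"]).all()
        ),
    }


# Synthetic stand-in for the dict built by the loading example.
rng = np.random.default_rng(0)
n = 1000
events = {
    "x": rng.integers(0, 1280, n).astype(np.uint16),
    "y": rng.integers(0, 720, n).astype(np.uint16),
    "p": rng.integers(0, 2, n).astype(np.int16),
    "t": np.linspace(0, 2_000_000, n).astype(np.int64),
    "width": 1280,
    "height": 720,
}
stats = summarize_events(events)
print(stats)
```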

Annotation Format (.npy files)

Structured Array Fields

Annotations are stored as NumPy structured arrays with the following dtype:

dtype = [
    ('t', '<i8'),           # Timestamp, same time base as the events (int64)
    ('x', '<f4'),           # Top-left X coordinate (float32)
    ('y', '<f4'),           # Top-left Y coordinate (float32)
    ('w', '<f4'),           # Bounding box width (float32)
    ('h', '<f4'),           # Bounding box height (float32)
    ('class_id', '<u4'),    # Class identifier (uint32)
    ('track_id', '<u4'),    # Object tracking ID (uint32)
    ('class_confidence', '<f4')  # Detection confidence (float32)
]

Bounding Box Format

Format: (x, y, w, h) - top-left corner + width/height
Coordinates: Pixel coordinates in the image space
Temporal Alignment: Annotations synchronized with event timestamps
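
Many detection pipelines expect corner coordinates rather than top-left plus size. The following sketch converts (x, y, w, h) boxes to (x1, y1, x2, y2) and clips them to the 1280×720 frame. The function name xywh_to_xyxy is hypothetical, and clipping to the frame is an assumption; the source does not say whether stored boxes may extend past the image border.

```python
import numpy as np


def xywh_to_xyxy(boxes, width=1280, height=720):
    """Convert (x, y, w, h) boxes to (x1, y1, x2, y2), clipped to the image."""
    boxes = np.asarray(boxes, dtype=np.float32)
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = x1 + boxes[:, 2]
    y2 = y1 + boxes[:, 3]
    out = np.stack([x1, y1, x2, y2], axis=1)
    # Clip horizontal coordinates to [0, width] and vertical ones to [0, height].
    out[:, [0, 2]] = np.clip(out[:, [0, 2]], 0, width)
    out[:, [1, 3]] = np.clip(out[:, [1, 3]], 0, height)
    return out


boxes = np.array([[100, 50, 200, 100], [1200, 700, 150, 60]], dtype=np.float32)
corners = xywh_to_xyxy(boxes)
print(corners)
```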

Example: Loading Annotations

import numpy as np

annotations = np.load('train_night_0040_bbox.npy')

# Access fields
timestamps = annotations['t']
bboxes = np.column_stack([
    annotations['x'],
    annotations['y'],
    annotations['w'],
    annotations['h']
])
class_ids = annotations['class_id']
track_ids = annotations['track_id']
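
With the fields extracted, per-class counts, boxes per annotated timestamp, and the number of tracks follow directly from np.unique. The sketch below uses a small synthetic array with the documented dtype; the values are made up for illustration.

```python
import numpy as np

# Same field layout as the documented annotation dtype.
ETRAM_BBOX_DTYPE = np.dtype([
    ("t", "<i8"), ("x", "<f4"), ("y", "<f4"), ("w", "<f4"), ("h", "<f4"),
    ("class_id", "<u4"), ("track_id", "<u4"), ("class_confidence", "<f4"),
])

# Synthetic annotations: two boxes at t=1000, one box at t=2000.
annotations = np.array([
    (1000, 10, 20, 30, 40, 0, 1, 1.0),
    (1000, 50, 60, 70, 80, 1, 2, 1.0),
    (2000, 12, 22, 30, 40, 0, 1, 1.0),
], dtype=ETRAM_BBOX_DTYPE)

# Number of boxes per class_id.
class_ids, class_counts = np.unique(annotations["class_id"], return_counts=True)
per_class = dict(zip(class_ids.tolist(), class_counts.tolist()))

# Number of boxes at each annotated timestamp.
frame_times, boxes_per_frame = np.unique(annotations["t"], return_counts=True)

# Number of distinct tracked objects.
num_tracks = len(np.unique(annotations["track_id"]))

print(per_class, frame_times.tolist(), boxes_per_frame.tolist(), num_tracks)
```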

Dynamic Class Configuration

The project supports dynamic class configuration through config.yaml. Classes are defined in the classes list, and the number of classes is inferred from the length of that list.

Example Configuration

classes:
  - Pedestrian
  - Car
  - Bicycle
  - Bus
  - Motorbike
  - Truck
  - Tram
  - Wheelchair
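
As a rough illustration of how the class count can follow from the list, the sketch below starts from the dict a YAML parser would return for the configuration above. How the project itself reads config.yaml is not shown here, and mapping class_id to list position is an assumption.

```python
# Stand-in for the parsed contents of config.yaml.
config = {
    "classes": [
        "Pedestrian", "Car", "Bicycle", "Bus",
        "Motorbike", "Truck", "Tram", "Wheelchair",
    ]
}

class_names = list(config["classes"])
num_classes = len(class_names)  # derived from the list, never set by hand

if len(set(class_names)) != num_classes:
    raise ValueError("config.yaml lists a class name more than once")

# Assumption: class IDs follow list order (0 = first entry).
id_to_name = dict(enumerate(class_names))
name_to_id = {name: idx for idx, name in id_to_name.items()}

print(num_classes, name_to_id)
```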

Previous Class Mappings (for reference)

Fine-grained 8-Class: Pedestrian, Car, Bicycle, Bus, Motorbike, Truck, Tram, Wheelchair
Grouped 3-Class: Pedestrian, Vehicle (Car, Bus, Truck, Tram), Micro-mobility (Bicycle, Motorbike, Wheelchair)
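
The grouping above can be written as a lookup table from fine-grained names to grouped names. This sketch works on class names only, since the numeric IDs used in each annotation set are not stated here; GROUPS, FINE_TO_GROUP, and group_labels are hypothetical names.

```python
# Grouped class -> fine-grained members, as listed in the mapping above.
GROUPS = {
    "Pedestrian": ["Pedestrian"],
    "Vehicle": ["Car", "Bus", "Truck", "Tram"],
    "Micro-mobility": ["Bicycle", "Motorbike", "Wheelchair"],
}

# Invert the table: fine-grained name -> grouped name.
FINE_TO_GROUP = {
    fine: group for group, members in GROUPS.items() for fine in members
}


def group_labels(fine_labels):
    """Map a sequence of fine-grained class names to grouped class names."""
    return [FINE_TO_GROUP[name] for name in fine_labels]


print(group_labels(["Car", "Bicycle", "Pedestrian"]))
```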

Annotation Characteristics

Temporal Alignment: Annotation timestamps (t) are synchronized with event timestamps
Tracking: Each object keeps the same track_id across frames
Confidence: Each box carries a class_confidence score (float32)
Density: ~200-300 annotations per sequence

Sample Sequence Analysis

Example statistics from train_night_0040:

Events: 17,428,542 events over 5.72 seconds
Annotations: 212 annotations across 78 unique timestamps
Classes: Pedestrian (134), Car (78)
Bbox Sizes: w=294.9±148.9, h=161.7±40.7 pixels
Event Rate: ~3M events/second
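
The derived figures can be checked from the raw counts. The short calculation below reproduces the event rate and adds two ratios that follow from the same numbers: boxes per annotated timestamp and annotated timestamps per second.

```python
num_events = 17_428_542
duration_s = 5.72
num_annotations = 212
num_timestamps = 78

event_rate = num_events / duration_s
boxes_per_timestamp = num_annotations / num_timestamps
timestamps_per_s = num_timestamps / duration_s

print(f"{event_rate / 1e6:.2f} M events/s")
print(f"{boxes_per_timestamp:.2f} boxes per annotated timestamp")
print(f"{timestamps_per_s:.1f} annotated timestamps per second")
```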

Data Loading Implementation

Key Components

UltraLowMemoryLoader: Main data loading class
EventProcessor: Event-to-frame conversion utilities
eTraMDataset: PyTorch Dataset wrapper
Streaming Support: Real-time event processing

Dynamic Batching Process

The eTraM data loader builds its samples in three steps:

1. Dynamic Sample Calculation

Instead of using a fixed number of samples per file, the loader derives the sample count from each file's event count:

train_day_0001.h5: 1,500,000 events → 150 samples (10K events each)
train_night_0040.h5: 17,428,542 events → 1,743 samples (10K events each)
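
Both counts above match a ceiling division of the event count by 10,000, so a partial final window still counts as a sample. A minimal sketch, with a hypothetical function name:

```python
def num_samples(total_events, events_per_sample=10_000):
    """Number of fixed-size samples needed to cover all events (ceiling division)."""
    return (total_events + events_per_sample - 1) // events_per_sample


print(num_samples(1_500_000))    # 150
print(num_samples(17_428_542))   # 1743
```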

2. Overlapping Window Sampling

Each sample uses overlapping windows to ensure complete coverage:

# Use overlapping windows to ensure full coverage
overlap = self.max_events_per_sample // 4  # 25% overlap
step_size = self.max_events_per_sample - overlap  # 7,500 events step

start_idx = sample_idx * step_size
end_idx = min(start_idx + self.max_events_per_sample, total_events)
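
To see which index ranges this produces, the sketch below enumerates every window for a given event count, using the same overlap and step arithmetic as the snippet above. It is a standalone illustration, not the loader's code; the function name and the stopping rule (stop once a window reaches the last event) are assumptions.

```python
def overlapping_windows(total_events, max_events_per_sample=10_000):
    """List (start_idx, end_idx) windows with 25% overlap covering all events."""
    overlap = max_events_per_sample // 4              # 25% overlap
    step_size = max_events_per_sample - overlap       # 7,500 for 10K samples
    windows = []
    start_idx = 0
    while start_idx < total_events:
        end_idx = min(start_idx + max_events_per_sample, total_events)
        windows.append((start_idx, end_idx))
        if end_idx == total_events:
            break  # the last event is covered; no further window is needed
        start_idx += step_size
    return windows


windows = overlapping_windows(25_000)
print(windows)  # [(0, 10000), (7500, 17500), (15000, 25000)]
```

Because windows advance by 7,500 events rather than 10,000, this scheme yields more windows than a non-overlapping split: under the stopping rule assumed here, 17,428,542 events give 2,324 windows, compared with the 1,743 samples quoted for 10K-event samples.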

3. Temporal Annotation Matching

Annotations are temporally matched to the specific events being processed:

def _load_annotations_for_events(self, h5_file_path, events):
    """Load annotations that match the specific events temporally"""
    
    # Extract timestamps from loaded events
    event_timestamps = events[:, 2]  # Events are [x, y, t, p]
    start_time = float(event_timestamps.min())
    end_time = float(event_timestamps.max())
    
    # Add buffer for timing variations
    time_buffer = (end_time - start_time) * 0.2  # 20% buffer
    start_time -= time_buffer
    end_time += time_buffer
    
    # Filter annotations by time window
    all_annotations = self._load_all_annotations(h5_file_path)
    time_mask = (all_annotations['t'] >= start_time) & (all_annotations['t'] <= end_time)
    filtered_annotations = all_annotations[time_mask]
    
    return filtered_annotations
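
The time-window logic can be tried in isolation on synthetic data. The sketch below mirrors the buffer and mask steps of the method above as a free function; filter_annotations_by_time is a hypothetical name, and the empty-input guard is an addition not present in the original method.

```python
import numpy as np


def filter_annotations_by_time(annotations, event_timestamps, buffer_fraction=0.2):
    """Keep annotations whose 't' falls inside the buffered event time span."""
    if len(event_timestamps) == 0:
        return annotations[:0]  # no events, so no matching annotations
    start_time = float(event_timestamps.min())
    end_time = float(event_timestamps.max())
    # Widen the window by a fraction of its duration on each side.
    time_buffer = (end_time - start_time) * buffer_fraction
    mask = (annotations["t"] >= start_time - time_buffer) & (
        annotations["t"] <= end_time + time_buffer
    )
    return annotations[mask]


# Synthetic data: events span 1000..2000 us, so the buffered window is 800..2200 us.
annotations = np.array(
    [(0,), (900,), (1500,), (2100,), (3000,)], dtype=[("t", "<i8")]
)
event_timestamps = np.array([1000, 1400, 2000], dtype=np.int64)

selected = filter_annotations_by_time(annotations, event_timestamps)
print(selected["t"].tolist())  # [900, 1500, 2100]
```

Because the buffer scales with the window duration, short windows get a proportionally short buffer; a window that falls between two annotated timestamps may therefore match no annotations at all.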

Memory Requirements

Event Data: ~200MB per sequence (HDF5 compressed)
Annotations: ~50KB per sequence
Processing: ~500MB RAM for real-time processing
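
For comparison with the on-disk figure, the uncompressed in-memory size of the four event arrays can be estimated from the documented dtypes. This is a back-of-the-envelope sketch that assumes all events of a sequence are loaded at once, at their stored dtypes.

```python
# Bytes per element for each events/ array, from the documented dtypes.
FIELD_BYTES = {
    "x": 2,  # uint16
    "y": 2,  # uint16
    "p": 2,  # int16
    "t": 8,  # int64
}
bytes_per_event = sum(FIELD_BYTES.values())  # 14 bytes


def raw_event_bytes(num_events):
    """Uncompressed size in bytes of the x, y, p, and t arrays."""
    return num_events * bytes_per_event


size = raw_event_bytes(17_428_542)
print(f"{size / 1e6:.0f} MB uncompressed")
```

For train_night_0040 this comes to roughly 244 MB, which is why loading a sequence in windows rather than whole keeps peak memory lower.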

Camera Specifications

Model: Prophesee EVK4 HD
Sensor: Sony IMX636 Event-Based Vision Sensor
Resolution: 1280×720 pixels
Dynamic Range: >86 dB
Temporal Resolution: >10,000 fps equivalent (events are timestamped asynchronously, not captured as frames)
