Tutorial: Ghost Architecture & Detection Pipeline
This tutorial covers the GhostNet-derived spiking primitives and the object detection pipeline in TALON IR. These primitives power efficient architectures like SU-YOLO for neuromorphic object detection.
What You'll Learn
- SGhostConv — spiking ghost convolution
- GhostBasicBlock1 — stride-2 CSP-ELAN ghost bottleneck (backbone)
- GhostBasicBlock2 — no-stride CSP-ELAN ghost bottleneck (FPN head)
- SGhostEncoderLite — stem encoder (Layer 0)
- ChannelSplit and Concat — channel manipulation
- Detection pipeline: SDDetect → DFLDecode → Dist2BBox → NMS
Prerequisites
pip install t1c-talon
1. SGhostConv
A spiking ghost convolution generates features efficiently by combining a primary 1×1 convolution with a cheap depthwise 3×3 convolution applied to the primary features. The number of output channels is 2 × primary_out_channels (primary features concatenated with ghost features).
import numpy as np
from talon import ir
ghost_conv = ir.SGhostConv(
primary_weight=np.random.randn(16, 8, 1, 1).astype(np.float32) * 0.1,
primary_bias=np.zeros(16, dtype=np.float32),
cheap_weight=np.random.randn(16, 1, 3, 3).astype(np.float32) * 0.1,
cheap_bias=np.zeros(16, dtype=np.float32),
)
print(f"SGhostConv:")
print(f" Primary weight: {ghost_conv.primary_weight.shape}")
print(f" Cheap weight: {ghost_conv.cheap_weight.shape}")
print(f" Output channels: {ghost_conv.primary_weight.shape[0] * 2}")
Output:
SGhostConv:
Primary weight: (16, 8, 1, 1)
Cheap weight: (16, 1, 3, 3)
Output channels: 32
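To make the channel arithmetic concrete, here is a dense (non-spiking) NumPy sketch of the ghost-conv forward pass. The helper name `ghost_conv_ref` is hypothetical, not the library's implementation; it only illustrates the primary-then-cheap-then-concat structure implied by the weight shapes above.

```python
import numpy as np

def ghost_conv_ref(x, primary_w, cheap_w):
    """Dense reference of a ghost conv: primary 1x1, depthwise 3x3, concat.

    x: (C_in, H, W); primary_w: (Cp, C_in, 1, 1); cheap_w: (Cp, 1, 3, 3).
    Returns (2*Cp, H, W).
    """
    assert cheap_w.shape[0] == primary_w.shape[0]  # depthwise over primary channels
    _, h, w = x.shape
    # Primary 1x1 convolution: a per-pixel matmul over channels.
    primary = np.einsum("oi,ihw->ohw", primary_w[:, :, 0, 0], x)
    # Cheap depthwise 3x3 convolution on the primary features (padding 1).
    padded = np.pad(primary, ((0, 0), (1, 1), (1, 1)))
    ghost = np.zeros_like(primary)
    for dy in range(3):
        for dx in range(3):
            ghost += cheap_w[:, 0, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return np.concatenate([primary, ghost], axis=0)

x = np.random.randn(8, 4, 4).astype(np.float32)
pw = np.random.randn(16, 8, 1, 1).astype(np.float32)
cw = np.random.randn(16, 1, 3, 3).astype(np.float32)
out = ghost_conv_ref(x, pw, cw)
print(out.shape)  # (32, 4, 4)
```

The 2× channel expansion comes from the final concat; the cheap path adds only one depthwise kernel per primary channel.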
2. GhostBasicBlock1 (Stride-2 Backbone)
The stride-2 block downsamples spatial dimensions while expanding channels. It is the core building block of the SU-YOLO backbone. Internally it uses three SGhostConv operations:
- cvres: residual path (stride-2 depthwise downsample)
- cv0: main path first ghost conv
- cv2: main path second ghost conv (processes split features)
Channel Constraints
cv0 ghost output = 2 * cp0 (primary + cheap)
split: Z1 = cp0 channels, Z2 = cp0 channels
cv2_in = cp0 (cv2 processes Z2, which is the primary-only half)
cvres primary out = cp0 / 2 (= cp_r); its ghost output = 2 * cp_r
cv2 primary out = cp_r (= cp2); its ghost output = 2 * cp2 = 2 * cp_r
final output = cat(Z1[cp0], cv2_out[2*cp_r]) = cp0 + 2*cp_r channels
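The constraints reduce to simple arithmetic. A small checker (a hypothetical helper that just mirrors the rules above, not a library function):

```python
def gb1_channels(in_ch, cp0):
    """Channel bookkeeping for GhostBasicBlock1 under the stated constraints."""
    cp_r = cp0 // 2           # cvres primary = cp0 / 2
    cp2 = cp_r                # cv2 primary = cp_r
    z1 = cp0                  # first half of cv0's 2*cp0 ghost output
    cv2_out = 2 * cp2         # cv2 is itself a ghost conv
    return z1 + cv2_out       # cat(Z1, cv2_out)

print(gb1_channels(32, 16))  # 32
```

With cp0 = in_ch / 2 (as in this example), the block maps 32 channels in to 32 channels out while halving the spatial dimensions.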
in_ch = 32
cp0 = 16 # cv0 primary output
cp_r = 8 # cvres primary = cp0 / 2
cp2 = 8 # cv2 primary = cp_r
gb1 = ir.GhostBasicBlock1(
cvres_primary_weight=np.random.randn(cp_r, in_ch, 1, 1).astype(np.float32) * 0.1,
cvres_primary_bias=np.zeros(cp_r, dtype=np.float32),
cvres_cheap_weight=np.random.randn(cp_r, 1, 3, 3).astype(np.float32) * 0.1,
cvres_cheap_bias=np.zeros(cp_r, dtype=np.float32),
cv0_primary_weight=np.random.randn(cp0, in_ch, 1, 1).astype(np.float32) * 0.1,
cv0_primary_bias=np.zeros(cp0, dtype=np.float32),
cv0_cheap_weight=np.random.randn(cp0, 1, 3, 3).astype(np.float32) * 0.1,
cv0_cheap_bias=np.zeros(cp0, dtype=np.float32),
cv2_primary_weight=np.random.randn(cp2, cp0, 1, 1).astype(np.float32) * 0.1,
cv2_primary_bias=np.zeros(cp2, dtype=np.float32),
cv2_cheap_weight=np.random.randn(cp2, 1, 3, 3).astype(np.float32) * 0.1,
cv2_cheap_bias=np.zeros(cp2, dtype=np.float32),
stride=2,
)
print(f"GhostBasicBlock1 (stride-2):")
print(f" Input channels: {in_ch}")
print(f" cv0 primary out: {cp0} (total: {cp0 * 2})")
print(f" cvres primary out: {cp_r} (total: {cp_r * 2})")
print(f" cv2 primary out: {cp2} (total: {cp2 * 2})")
print(f" stride: {gb1.stride}")
Output:
GhostBasicBlock1 (stride-2):
Input channels: 32
cv0 primary out: 16 (total: 32)
cvres primary out: 8 (total: 16)
cv2 primary out: 8 (total: 16)
stride: 2
3. GhostBasicBlock2 (No-Stride FPN Head)
The no-stride block maintains spatial dimensions and is used in the FPN (Feature Pyramid Network) head. It uses two SGhostConv operations (cv0 and cv2) without a residual downsample path.
Channel Constraints
cv0 ghost output = 2 * cp0 (primary + cheap)
split: Z1 = cp0 channels, Z2 = cp0 channels
cv2_in = cp0 (cv2 processes Z2, the primary-only half of cv0's output)
cv2_out = 2 * cp2
final output = cat(Z1[cp0], cv2_out[2*cp2]) = cp0 + 2*cp2 channels
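The same arithmetic, written out (hypothetical helper name):

```python
def gb2_out_channels(cp0, cp2):
    """Output channels of GhostBasicBlock2: cat(Z1[cp0], cv2_out[2*cp2])."""
    return cp0 + 2 * cp2

# With the tutorial's cp2 = cp0 / 2 choice, output = 2 * cp0, which equals
# the 32 input channels here -- convenient for stacking blocks in the FPN head.
print(gb2_out_channels(16, 8))  # 32
```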
in_ch = 32
cp0 = 16 # cv0 primary output
cp2 = 8 # cv2 primary = cp0 / 2
gb2 = ir.GhostBasicBlock2(
cv0_primary_weight=np.random.randn(cp0, in_ch, 1, 1).astype(np.float32) * 0.1,
cv0_primary_bias=np.zeros(cp0, dtype=np.float32),
cv0_cheap_weight=np.random.randn(cp0, 1, 3, 3).astype(np.float32) * 0.1,
cv0_cheap_bias=np.zeros(cp0, dtype=np.float32),
cv2_primary_weight=np.random.randn(cp2, cp0, 1, 1).astype(np.float32) * 0.1,
cv2_primary_bias=np.zeros(cp2, dtype=np.float32),
cv2_cheap_weight=np.random.randn(cp2, 1, 3, 3).astype(np.float32) * 0.1,
cv2_cheap_bias=np.zeros(cp2, dtype=np.float32),
)
print(f"GhostBasicBlock2 (no stride):")
print(f" Input channels: {in_ch}")
print(f" cv0 primary out: {cp0} (total: {cp0 * 2})")
print(f" cv2 primary out: {cp2} (total: {cp2 * 2})")
Output:
GhostBasicBlock2 (no stride):
Input channels: 32
cv0 primary out: 16 (total: 32)
cv2 primary out: 8 (total: 16)
4. SGhostEncoderLite
The stem encoder (Layer 0) combines a standard 3×3 convolution with a ghost convolution. It is the entry point of the SU-YOLO backbone, processing the raw input into initial feature maps with stride-2 spatial reduction.
encoder = ir.SGhostEncoderLite(
conv1_weight=np.random.randn(16, 3, 3, 3).astype(np.float32) * 0.1,
conv1_bias=np.zeros(16, dtype=np.float32),
ghost_primary_weight=np.random.randn(16, 16, 1, 1).astype(np.float32) * 0.1,
ghost_primary_bias=np.zeros(16, dtype=np.float32),
ghost_cheap_weight=np.random.randn(16, 1, 3, 3).astype(np.float32) * 0.1,
ghost_cheap_bias=np.zeros(16, dtype=np.float32),
stride=2,
)
print(f"SGhostEncoderLite:")
print(f" conv1 weight: {encoder.conv1_weight.shape}")
print(f" ghost primary: {encoder.ghost_primary_weight.shape}")
print(f" ghost cheap: {encoder.ghost_cheap_weight.shape}")
print(f" stride: {encoder.stride}")
Output:
SGhostEncoderLite:
conv1 weight: (16, 3, 3, 3)
ghost primary: (16, 16, 1, 1)
ghost cheap: (16, 1, 3, 3)
stride: 2
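The stride-2 reduction follows the standard convolution output-size formula. A quick sketch, assuming 3×3 kernels with padding 1 (consistent with the weight shapes above):

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Standard convolution output size: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# A 640x640 input leaves the stride-2 stem at 320x320; a second
# stride-2 stage would take it to 160x160.
print(conv_out_size(640), conv_out_size(320))  # 320 160
```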
5. ChannelSplit and Concat
These primitives handle channel manipulation within the CSP-ELAN structure.
split = ir.ChannelSplit(split_sections=(16, 16), dim=0)
print(f"ChannelSplit: sections={split.split_sections}, dim={split.dim}")
concat = ir.Concat(num_inputs=2, dim=0)
print(f"Concat: num_inputs={concat.num_inputs}, dim={concat.dim}")
Output:
ChannelSplit: sections=(16, 16), dim=0
Concat: num_inputs=2, dim=0
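These primitives map directly onto NumPy's `np.split` and `np.concatenate`; a quick round-trip check on a (C, H, W) tensor:

```python
import numpy as np

x = np.arange(32 * 2 * 2, dtype=np.float32).reshape(32, 2, 2)  # (C, H, W)
z1, z2 = np.split(x, [16], axis=0)            # like ChannelSplit((16, 16), dim=0)
roundtrip = np.concatenate([z1, z2], axis=0)  # like Concat(num_inputs=2, dim=0)
print(z1.shape, z2.shape, np.array_equal(roundtrip, x))  # (16, 2, 2) (16, 2, 2) True
```

Split followed by concat on the same dim is lossless, which is what lets the CSP structure route Z1 around cv2 and rejoin it afterwards.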
6. Detection Pipeline
The detection head processes feature maps through four stages:
SDDetect → DFLDecode → Dist2BBox → NMS
| Stage | Description |
|---|---|
| SDDetect | Per-scale spiking detection head (regression + classification branches) |
| DFLDecode | Distribution Focal Loss — decodes discrete distributions into distances |
| Dist2BBox | Converts distances (left/top/right/bottom from an anchor point) to bounding box coordinates |
| NMS | Non-Maximum Suppression — filters overlapping detections |
SDDetect
detect = ir.SDDetect(
num_classes=80,
reg_max=16,
stride=8,
cv2_w0=np.random.randn(64, 32, 3, 3).astype(np.float32) * 0.01,
cv2_b0=np.zeros(64, dtype=np.float32),
cv2_w1=np.random.randn(64, 64, 3, 3).astype(np.float32) * 0.01,
cv2_b1=np.zeros(64, dtype=np.float32),
cv2_w2=np.random.randn(64, 64, 1, 1).astype(np.float32) * 0.01,
cv2_b2=np.zeros(64, dtype=np.float32),
cv3_w0=np.random.randn(80, 32, 3, 3).astype(np.float32) * 0.01,
cv3_b0=np.zeros(80, dtype=np.float32),
cv3_w1=np.random.randn(80, 80, 3, 3).astype(np.float32) * 0.01,
cv3_b1=np.zeros(80, dtype=np.float32),
cv3_w2=np.random.randn(80, 80, 1, 1).astype(np.float32) * 0.01,
cv3_b2=np.zeros(80, dtype=np.float32),
)
print(f"SDDetect: classes={detect.num_classes}, reg_max={detect.reg_max}, stride={detect.stride}")
Output:
SDDetect: classes=80, reg_max=16, stride=8
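The final weight widths above are not arbitrary. Assuming the usual YOLOv8-style split head, the regression branch (cv2_*) ends at 4 * reg_max channels (one reg_max-bin DFL distribution per box side) and the classification branch (cv3_*) ends at num_classes channels, matching the 64- and 80-channel final weights:

```python
num_classes, reg_max = 80, 16
reg_channels = 4 * reg_max   # one reg_max-bin distribution per side (l, t, r, b)
cls_channels = num_classes   # one logit per class
print(reg_channels, cls_channels)  # 64 80
```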
DFLDecode, Dist2BBox, NMS
dfl = ir.DFLDecode(reg_max=16)
print(f"DFLDecode: reg_max={dfl.reg_max}")
d2b = ir.Dist2BBox(stride=8)
print(f"Dist2BBox: stride={d2b.stride}")
nms = ir.NMS(iou_threshold=0.45, score_threshold=0.25, max_detections=300)
print(f"NMS: iou={nms.iou_threshold}, score={nms.score_threshold}, max_det={nms.max_detections}")
Output:
DFLDecode: reg_max=16
Dist2BBox: stride=8
NMS: iou=0.45, score=0.25, max_det=300
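To make the decode stages concrete, here is a dense NumPy sketch of the standard DFL expectation and distance-to-box conversion for a single anchor. The helper names are hypothetical and the exact DFLDecode/Dist2BBox semantics (anchor convention, scaling order) may differ in detail:

```python
import numpy as np

def dfl_decode(logits, reg_max=16):
    """Decode DFL logits (4, reg_max) into 4 distances via a softmax expectation."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs @ np.arange(reg_max, dtype=np.float32)  # expected bin index

def dist2bbox(dist, anchor, stride=8):
    """Convert (l, t, r, b) grid-unit distances around an anchor to xyxy pixels."""
    l, t, r, b = dist
    ax, ay = anchor
    return np.array([ax - l, ay - t, ax + r, ay + b]) * stride

logits = np.zeros((4, 16), dtype=np.float32)
logits[:, 5] = 10.0           # each side's distribution peaks sharply at bin 5
dist = dfl_decode(logits)
box = dist2bbox(dist, anchor=(10.5, 10.5))
# distances ~5 grid units per side; box ~ (44, 44, 124, 124) pixels at stride 8
print(dist, box)
```

Because DFL takes an expectation over bins rather than an argmax, the decoded distances are continuous even though the distribution is discrete.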
Full Detection Graph
det_nodes = {
"features": ir.Input(np.array([32, 80, 80])),
"detect": detect,
"dfl": dfl,
"d2b": d2b,
"nms": nms,
"output": ir.Output(np.array([300, 6])),
}
det_edges = [
("features", "detect"),
("detect", "dfl"),
("dfl", "d2b"),
("d2b", "nms"),
("nms", "output"),
]
det_graph = ir.Graph(nodes=det_nodes, edges=det_edges)
print(f"Detection pipeline graph:")
print(f" Nodes: {len(det_graph.nodes)}")
print(f" Edges: {len(det_graph.edges)}")
print(f" Is DAG: {det_graph.is_dag}")
Output:
Detection pipeline graph:
Nodes: 6
Edges: 5
Is DAG: True
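The `is_dag` property can be reproduced outside the library. This standalone sketch runs Kahn's algorithm over the same edge list; a complete topological order exists exactly when the graph is acyclic:

```python
from collections import defaultdict, deque

edges = [("features", "detect"), ("detect", "dfl"),
         ("dfl", "d2b"), ("d2b", "nms"), ("nms", "output")]

succ, indeg = defaultdict(list), defaultdict(int)
nodes = {n for e in edges for n in e}
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1

# Kahn's algorithm: repeatedly pop nodes with no remaining predecessors.
queue = deque(n for n in nodes if indeg[n] == 0)
order = []
while queue:
    u = queue.popleft()
    order.append(u)
    for v in succ[u]:
        indeg[v] -= 1
        if indeg[v] == 0:
            queue.append(v)

print(order, len(order) == len(nodes))
# ['features', 'detect', 'dfl', 'd2b', 'nms', 'output'] True
```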
Serialize and Verify
ir.write("models/detection_pipeline.t1c", det_graph)
loaded = ir.read("models/detection_pipeline.t1c")
print(f"Saved & reloaded: {len(loaded.nodes)} nodes, {len(loaded.edges)} edges")
Output:
Saved & reloaded: 6 nodes, 5 edges
7. Building a Mini SU-YOLO Backbone
Combining the ghost primitives into a simplified backbone:
backbone_nodes = {
"input": ir.Input(np.array([3, 640, 640])),
"encoder": ir.SGhostEncoderLite(
conv1_weight=np.random.randn(16, 3, 3, 3).astype(np.float32) * 0.1,
conv1_bias=np.zeros(16, dtype=np.float32),
ghost_primary_weight=np.random.randn(16, 16, 1, 1).astype(np.float32) * 0.1,
ghost_primary_bias=np.zeros(16, dtype=np.float32),
ghost_cheap_weight=np.random.randn(16, 1, 3, 3).astype(np.float32) * 0.1,
ghost_cheap_bias=np.zeros(16, dtype=np.float32),
stride=2,
),
"stage1": ir.GhostBasicBlock1(
cvres_primary_weight=np.random.randn(8, 32, 1, 1).astype(np.float32) * 0.1,
cvres_primary_bias=np.zeros(8, dtype=np.float32),
cvres_cheap_weight=np.random.randn(8, 1, 3, 3).astype(np.float32) * 0.1,
cvres_cheap_bias=np.zeros(8, dtype=np.float32),
cv0_primary_weight=np.random.randn(16, 32, 1, 1).astype(np.float32) * 0.1,
cv0_primary_bias=np.zeros(16, dtype=np.float32),
cv0_cheap_weight=np.random.randn(16, 1, 3, 3).astype(np.float32) * 0.1,
cv0_cheap_bias=np.zeros(16, dtype=np.float32),
cv2_primary_weight=np.random.randn(8, 16, 1, 1).astype(np.float32) * 0.1,
cv2_primary_bias=np.zeros(8, dtype=np.float32),
cv2_cheap_weight=np.random.randn(8, 1, 3, 3).astype(np.float32) * 0.1,
cv2_cheap_bias=np.zeros(8, dtype=np.float32),
stride=2,
),
"output": ir.Output(np.array([16, 160, 160])),
}
backbone_edges = [
("input", "encoder"),
("encoder", "stage1"),
("stage1", "output"),
]
backbone = ir.Graph(nodes=backbone_nodes, edges=backbone_edges)
print(f"Mini SU-YOLO backbone:")
print(f" Nodes: {len(backbone.nodes)}")
print(f" Edges: {len(backbone.edges)}")
for name, node in backbone.nodes.items():
print(f" {name}: {type(node).__name__}")
Output:
Mini SU-YOLO backbone:
Nodes: 4
Edges: 3
input: Input
encoder: SGhostEncoderLite
stage1: GhostBasicBlock1
output: Output
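Using the channel constraints from section 2 and simple stride arithmetic, the expected shapes through this backbone can be traced by hand (a sketch with a hypothetical helper, assuming 3×3 convs with padding 1):

```python
def trace_backbone(in_shape):
    """Trace (C, H, W) through the mini backbone."""
    c, h, w = in_shape
    # SGhostEncoderLite: conv1 to 16 ch, ghost conv doubles to 32, stride 2.
    c, h, w = 32, h // 2, w // 2
    # GhostBasicBlock1 with cp0=16, cp_r=8: out = cp0 + 2*cp_r = 32, stride 2.
    c, h, w = 16 + 2 * 8, h // 2, w // 2
    return (c, h, w)

print(trace_backbone((3, 640, 640)))  # (32, 160, 160)
```

Each stride-2 stage halves the spatial dimensions, so two stages take 640×640 down to 160×160 at 32 channels.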
Summary
| Primitive | Role | Key Parameters |
|---|---|---|
| SGhostConv | Ghost convolution | primary_weight, cheap_weight |
| GhostBasicBlock1 | Stride-2 backbone block | cvres_*, cv0_*, cv2_*, stride=2 |
| GhostBasicBlock2 | No-stride FPN block | cv0_*, cv2_* |
| SGhostEncoderLite | Stem encoder | conv1_*, ghost_*, stride |
| ChannelSplit | Split channels | split_sections, dim |
| Concat | Concatenate channels | num_inputs, dim |
| SDDetect | Detection head | num_classes, reg_max, stride |
| DFLDecode | DFL distance decode | reg_max |
| Dist2BBox | Distance to bbox | stride |
| NMS | Non-Maximum Suppression | iou_threshold, score_threshold, max_detections |
What's Next
- Tutorial: Hardware Mapping — Partition and place on hardware
- Tutorial: Backend — Simulate and profile on CPU/FPGA
- Primitives Reference — All 36 TALON IR primitives