
Tutorial: Backend — Simulate, Profile & FPGA

The talon.backend module provides CPU simulation, energy/latency profiling with device-specific presets, and FPGA compilation via hls4ml.

What You'll Learn

  1. Backend registry and capabilities
  2. Validate a graph against a backend
  3. Compile for hardware
  4. Simulate on CPU with spike tracking
  5. Profile energy, latency, and memory with device presets
  6. FPGA compilation and bitstream generation

Prerequisites

pip install t1c-talon

1. Backend Registry

TALON provides pluggable backends. The built-in backends are cpu and fpga.

from talon import backend

print(f"Available backends: {backend.list_backends()}")

cpu = backend.get_backend("cpu")
print(f"\nCPU backend: {cpu.name}")

caps = cpu.get_capabilities()
print(f" Supported primitives: {len(caps.supported_primitives)}")
print(f" Supported precisions: {caps.supported_precisions}")
print(f" Supports cyclic: {caps.supports_cyclic}")
print(f" Max graph size: {caps.max_graph_size:,}")

Output:

Available backends: ['cpu', 'fpga']

CPU backend: cpu
Supported primitives: 22
Supported precisions: ['fp32', 'fp16', 'int16', 'int8', 'int4']
Supports cyclic: True
Max graph size: 1,000,000

2. Energy Presets

TALON includes energy presets for different hardware targets. All energy values are in picojoules (pJ) per operation.

print("Energy presets:")
for name, preset in backend.ENERGY_PRESETS.items():
    print(f"  {name}:")
    print(f"    MAC: {preset['mac_pj']} pJ")
    print(f"    Spike: {preset['spike_pj']} pJ")
    print(f"    {preset['description']}")

Output:

Energy presets:
  45nm_cmos:
    MAC: 0.9 pJ
    Spike: 0.1 pJ
    Standard 45 nm CMOS process (Horowitz 2014)
  28nm_cmos:
    MAC: 0.5 pJ
    Spike: 0.06 pJ
    28 nm CMOS process
  neuromorphic_int8:
    MAC: 4.6 pJ
    Spike: 0.9 pJ
    Representative INT8 neuromorphic accelerator
  zynq_us_plus:
    MAC: 3.2 pJ
    Spike: 0.7 pJ
    Zynq UltraScale+ XCZU9EG on ZCU102 (16 nm FinFET, 274K LUTs, 2520 DSP48, 32.1 Mb BRAM)
  zcu102:
    MAC: 3.2 pJ
    Spike: 0.7 pJ
    Alias for zynq_us_plus (XCZU9EG-2FFVB1156 on ZCU102)
  zynq_7020:
    MAC: 6.5 pJ
    Spike: 1.2 pJ
    Xilinx Zynq-7020 FPGA (28 nm)

3. Build a Test Graph

import numpy as np
from talon import ir

nodes = {
    "input": ir.Input(np.array([784])),
    "fc1": ir.Affine(
        weight=np.random.randn(128, 784).astype(np.float32) * 0.01,
        bias=np.zeros(128, dtype=np.float32),
    ),
    "lif1": ir.LIF(
        tau=np.ones(128, dtype=np.float32) * 10.0,
        r=np.ones(128, dtype=np.float32),
        v_leak=np.zeros(128, dtype=np.float32),
        v_threshold=np.ones(128, dtype=np.float32),
    ),
    "fc2": ir.Affine(
        weight=np.random.randn(10, 128).astype(np.float32) * 0.01,
        bias=np.zeros(10, dtype=np.float32),
    ),
    "lif2": ir.LIF(
        tau=np.ones(10, dtype=np.float32) * 10.0,
        r=np.ones(10, dtype=np.float32),
        v_leak=np.zeros(10, dtype=np.float32),
        v_threshold=np.ones(10, dtype=np.float32),
    ),
    "output": ir.Output(np.array([10])),
}

edges = [
    ("input", "fc1"), ("fc1", "lif1"), ("lif1", "fc2"),
    ("fc2", "lif2"), ("lif2", "output"),
]

graph = ir.Graph(nodes=nodes, edges=edges)
print(f"Graph: {len(graph.nodes)} nodes, {len(graph.edges)} edges")

Output:

Graph: 6 nodes, 5 edges

4. Validate

Check whether all nodes in the graph are supported by the backend.

val = cpu.validate(graph)

print(f"Validation:")
print(f" Valid: {val.is_valid}")
print(f" Unsupported nodes: {val.unsupported_nodes}")
print(f" Warnings: {val.warnings}")

Output:

Validation:
Valid: True
Unsupported nodes: []
Warnings: []

5. Compile

Compilation produces a HardwareDescriptor containing the execution plan.

config = backend.CompileConfig()
compiled = cpu.compile(graph, config)
print(f"Compiled: {type(compiled).__name__}")

Output:

Compiled: HardwareDescriptor

6. Simulate

Run the graph on CPU with spike state tracking across timesteps.

x = np.random.randn(1, 784).astype(np.float32)
sim = cpu.simulate(graph, {"input": x}, n_steps=3)

print(f"Simulation:")
print(f" Steps run: {sim.timesteps_run}")
print(f" Outputs valid: {sim.outputs_valid}")
print(f" Spike counts: {sim.spike_counts}")

Output:

Simulation:
Steps run: 3
Outputs valid: True
Spike counts: {'lif1': 15, 'lif2': 0}

Step-by-Step Mode

Use step_by_step=True to get per-timestep intermediate states in sim.node_states.
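To make concrete what per-timestep state tracking means, here is a toy discrete LIF update loop in plain NumPy that records one state snapshot per step. This is an illustration only, not TALON's simulator, and the exact layout of sim.node_states is an assumption:

```python
import numpy as np

# Toy LIF dynamics: v += (v_leak - v + r*I) / tau; spike and reset at threshold.
# Each loop iteration appends a snapshot, analogous to what step_by_step=True
# would let you inspect per timestep.
rng = np.random.default_rng(0)
tau, r, v_leak, v_th = 10.0, 1.0, 0.0, 1.0
v = np.zeros(4)
states = []  # one snapshot per timestep
for t in range(3):
    i_in = rng.normal(0.0, 2.0, size=4)   # stand-in input current
    v = v + (v_leak - v + r * i_in) / tau
    spikes = v >= v_th
    v = np.where(spikes, 0.0, v)          # reset spiking neurons
    states.append({"v": v.copy(), "spikes": spikes.copy()})

print(len(states))  # 3
```

With 3 steps you get 3 snapshots, mirroring the n_steps=3 simulation above.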


7. Profile

Profile computes estimated latency, energy, and memory usage. All energy values are labeled as estimated — they use analytical models, not measured hardware power.

Default Preset (neuromorphic_int8)

prof = cpu.profile(graph)

print(f"Profile (neuromorphic_int8):")
print(f" Total latency: {prof.total_latency_us:.2f} μs")
print(f" MAC operations: {prof.mac_ops:,}")
print(f" Energy estimate: {prof.energy_estimate_uj:.4f} μJ")
print(f" MAC energy: {prof.mac_energy_uj:.4f} μJ")
print(f" Spike energy: {prof.spike_energy_uj:.4f} μJ")
print(f" SRAM energy: {prof.sram_energy_uj:.4f} μJ")
print(f" Peak memory: {prof.peak_memory_bytes:,} bytes")
print(f" Weight memory: {prof.weight_memory_bytes:,} bytes")
print(f" Energy preset: {prof.energy_preset}")

print(f"\n Per-layer energy:")
for layer, energy in prof.per_layer_energy_uj.items():
    print(f"  {layer}: {energy:.4f} μJ")

Output:

Profile (neuromorphic_int8):
Total latency: 54.58 μs
MAC operations: 101,632
Energy estimate: 0.9777 μJ
MAC energy: 0.4675 μJ
Spike energy: 0.0000 μJ
SRAM energy: 0.5102 μJ
Peak memory: 413,568 bytes
Weight memory: 407,080 bytes
Energy preset: neuromorphic_int8

Per-layer energy:
fc1: 0.9640 μJ
lif1: 0.0013 μJ
fc2: 0.0123 μJ
lif2: 0.0001 μJ
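
The analytical model means these figures can be reproduced by hand as ops × per-op cost. For example, the MAC energy above follows directly from the neuromorphic_int8 preset:

```python
# Back-of-envelope check: 101,632 MAC ops at 4.6 pJ each (neuromorphic_int8),
# converted from pJ to μJ (1 μJ = 1e6 pJ).
mac_ops = 101_632
mac_pj = 4.6
mac_energy_uj = mac_ops * mac_pj / 1e6
print(f"{mac_energy_uj:.4f} μJ")  # 0.4675 μJ, matching the profile above
```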

ZCU102 Preset

prof_zcu = cpu.profile(graph, energy_preset="zcu102")

print(f"Profile (ZCU102):")
print(f" Energy preset: {prof_zcu.energy_preset}")
print(f" Energy estimate: {prof_zcu.energy_estimate_uj:.4f} μJ")

Output:

Profile (ZCU102):
Energy preset: zcu102
Energy estimate: 0.7334 μJ

Custom Energy Parameters

prof_custom = cpu.profile(
    graph,
    energy_per_mac_pj=2.0,
    energy_per_spike_pj=0.5,
    energy_per_sram_read_pj=3.0,
)
print(f"Custom preset energy: {prof_custom.energy_estimate_uj:.4f} μJ")

8. FPGA Backend

The FPGA backend generates hls4ml-compatible configurations for Xilinx FPGAs.

Validate for FPGA

fpga = backend.get_backend("fpga")
val = fpga.validate(graph)
print(f"FPGA validation: {val.is_valid}")
print(f" Supported precisions: {fpga.get_capabilities().supported_precisions}")

Output:

FPGA validation: True
Supported precisions: ['fp32', 'int16', 'int8', 'int4', 'ap_fixed<16,6>']

Compile for FPGA

descriptor = fpga.compile(
    graph,
    output_dir="hls4ml_project",
    project_name="snn_demo",
    part="xczu9eg-ffvb1156-2-i",
    clock_period=5,
    default_precision="ap_fixed<16,6>",
    default_reuse_factor=1,
    strategy="Latency",
)
print(f"FPGA compiled: {type(descriptor).__name__}")

Output:

FPGA compiled: HardwareDescriptor

Build Bitstream Project

result = fpga.build_bitstream(
    graph,
    output_dir="./hls4ml_project",
    project_name="snn_demo",
    part="xczu9eg-ffvb1156-2-i",
    clock_period=5,
    default_precision="ap_fixed<16,6>",
    default_reuse_factor=1,
    strategy="Latency",
    bitstream_config=backend.BitstreamConfig(
        csim=True,
        synth=False,
        cosim=False,
        export=False,
        vsynth=False,
        bitfile=False,
    ),
    write_to_disk=True,
)

print(f"Generated files: {list(result.keys())}")

Two-Phase FPGA Flow

  1. The SDK generates configs and templates (LIF/IF headers, parameters.h, build.py, build_prj.tcl)
  2. hls4ml creates the actual HLS project from those configs

hls4ml is an optional dependency — install it separately when targeting FPGA synthesis.
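
Since the dependency is optional, you may want to probe for it before invoking the FPGA flow. One way, using only the standard library (find_spec checks availability without importing the package):

```python
import importlib.util

# hls4ml is an optional dependency; check for it before calling FPGA synthesis paths.
have_hls4ml = importlib.util.find_spec("hls4ml") is not None
print(f"hls4ml available: {have_hls4ml}")
```

If it is missing, install it with `pip install hls4ml`.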


API Reference

| Function / Class | Purpose |
| --- | --- |
| backend.list_backends() | List available backends |
| backend.get_backend(name) | Get a backend instance |
| backend.ENERGY_PRESETS | Energy preset configurations |
| cpu.validate(graph) | Check graph compatibility |
| cpu.compile(graph, config) | Compile to hardware descriptor |
| cpu.simulate(graph, inputs, n_steps) | Run CPU simulation |
| cpu.profile(graph, energy_preset) | Profile energy, latency, memory |
| fpga.validate(graph) | Check FPGA compatibility |
| fpga.compile(graph, ...) | Compile for FPGA |
| fpga.build_bitstream(graph, ...) | Generate hls4ml project files |
| BitstreamConfig(...) | Configure bitstream generation steps |

What's Next