Chapter 11: Advanced Core Semantics

Datatype Performance and Correctness

Choosing between C++ types, SystemC integer types, bit vectors, logic vectors, fixed-point types, and TLM byte arrays.

SystemC provides an extensive library of custom datatypes because hardware modeling requires precise bit widths, four-state logic, and fixed-point arithmetic. A common beginner trap, however, is to use the most "hardware-looking" type everywhere. Doing so slows simulation considerably, because every operation goes through library code instead of a single machine instruction, and it makes the C++ code harder to read.

The IEEE 1666 LRM strictly defines these datatypes. Knowing when to use native C++ types versus SystemC types is a hallmark of an expert SystemC architect.

The LRM Datatype Categories

The standard places its own datatypes in the sc_dt namespace. Together with the native C++ types, a model can choose from five groups:

  1. Native C++ Types (int, uint32_t, bool). Performance: Maximum. Use Case: Virtual Platform (TLM) internal state, counters, flags, memory arrays.
  2. Limited-Precision Fixed-Width Integers (sc_dt::sc_int<W>, sc_dt::sc_uint<W>). Performance: High (implemented using 64-bit native integers under the hood). Valid for $W \le 64$. Use Case: Register fields, exact small hardware width arithmetic.
  3. Arbitrary-Precision Integers (sc_dt::sc_bigint<W>, sc_dt::sc_biguint<W>). Performance: Slow (values are stored and processed as arrays of words, which an implementation may allocate dynamically). Any width is accepted, but these types are intended for $W > 64$, where sc_int and sc_uint no longer fit. Use Case: Cryptographic keys, very wide buses, wide memory payloads.
  4. Bit and Logic Vectors (sc_dt::sc_bv<W>, sc_dt::sc_lv<W>). Performance: Very Slow (uses proxy objects and stores bit arrays; sc_lv additionally resolves 4-state logic). Use Case: Pin-level RTL interfaces; choose sc_lv when unknown ('X') or high-impedance ('Z') states must be modeled.
  5. Fixed-Point Types (sc_dt::sc_fixed, sc_dt::sc_ufixed). Performance: Moderate to Slow (handles quantization and overflow). Use Case: DSP algorithms, AMS (Analog Mixed Signal) boundaries.
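To make the "exact width" point concrete, here is a minimal sketch in plain C++ (no SystemC headers) of what keeping a value to W bits means. The helper name wrap_to_width is hypothetical and not part of any library. It only imitates the truncation an unsigned W-bit register performs on assignment, not the full interface of sc_uint<W>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a SystemC API): keep only the low `width` bits
// of `value`, the way a W-bit unsigned register drops the upper bits.
inline std::uint64_t wrap_to_width(std::uint64_t value, unsigned width) {
    assert(width >= 1 && width <= 64);
    if (width == 64) {
        return value;  // nothing to drop; also avoids an undefined 64-bit shift
    }
    const std::uint64_t mask = (std::uint64_t{1} << width) - 1;
    return value & mask;
}
```

For example, wrap_to_width(4095 + 1, 12) yields 0, the same wrap-around a 12-bit address counter shows. When a model needs this behavior in only a few places, an explicit mask on a native integer is often enough; when the width is part of an interface, sc_uint<W> states it in the type.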

The Proxy Object Problem

A major performance pitfall in SystemC datatypes is the use of proxy classes for bit-selection ([]) and part-selection (range()).

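When you write reg.range(15, 8), SystemC does not return an integer. It returns a temporary proxy object that refers back to reg (sc_dt::sc_uint_subref for sc_uint, sc_dt::sc_subref for sc_bv and sc_lv). Every read or write through the proxy runs extra masking and conversion code, so expressions that chain or nest many selections create many temporaries and can noticeably degrade simulation speed.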

Best Practice: Convert to native C++ types for complex arithmetic, then assign back to SystemC types only at the module boundaries.
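As a rough illustration of this practice, the sketch below works on a plain uint32_t that is assumed to have been obtained once from a SystemC value (for example via to_uint()). The helpers extract_bits and sum_byte_lanes are hypothetical names introduced only for this example. The first mirrors what value.range(hi, lo) reads, using one shift and one mask instead of a proxy object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: native counterpart of reading value.range(hi, lo)
// on a 32-bit word.
inline std::uint32_t extract_bits(std::uint32_t value, unsigned hi, unsigned lo) {
    assert(hi < 32 && lo <= hi);
    const unsigned width = hi - lo + 1;
    const std::uint32_t mask =
        (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

// Hypothetical example of "complex arithmetic": add up the four byte lanes
// of a word. The word is converted to a native type once, before the loop.
inline std::uint32_t sum_byte_lanes(std::uint32_t word) {
    std::uint32_t sum = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        sum += extract_bits(word, 8 * lane + 7, 8 * lane);
    }
    return sum;
}
```

With this shape, the SystemC type appears only where the value enters and leaves the computation, and the loop body compiles down to ordinary integer instructions.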

Complete Example: Datatype Trade-offs

This complete sc_main example demonstrates how to correctly mix native C++ types with SystemC limited-precision integers, and how to use part-select proxies safely.

#include <systemc>
#include <cstdint>   // uint32_t
#include <iostream>
#include <iomanip>
 
SC_MODULE(DatatypeDemo) {
    // Port using exact-width hardware type
    sc_core::sc_in<sc_dt::sc_uint<12>> address_in{"address_in"};
    
    // Internal state using fast native C++ type (Best Practice for VPs)
    uint32_t internal_memory[4096];
 
    // Hardware-accurate register representing a 32-bit control register
    sc_dt::sc_uint<32> control_reg;
 
    SC_CTOR(DatatypeDemo) {
        SC_METHOD(process_transaction);
        sensitive << address_in;
        dont_initialize();
 
        // Initialize memory
        for (int i = 0; i < 4096; i++) internal_memory[i] = 0;
        control_reg = 0;
    }
 
    void process_transaction() {
        // 1. Read from SystemC type to native C++ type (Fast)
        uint32_t addr = address_in.read();
        
        // 2. Perform operations using native C++ (Fast)
        if (addr < 4096) {
            internal_memory[addr] = 0xDEADBEEF;
        }
 
        // 3. Using SystemC Proxy Objects (range) correctly
        // Extracting bits [11:8] as a 4-bit unsigned integer
        sc_dt::sc_uint<4> page = address_in.read().range(11, 8);
        
        // Packing bits into the control register.
        // Keep each part-select a single flat assignment; avoid nesting
        // range() with concatenation, e.g. reg.range(7, 0) = (a, b);
        control_reg.range(3, 0) = page;
        control_reg.range(31, 28) = 0xF;
 
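        // Print through native integers so that std::hex is applied by the
        // standard stream inserters rather than by SystemC's own formatting.
        std::cout << "@ " << sc_core::sc_time_stamp() 
                  << " Addr: 0x" << std::hex << addr 
                  << " Page: 0x" << page.to_uint()
                  << " Control Reg: 0x" << control_reg.to_uint() << "\n";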
    }
};
 
// Testbench to drive the module
SC_MODULE(Testbench) {
    sc_core::sc_signal<sc_dt::sc_uint<12>> addr_sig{"addr_sig"};
    DatatypeDemo* demo;
 
    SC_CTOR(Testbench) {
        demo = new DatatypeDemo("demo_inst");
        demo->address_in(addr_sig);
 
        SC_THREAD(drive);
    }
 
    void drive() {
        wait(10, sc_core::SC_NS);
        addr_sig.write(0x0A4); // Write 12-bit value
        
        wait(10, sc_core::SC_NS);
        addr_sig.write(0xF00);
    }
 
    ~Testbench() {
        delete demo;
    }
};
 
int sc_main(int argc, char* argv[]) {
    Testbench tb("tb");
    
    std::cout << "Starting simulation...\n";
    sc_core::sc_start(50, sc_core::SC_NS);
    
    return 0;
}

Explanation of the Execution

When run, the output shows:

Starting simulation...
@ 10 ns Addr: 0xa4 Page: 0x0 Control Reg: 0xf0000000
@ 20 ns Addr: 0xf00 Page: 0xf Control Reg: 0xf000000f

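Notice how address_in.read().range(11, 8) correctly extracts the top 4 bits of the 12-bit address. When driving 0x0A4, the top nibble is 0, so the low bits of control_reg stay clear. When driving 0xF00, the top nibble is F, which is packed into the lowest 4 bits of the 32-bit control_reg. The leading f in both lines comes from the constant 0xF written to bits [31:28].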
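The two register values in the trace can be checked by hand with ordinary masks and shifts. The function below is a hypothetical native replay of step 3 of process_transaction(), written only to verify the arithmetic. It is not part of the example module.

```cpp
#include <cstdint>

// Hypothetical native replay of the control-register update in
// process_transaction(): returns the new register value.
inline std::uint32_t pack_control(std::uint32_t control_reg, std::uint32_t addr) {
    const std::uint32_t page = (addr >> 8) & 0xFu;             // address bits [11:8]
    control_reg = (control_reg & ~0xFu) | page;                 // control_reg[3:0]   = page
    control_reg = (control_reg & ~0xF0000000u) | 0xF0000000u;   // control_reg[31:28] = 0xF
    return control_reg;
}
```

Starting from 0, pack_control(0, 0x0A4) gives 0xF0000000, and feeding that result back in with address 0xF00 gives 0xF000000F, matching the two Control Reg values printed at 10 ns and 20 ns.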

Using uint32_t for the internal_memory ensures that the simulation runs at native C++ speeds for the bulk of the data storage, while sc_dt::sc_uint is reserved for explicit hardware boundaries.
