Deep Dive: Fixed-Point Datatypes (sc_fixed) and Quantization
Master the IEEE 1666-2023 fixed-point types, exploring quantization modes, overflow mechanics, and the underlying math of sc_fixed and sc_ufixed.
How to Read This Lesson
These core semantics are where experienced SystemC engineers earn their calm. We will name each datatype rule, then show how the source implements it.
In hardware design, floating-point arithmetic (like C++ float or double) is typically avoided due to massive silicon area, high power consumption, and long propagation delays. Instead, designers rely on Fixed-Point Arithmetic. The IEEE 1666 standard provides dedicated classes for this: sc_fixed (signed) and sc_ufixed (unsigned), along with their fast equivalents sc_fixed_fast and sc_ufixed_fast.
This tutorial breaks down the anatomy of a fixed-point number, explores the detailed quantization (sc_q_mode) and overflow (sc_o_mode) mechanics, and examines the Accellera source code to understand the performance overhead of these types.
Source and LRM Trail
Advanced core behavior should always be checked against Docs/LRMs/SystemC_LRM_1666-2023.pdf before source details. For implementation, read .codex-src/systemc/src/sysc/kernel and .codex-src/systemc/src/sysc/communication, especially the scheduler, events, object hierarchy, writer policy, report handler, and async update path.
Anatomy of a Fixed-Point Type
To use a fixed-point type, you must define its geometry:
sc_fixed<wl, iwl, q_mode, o_mode, n_bits>
- wl (Word Length): Total number of bits.
- iwl (Integer Word Length): Number of bits located to the left of the binary point (including the sign bit for signed types).
- q_mode (Quantization Mode): How to handle bits that are discarded on the right (fractional bits) when casting to a type with fewer fractional bits.
- o_mode (Overflow Mode): How to handle bits that overflow on the left (integer bits) when casting to a type with fewer integer bits.
- n_bits: Number of saturated bits (only relevant for certain overflow modes).
The number of fractional bits is simply wl - iwl. Note that iwl can be greater than wl (implying trailing zeros) or negative (implying leading fractional zeros), though typical use cases have 0 < iwl <= wl.
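The geometry determines the resolution and range of a format. The sketch below works that arithmetic out in plain C++ with no SystemC dependency. The helper names (fx_resolution, fx_signed_min, fx_signed_max, fx_unsigned_max) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers (not part of SystemC) that derive the numeric
// properties of a fixed-point format from its geometry.

// Weight of the least significant bit: 2^-(wl - iwl).
double fx_resolution(int wl, int iwl) {
    assert(wl > 0);
    return std::ldexp(1.0, -(wl - iwl));
}

// Smallest value of a signed (two's complement) format: -2^(iwl - 1).
double fx_signed_min(int wl, int iwl) {
    assert(wl > 0);
    return -std::ldexp(1.0, iwl - 1);
}

// Largest value of a signed format: 2^(iwl - 1) minus one LSB.
double fx_signed_max(int wl, int iwl) {
    return std::ldexp(1.0, iwl - 1) - fx_resolution(wl, iwl);
}

// Largest value of an unsigned format: 2^iwl minus one LSB (minimum is 0).
double fx_unsigned_max(int wl, int iwl) {
    return std::ldexp(1.0, iwl) - fx_resolution(wl, iwl);
}
```

For wl = 8 and iwl = 4 these helpers give a resolution of 0.0625 and a signed range of [-8.0, 7.9375]. With iwl greater than wl, for example wl = 4 and iwl = 6, the LSB weight becomes 4, which is the "trailing zeros" case described above.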
Source Code Mechanics: sc_fxnum vs sc_fxval
If you read the Accellera source code in sysc/datatypes/fx/, you will see that sc_fixed is merely a template wrapper around the base class sc_fxnum.
When you perform arithmetic on an sc_fixed, the library does not operate directly on the bit array. Instead, the operands are converted into an intermediate representation called sc_fxval.
- sc_fxval dynamically allocates an array of 32-bit words (m_rep) to hold an arbitrary-precision mantissa, along with an exponent.
- The arithmetic operation (addition, multiplication) is performed in this high-precision sc_fxval space.
- The result is then cast back into the target sc_fxnum. During this cast, the target's sc_fxtype_params are applied, which triggers the Quantization (sc_q_mode) and Overflow (sc_o_mode) logic.
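The compute-then-cast flow described above can be modelled in plain C++. This is a minimal sketch, not the Accellera implementation: a double stands in for the high-precision sc_fxval, and the hypothetical function cast_trn_wrap applies only the default modes (SC_TRN and SC_WRAP) for a signed target.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the final cast into a signed target format,
// using truncation (SC_TRN) and wrap-around (SC_WRAP).
// Assumption: the value is small enough to be scaled into a long long.
double cast_trn_wrap(double value, int wl, int iwl) {
    assert(wl > 0 && wl < 62);
    const int frac_bits = wl - iwl;

    // Quantization: scale so the target LSB has weight 1, then floor.
    // Flooring discards the lower bits, i.e. rounds toward -infinity.
    long long raw = static_cast<long long>(std::floor(std::ldexp(value, frac_bits)));

    // Overflow: keep only the low wl bits and reinterpret them as
    // a two's complement number.
    const long long modulus = 1LL << wl;
    long long wrapped = ((raw % modulus) + modulus) % modulus;
    if (wrapped >= modulus / 2) {
        wrapped -= modulus;
    }

    // Scale back to the real value the stored bits represent.
    return std::ldexp(static_cast<double>(wrapped), -frac_bits);
}
```

The product 1.5 * 1.75 is 2.625 in the high-precision space. Passing it through cast_trn_wrap with wl = 6 and iwl = 4 yields 2.5, because the target keeps only two fractional bits.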
Furthermore, every sc_fxnum contains a pointer to an sc_fxnum_observer. This is a design pattern used to notify VCD waveform tracers whenever the fixed-point value changes, which adds memory overhead to every single fixed-point variable.
Quantization Modes (sc_q_mode)
When you assign a highly precise number to a less precise fixed-point variable, you lose fractional bits. Quantization defines how that loss is handled in the sc_fxval to sc_fxnum cast:
- SC_TRN (Truncation): Default. Simply chops off the extra bits, which rounds towards negative infinity.
- SC_TRN_ZERO: Truncates towards zero.
- SC_RND (Round to positive infinity): Rounds to the nearest representable value by adding half an LSB and truncating, so ties go towards positive infinity.
- SC_RND_ZERO: Rounds to nearest; ties go towards zero.
- SC_RND_MIN_INF: Rounds to nearest; ties go towards negative infinity.
- SC_RND_INF: Rounds to nearest; ties go away from zero.
- SC_RND_CONV (Convergent Rounding / Banker's Rounding): Rounds to nearest; ties go to the nearest even value. Minimizes statistical bias in DSP algorithms.
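The quantization modes can be compared side by side with a small model in plain C++. This is a sketch under stated assumptions: the QMode enum and the quantize function are hypothetical names that mirror the sc_q_mode values, the input is a double rather than an sc_fxval, and the rounding arithmetic is only exact for modest values.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for sc_q_mode.
enum class QMode { Trn, TrnZero, Rnd, RndZero, RndMinInf, RndInf, RndConv };

// Quantize `value` to `frac_bits` fractional bits using the given mode.
double quantize(double value, int frac_bits, QMode mode) {
    assert(frac_bits >= 0 && frac_bits < 50);
    // Scale so that the target LSB has weight 1.
    const double scaled = std::ldexp(value, frac_bits);
    double q = 0.0;
    switch (mode) {
    case QMode::Trn:       q = std::floor(scaled); break;        // toward -inf
    case QMode::TrnZero:   q = std::trunc(scaled); break;        // toward zero
    case QMode::Rnd:       q = std::floor(scaled + 0.5); break;  // ties toward +inf
    case QMode::RndZero:   // ties toward zero
        q = (scaled >= 0.0) ? std::ceil(scaled - 0.5) : std::floor(scaled + 0.5);
        break;
    case QMode::RndMinInf: q = std::ceil(scaled - 0.5); break;   // ties toward -inf
    case QMode::RndInf:    q = std::round(scaled); break;        // ties away from zero
    case QMode::RndConv: {                                       // ties to even
        const double lower = std::floor(scaled);
        const double diff = scaled - lower;
        if (diff > 0.5)      q = lower + 1.0;
        else if (diff < 0.5) q = lower;
        else                 q = (std::fmod(lower, 2.0) == 0.0) ? lower : lower + 1.0;
        break;
    }
    }
    // Scale back to the original binary point position.
    return std::ldexp(q, -frac_bits);
}
```

With two fractional bits, 0.625 lies exactly halfway between 0.5 and 0.75, so it separates the rounding modes: Rnd and RndInf give 0.75, while RndZero, RndMinInf, and RndConv give 0.5. For the negative tie -0.625, Rnd and RndZero give -0.5, while RndMinInf and RndInf give -0.75.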
Overflow Modes (sc_o_mode)
When an assignment exceeds the maximum representable value (integer bits are lost), overflow handling determines the outcome:
- SC_WRAP (Wrap-around): Default. The bits simply roll over, ignoring the lost MSBs.
- SC_WRAP_SM: Wrap-around with Sign Magnitude representation.
- SC_SAT (Saturation): Clips to the maximum positive or negative representable value. Crucial for DSP (e.g., audio doesn't flip from loud positive to loud negative, it just distorts softly).
- SC_SAT_ZERO: Clips to zero on overflow.
- SC_SAT_SYM: Symmetrical saturation (e.g., if max is 7, min is -7 instead of -8).
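The overflow modes can be modelled on raw integer values, where the raw value is the real value scaled so that the LSB has weight 1. The following is a minimal sketch for signed formats only. The OMode enum and handle_overflow function are hypothetical names, and SC_WRAP_SM and the n_bits parameter are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a subset of sc_o_mode.
enum class OMode { Wrap, Sat, SatZero, SatSym };

// Fit `raw` into a signed two's complement field of `wl` bits.
// Assumption: a raw value equal to the most negative code is passed
// through unchanged in every mode; consult the LRM for how SC_SAT_SYM
// treats that exact value.
std::int64_t handle_overflow(std::int64_t raw, int wl, OMode mode) {
    assert(wl >= 2 && wl <= 32);
    const std::int64_t max_raw = (std::int64_t{1} << (wl - 1)) - 1;
    const std::int64_t min_raw = -(std::int64_t{1} << (wl - 1));
    if (raw >= min_raw && raw <= max_raw) {
        return raw;  // no overflow, nothing to do
    }
    switch (mode) {
    case OMode::Wrap: {
        // Keep the low wl bits and reinterpret them as signed.
        const std::int64_t modulus = std::int64_t{1} << wl;
        const std::int64_t r = ((raw % modulus) + modulus) % modulus;
        return (r > max_raw) ? r - modulus : r;
    }
    case OMode::Sat:     return (raw > max_raw) ? max_raw : min_raw;
    case OMode::SatZero: return 0;
    case OMode::SatSym:  return (raw > max_raw) ? max_raw : -max_raw;
    }
    return raw;  // not reached; keeps compilers quiet
}
```

For a 4-bit field, a raw value of 9 wraps to -7 but saturates to 7, and a raw value of -9 saturates to -8 under Sat and to -7 under SatSym.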
The Fast Types (sc_fixed_fast)
The standard types (sc_fixed) use arbitrary-precision arithmetic internally (sc_fxval), which relies on heap allocations and loops.
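If your wl is less than or equal to 53 bits (the mantissa size of a standard double), consider using sc_fixed_fast and sc_ufixed_fast. Keep in mind that intermediate results of arithmetic must also fit in 53 bits for the outcome to match the arbitrary-precision types.
In the Accellera implementation, sc_fixed_fast derives from sc_fxnum_fast. Arithmetic operations convert operands to sc_fxval_fast, which is backed directly by a native C++ double (m_val). This bypasses all array allocations and delegates the math directly to your CPU's FPU, which speeds up simulation considerably. The final assignment cast still applies the target's quantization and overflow modes, so results stay bit-accurate as long as every operand and intermediate value fits within the 53-bit mantissa.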
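The 53-bit limit comes from the double format itself rather than from SystemC. The sketch below, using a hypothetical helper survives_double_round_trip, shows where a double stops representing every integer exactly. It illustrates the limit only and does not reproduce how sc_fxval_fast works internally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if a non-negative integer survives
// a round trip through double unchanged.
bool survives_double_round_trip(std::int64_t raw) {
    // Restrict the input so the conversion back to an integer is well defined.
    assert(raw >= 0 && raw <= (std::int64_t{1} << 62));
    const double as_double = static_cast<double>(raw);
    return static_cast<std::int64_t>(as_double) == raw;
}
```

The value 2^53 survives the round trip, but 2^53 + 1 does not, because a double has no bit left to store the final 1. A fixed-point intermediate result that needs more than 53 significant bits would lose its low-order bits in the same way.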
End-to-End Example: DSP Accumulator
Here is a complete sc_main example that demonstrates how truncation and saturation affect signal processing values.
#define SC_INCLUDE_FX // Required to include fixed-point headers
#include <systemc>
#include <iostream>
#include <iomanip>

int sc_main(int argc, char* argv[]) {
    // Suppress SystemC deprecation warnings
    sc_core::sc_report_handler::set_actions("/IEEE_Std_1666/deprecated", sc_core::SC_DO_NOTHING);
    std::cout << "--- SystemC Fixed-Point Tutorial ---" << std::endl;

    // 1. Basic Declaration
    // wl = 8, iwl = 4 -> 4 integer bits, 4 fractional bits.
    // Signed type, so range is [-8.0, 7.9375]
    sc_dt::sc_fixed<8, 4> basic_val;
    basic_val = 3.5;
    std::cout << "Basic Value: " << basic_val << std::endl;

    // 2. Exploring Quantization (Rounding vs Truncation)
    // Source number needs high precision
    sc_dt::sc_fixed<16, 4> high_prec = 2.6875; // 0010.1011
    // Target: only 2 fractional bits.
    // Truncation (Default): drops the low bits, giving 2.5
    sc_dt::sc_fixed<6, 4, sc_dt::SC_TRN> trn_val = high_prec;
    // Rounding (adds half an LSB before truncating), giving 2.75
    sc_dt::sc_fixed<6, 4, sc_dt::SC_RND> rnd_val = high_prec;
    std::cout << "\n--- Quantization ---" << std::endl;
    std::cout << "Original (16,4): " << high_prec << std::endl;
    std::cout << "Truncated (6,4) : " << trn_val << " (Lost precision)" << std::endl;
    std::cout << "Rounded (6,4) : " << rnd_val << " (Rounded up)" << std::endl;

    // 3. Exploring Overflow (Wrap vs Saturation)
    // Source number needs high integer range
    sc_dt::sc_fixed<8, 8> large_val = 14;
    // Target: only 3 integer bits and 2 fractional bits (signed, range [-4, 3.75])
    // Wrap (Default)
    sc_dt::sc_fixed<5, 3, sc_dt::SC_TRN, sc_dt::SC_WRAP> wrap_val = large_val;
    // Saturation
    sc_dt::sc_fixed<5, 3, sc_dt::SC_TRN, sc_dt::SC_SAT> sat_val = large_val;
    std::cout << "\n--- Overflow ---" << std::endl;
    std::cout << "Original (8,8): " << large_val << std::endl;
    std::cout << "Wrapped (5,3): " << wrap_val << " (Rolled over)" << std::endl;
    std::cout << "Saturated (5,3): " << sat_val << " (Clipped to max positive)" << std::endl;

    // 4. Bit-level Introspection
    std::cout << "\n--- Bit-level introspection ---" << std::endl;
    // sc_fixed allows reading/writing individual bits using []
    // Bits are indexed from 0 (LSB) to wl-1 (MSB)
    sc_dt::sc_fixed<4, 4> mask = 5; // Binary 0101
    std::cout << "Value: " << mask << ", Binary: ";
    for (int i = 3; i >= 0; --i) {
        std::cout << mask[i];
    }
    std::cout << std::endl;
    // Set the MSB (sign bit): it was 0, so this flips it and 0101 becomes 1101
    mask[3] = 1;
    std::cout << "After flipping MSB: " << mask << std::endl;

    return 0;
}

LRM Strictness: #define SC_INCLUDE_FX
By default, the SystemC header (#include <systemc>) does not include the fixed-point library. The fixed-point headers bring a massive amount of template code into your translation unit, drastically slowing down compilation.
The LRM specifies that you must define the macro SC_INCLUDE_FX before including <systemc> in any file that uses fixed-point types.
#define SC_INCLUDE_FX
#include <systemc>

If you forget this, you will receive "type not declared" errors from the compiler for sc_fixed.
Conclusion
Understanding sc_fixed and sc_q_mode/sc_o_mode is critical for designing DSP algorithms, Neural Network accelerators, and modem pipelines in SystemC. By using bit-accurate datatypes and favoring sc_fixed_fast where its 53-bit limit allows, you get hardware-accurate results without giving up simulation speed.