# Configuration Loader Internals This document explains *how* the configuration loader is implemented. For the user-facing "what can I configure and where does it come from" perspective, see [`02_configuration_resolution.md`](02_configuration_resolution.md). The two are companions: `02` answers "what do I write in YAML?", this file answers "what does the loader do with what I wrote?". File: `ecu_framework/config/loader.py` ## Pipeline at a glance ```text defaults (dict) └─▶ merge YAML at $ECU_TESTS_CONFIG (if env set & file exists) └─▶ merge YAML at /config/test_config.yaml (if exists) └─▶ merge PSU side-channel (env OWON_PSU_CONFIG or /config/owon_psu.yaml) └─▶ merge in-memory overrides (caller-supplied) └─▶ coerce types & build EcuTestConfig ``` Two layers run sequentially: 1. **Dict layer** — every source contributes a plain `dict`. They are merged with `_deep_update` so nested sections combine key-by-key. 2. **Dataclass layer** — once merged, `_to_dataclass` casts the values to their declared types and constructs `EcuTestConfig`. This is the boundary at which YAML's type fuzziness stops. Keeping the merge in the dict layer (rather than merging dataclasses) makes the precedence story trivial: it's just a sequence of writes into one dict, and the last writer wins. ## Precedence — and why it reads "backwards" The `load_config` docstring lists precedence highest-to-lowest: | Rank | Source | Where in code | |---|---|---| | 1 (highest) | `overrides` dict passed to `load_config` | Applied **last** | | 2 | YAML at `$ECU_TESTS_CONFIG` | Applied if env points at an existing file | | 3 | YAML at `/config/test_config.yaml` | Fallback when env unset | | 4 (lowest) | Built-in defaults | The starting `base` dict | In the implementation, sources are *applied* in reverse order of that table (lowest → highest). That's exactly what "highest precedence" means here: each merge step overwrites earlier values for the same key, so the **last** writer wins. The "1) ... 4)" comments inside `load_config` annotate by precedence rank, not by call order. ## `_deep_update` — the merge semantics ```python def _deep_update(base, updates): for k, v in updates.items(): if isinstance(v, dict) and isinstance(base.get(k), dict): base[k] = _deep_update(base[k], v) else: base[k] = v return base ``` **Rules:** - Dict-on-both-sides → recurse, so nested overlays don't clobber siblings. This is what lets a YAML file override just `interface.bitrate` without re-stating the rest of the `interface` block. - Anything else (scalar, list, mismatched types) → replace wholesale. - **Lists are replaced, not concatenated.** This is deliberate: list-concat semantics surprise users who expect "set this list to X" to mean exactly that. If concatenation is ever needed for a specific field, do it explicitly at the call site, not in the merge primitive. - Mutation happens in place; the return value is the same `base` object, returned for chaining convenience (used when merging the PSU side-channel). ## `_to_dataclass` — defensive type coercion YAML's type inference is generous: `"19200"` (quoted) comes through as a string, `"true"` is not a bool, and hex-keyed mappings may arrive as either int or string keys depending on the YAML writer. Rather than propagate that fuzziness, the loader casts at the dataclass boundary: ```python type=str(iface.get("type", "mock")).lower(), channel=int(iface.get("channel", 1)), bitrate=int(iface.get("bitrate", 19200)), ... ``` Casts that fail raise — and that's the right behavior. A config value that can't be interpreted is a bug to surface early, not silently fall back from. ### Special-case: `frame_lengths` keys `frame_lengths` maps a LIN frame ID (int) to a payload length (int). YAML can write the key as a hex int (`0x0A`), a decimal int (`10`), or a quoted string (`"0x0A"`). Coercion handles all three: ```python key = int(k, 0) if isinstance(k, str) else int(k) ``` `int(k, 0)` with base `0` means "infer from prefix" — `"0x0A"` parses as hex, `"10"` as decimal. Entries that fail to parse are skipped silently rather than aborting the whole load, because one typo in a frame-length map shouldn't prevent the rest of the configuration from coming up. ## PSU side-channel Power-supply settings (COM port, baudrate, IDN substring) are typically **bench-specific** and shouldn't be committed alongside test config. The loader honors a dedicated overlay file just for the `power_supply` section: - `$OWON_PSU_CONFIG` (env var → path) wins, else - `/config/owon_psu.yaml` if it exists. This file is deep-merged into the existing `power_supply` block, so the main YAML can still provide defaults (e.g. `idn_substr: OWON`) while the bench file overrides only the parts that vary by machine. Recommended workflow: ``` config/test_config.yaml # committed; common defaults config/owon_psu.yaml # gitignored; per-bench serial settings ``` ## Dataclass schema quirks ### Forward reference: `EcuTestConfig.power_supply` ```python @dataclass class EcuTestConfig: ... power_supply: "PowerSupplyConfig" = field(default_factory=lambda: PowerSupplyConfig()) @dataclass class PowerSupplyConfig: ... ``` `PowerSupplyConfig` is referenced *before* it is defined. This works because: 1. `from __future__ import annotations` (PEP 563) turns *all* type annotations into strings at module load time, so `"PowerSupplyConfig"` as an annotation never triggers a name lookup. 2. The `default_factory` is a lambda, which defers evaluation of the bare name `PowerSupplyConfig` until `EcuTestConfig()` is actually instantiated — by which point the module body has finished executing and the name is bound. The ordering is intentional: `EcuTestConfig` is the most-used type, so it lives near the top of the file where readers find it first. If you ever drop the `from __future__ import annotations` line, this ordering breaks; the lambda default would still work, but the string annotation would need updating. ### Mutable defaults must use `default_factory` `field(default_factory=dict)` (and `default_factory=InterfaceConfig`, `default_factory=lambda: PowerSupplyConfig()`) is required because Python shares default values across instances by default. Using `field(default={})` on a dataclass field is a `ValueError` at class-creation time — the `default_factory` form is the only correct way. ## Known wart: defaults live in two places The defaults for every field exist twice: 1. As dataclass field defaults — e.g. `type: str = "mock"` on `InterfaceConfig`. 2. As entries in the `base` dict inside `load_config`. Both must agree, and a drift between them would be silently wrong (the loader's defaults would win for the YAML path, while the dataclass defaults would win for callers that construct `InterfaceConfig()` directly). Why it's still this way: the dict is needed because `_deep_update` operates on dicts; the dataclass defaults are needed because callers may construct configs directly without going through `load_config`. If a third construction path appears, extract defaults to a single `DEFAULTS` mapping that both layers read from. ## Test surface Unit tests live in `tests/unit/test_config_loader.py`. They cover the override precedence chain and the dataclass-construction defaults. When adding a new field, add at minimum: 1. The dataclass field with a default. 2. The matching default in the `base` dict in `load_config`. 3. The matching cast line in `_to_dataclass`. 4. A unit test asserting it round-trips through `load_config(overrides=...)`. Skipping (3) is the most common bug — the field will appear to work because the dataclass default carries it, but YAML/env overlays for that field will be silently dropped.