WCET for ML Inference: Techniques, Metrics, and How RocqStat Helps Auditable Timing Guarantees
Explain WCET for ML inference on ECUs, why determinism matters in 2026, and how RocqStat enables auditable timing guarantees for safety-critical systems.
Hook: When average latency isn’t good enough
Automotive and industrial teams increasingly deploy machine learning on constrained ECUs and microcontrollers to meet latency, cost, and safety targets. But when a neural-network inference occasionally runs long, a lane-keeping or safety monitor can miss its deadline, and an intermittent timing failure is still a failure. For teams wrestling with brittle measurement setups and opaque timing evidence, the question is not how fast the model is on average; it is what the worst-case execution time (WCET) is, and whether it can be auditably proven to certifiers and operators.
The most important points up front
- WCET is the single metric that matters for real-time safety. For safety-critical ML on MCUs/ECUs, you must bound the maximum inference time under all relevant inputs and system conditions.
- ML inference on microcontrollers changes the timing problem. Data-dependent operators, flexible runtimes, caches, DMA, interrupts, and multicore interference make WCET hard to compute.
- Practical timing assurance uses a hybrid of static analysis, measurement, and compositional verification. Modern tools like RocqStat (now part of Vector's toolchain) combine these techniques to produce auditable timing guarantees and traceable evidence for ISO 26262 and related industry compliance.
Why WCET matters for ML inference in 2026
In 2026 the software-defined vehicle and industrial edge ecosystems are more ML-driven than ever: sensor fusion and perception stacks, driver monitoring, predictive maintenance, and control assist run on heterogeneous ECUs and MCU-class devices. Regulatory scrutiny and the move toward end-to-end verification mean teams cannot rely on average latency; they need deterministic timing proofs that stand up in audits.
The January 2026 acquisition of StatInf’s RocqStat by Vector underscores this shift: tool vendors are embedding WCET analysis directly into verification toolchains to provide unified, auditable workflows. Expect more integrations (VectorCAST with RocqStat being the first public example) and tighter requirements for demonstrable timing evidence across CI/CD pipelines and certification artifacts.
Key timing metrics for ML inference
- WCET — worst-case execution time of an inference on a given hardware and software configuration.
- BCET — best-case execution time. Useful for slack analysis but irrelevant for safety guarantees.
- AET (average execution time) / median / percentiles — distributional metrics that inform cost/performance tradeoffs but do not bound worst-case behavior.
- Jitter — variability in latency across invocations; bounded jitter is essential for deterministic scheduling.
- Tail latency — the 99.9th percentile and similar metrics help identify pathological inputs but need bridging to WCET proofs.
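As a minimal sketch, the metrics above can be computed from a batch of on-target latency samples (the sample values and microsecond units here are purely illustrative). Note that the observed high-water mark is only a lower bound on the true WCET, never a substitute for it:

```python
import statistics

def latency_metrics(samples_us):
    """Summarize measured inference latencies (microseconds)."""
    ordered = sorted(samples_us)

    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]

    return {
        "bcet_observed": ordered[0],         # best case seen (not a guarantee)
        "hwm": ordered[-1],                  # high-water mark: lower bound on WCET
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99_9": pct(99.9),
        "jitter": ordered[-1] - ordered[0],  # observed spread across invocations
    }

# Hypothetical measurement campaign (microseconds).
samples = [102, 98, 101, 150, 99, 103, 97, 210, 100, 101]
m = latency_metrics(samples)
```

A single 210 µs outlier in a run of ~100 µs samples is exactly the kind of tail event that average-latency reporting hides.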
Why ML breaks classical WCET assumptions
Traditional WCET analysis assumed small, predictable control-flow programs. ML inference on MCUs introduces several complications:
- Data-dependent operators: Attention, dynamic interpolation, conditional blocks, or variable-length inputs make static path enumeration difficult.
- Non-trivial microarchitecture effects: caches, branch predictors, prefetch, and speculative execution can create huge timing variance on modern embedded processors.
- Heterogeneous accelerators: Dedicated ML engines, DSPs, and NPUs bring their own scheduling and memory semantics; the host-accelerator interactions (DMA, command queues) must be modeled.
- Nondeterministic runtime layers: Some inference runtimes (even micro-runtimes) make just-in-time decisions such as operator fusion or threading, which increase nondeterminism.
- Interrupts and background tasks: Real systems have ISRs and housekeeping tasks that preempt inference unless explicitly bounded.
Techniques to obtain auditable WCET for ML inference
No single technique is sufficient for modern embedded ML. A practical verification strategy combines several complementary approaches.
1) Static timing analysis with microarchitecture models
Static WCET analyzers reason about all possible execution paths and use models of the target CPU and caches to produce safe upper bounds. For ML workloads, static analysis benefits from:
- Operator-level flow graphs generated from compiled binaries or from the inference graph (ONNX, TFLite).
- Microarchitecture descriptions (cache sizes, associativity, TCM vs. cache, pipeline stages) so bounds are conservative but useful.
2) Measurement-based testing with worst-case input hunting
Measurements on-target are indispensable. Use systematic stress tests and adversarial input generation to explore inputs that push timing to extremes:
- Fuzz the input space with constraints to find pathological sequences or activation patterns.
- Use black-box hill-climbing and directed mutation to maximize latency metrics.
- Capture hardware traces (PMU counters, ETM traces) to understand microarchitectural causes.
- For real-world measurement setups see our notes on stress tests and targeted instrumentation to capture worst-case events.
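The directed-mutation idea above can be sketched as a black-box hill climb. Here `fake_latency` and `mutate` are stand-ins I made up for illustration: in a real campaign, `measure_latency` would trigger an on-target run and read back a hardware timer or PMU counter:

```python
import random

def hill_climb_worst_case(measure_latency, seed_input, mutate, iters=200, rng=None):
    """Directed mutation: keep a mutation only if it increases measured latency."""
    rng = rng or random.Random(0)
    best_input, best_lat = seed_input, measure_latency(seed_input)
    for _ in range(iters):
        candidate = mutate(best_input, rng)
        lat = measure_latency(candidate)
        if lat > best_lat:
            best_input, best_lat = candidate, lat
    return best_input, best_lat

# Stand-in latency model (hypothetical): latency grows with activation magnitude.
def fake_latency(x):
    return 100 + sum(abs(v) for v in x)

def mutate(x, rng):
    # Perturb one randomly chosen input element.
    y = list(x)
    y[rng.randrange(len(y))] += rng.uniform(-1.0, 1.0)
    return y

worst, lat = hill_climb_worst_case(fake_latency, [0.0] * 8, mutate)
```

Hill climbing finds pathological inputs cheaply, but it can stall in local maxima; combine it with constrained fuzzing for broader coverage.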
3) Hybrid (measurement-augmented) analysis
Hybrid approaches combine static analysis with measured upper bounds to tighten estimates: you use measurements to validate microarchitecture models and update conservative assumptions.
4) Compositional analysis and per-operator WCET
Break the inference pipeline into operators (convolutions, matmuls, activation, softmax). Establish safe WCETs for each operator, then compose them using worst-case addition or parametric formulas. Compositional proofs are easier to audit and re-use when the model changes slightly.
5) Scheduling and response-time analysis (RTA)
Once the per-inference WCET is known, integrate it into system-level schedulability analysis using classical RTA (for fixed-priority systems) or EDF analysis. For multicore MCUs, include interference models or use temporal isolation techniques.
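For fixed-priority systems, the classical RTA fixed-point iteration can be sketched as follows. The task set is hypothetical, and deadlines are assumed equal to periods:

```python
import math

def response_times(tasks):
    """Fixed-priority RTA. tasks = [(C, T)] sorted by descending priority,
    where C is WCET and T is period (deadline assumed equal to period).
    Returns worst-case response times, or None if any deadline is missed."""
    results = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Interference from all higher-priority tasks released during r.
            interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
            r_next = c_i + interference
            if r_next == r:
                break          # fixed point reached
            if r_next > t_i:
                return None    # deadline miss
            r = r_next
        results.append(r)
    return results

# Hypothetical task set in ms: the inference task (C=3, T=20) has lowest priority.
rts = response_times([(1, 4), (2, 10), (3, 20)])
```

Here the inference task's per-invocation WCET of 3 ms inflates to a 7 ms worst-case response time once preemption by the two higher-priority tasks is accounted for.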
6) Design rules for timing determinism
Engineers can make the analysis tractable by constraining implementation choices:
- Avoid dynamic memory allocations during inference — pre-allocate buffers deterministically.
- Prefer fixed-size inputs or pad sequences to known lengths.
- Use quantized, fused operators that minimize branching and memory traffic.
- Lock caches or use scratchpad/TCM for critical data to eliminate cache-eviction variability.
- Disable DVFS and non-essential peripherals during timing-critical windows or account for their effect in the model.
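For example, the fixed-size-input rule can be enforced at the system boundary. This sketch pads (or rejects) variable-length inputs so the analyzed path always processes the same element count:

```python
def pad_to_fixed(seq, max_len, pad_value=0):
    """Pad a variable-length input so inference always sees max_len elements.
    Inputs longer than the analyzed bound are rejected, not truncated silently."""
    if len(seq) > max_len:
        raise ValueError("input exceeds the length the WCET analysis assumed")
    return list(seq) + [pad_value] * (max_len - len(seq))

x = pad_to_fixed([3, 1, 4], 8)  # → [3, 1, 4, 0, 0, 0, 0, 0]
```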
Practical checklist: From model to auditable WCET
- Pin your HW/SW configuration. Compiler flags, runtime version, kernel/RTOS config, and microarchitecture model must be fixed for the proof.
- Extract operator graph. Export the compiled inference: ONNX / TFLite / compiled binary with symbol mapping.
- Instrument and measure. Run targeted on-device measurements across temperature, clock, and power domains to get empirical upper bounds.
- Run static WCET analysis. Use a tool that models pipeline and cache effects to enumerate worst-case paths or compute conservative bounds.
- Compose and schedule. Convert per-inference WCET into task deadlines and verify system-level schedulability under chosen policy (FP, EDF).
- Produce traceable evidence. Store inputs, trace logs, configuration manifests, and formal reports in the verification artifacts for auditors.
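The "traceable evidence" step can start as simply as emitting a machine-readable manifest alongside each timing report. The field names below are illustrative, not a standard schema:

```python
import hashlib
import json

def evidence_manifest(binary_bytes, config):
    """Bundle the identifiers an auditor needs to reproduce a timing result.
    Schema is illustrative; adapt field names to your process."""
    return {
        "binary_sha256": hashlib.sha256(binary_bytes).hexdigest(),
        "compiler": config["compiler"],
        "flags": config["flags"],
        "runtime": config["runtime"],
        "cpu_model": config["cpu_model"],
    }

# Hypothetical configuration pinned for the proof.
manifest = evidence_manifest(
    b"\x7fELF...",  # bytes of the compiled inference binary
    {"compiler": "gcc 13.2", "flags": ["-O2"],
     "runtime": "tflite-micro 1.3", "cpu_model": "cortex-m7"},
)
print(json.dumps(manifest, indent=2))
```

Hashing the exact binary ties every WCET claim to one build artifact, so a toolchain change invalidates the evidence visibly rather than silently.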
Code example: Composing per-operator WCETs (Python sketch)
# Compose per-operator WCET bounds into a per-inference bound.
def compute_inference_wcet(operators, operator_wcet_map):
    worst_case = 0
    for op in operators:
        # Scale the operator's base bound by its parametric upper bound
        # (e.g., the maximum dynamic input size it can see).
        worst_case += operator_wcet_map[op.name] * op.param_upper_bound
    # Add margins for DMA and ISR interference.
    worst_case += system_interference_margin()
    return worst_case
Handling multicore and accelerator interference
Multicore ECUs and devices with accelerators complicate WCET, because memory contention and shared buses create cross-core interference. Strategies to bound effects include:
- Temporal isolation: Use time-division multiplexing or RTOS mechanisms to isolate ML tasks.
- Hardware partitioning: Reserve a core or an accelerator instance for safety-critical models.
- Interference modelling: Extend static analysis with shared-bus contention models and validated upper bounds for DMA contention.
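A crude but auditable interference model charges every memory access of the analyzed task one worst-case arbitration delay per contending core, assuming round-robin bus arbitration. All numbers below are hypothetical:

```python
def wcet_with_bus_interference(wcet_isolation_ns, mem_accesses, n_cores, delay_ns):
    """Pessimistic shared-bus bound: each of our memory accesses may wait
    behind one access from every other core (round-robin arbitration assumed)."""
    contention_ns = mem_accesses * (n_cores - 1) * delay_ns
    return wcet_isolation_ns + contention_ns

# Hypothetical: 2 ms in isolation, 10k memory accesses, quad-core, 50 ns penalty.
bound_ns = wcet_with_bus_interference(2_000_000, 10_000, 4, 50)
```

This additive model is deliberately pessimistic; validated contention measurements or temporal isolation can tighten it substantially.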
How RocqStat fits into an auditable timing workflow
RocqStat — the WCET analysis product developed by StatInf and acquired by Vector in January 2026 — brings enterprise-grade features that match the needs above. Its role in a verification pipeline looks like this:
- Binary-aware static WCET estimation. RocqStat accepts compiled binaries and models the target microarchitecture to compute conservative upper bounds for tasks, including inference code compiled from NN runtimes or static kernels.
- Hybrid verification. It supports combining static analyses with measurement traces to refine models while preserving safety margins — useful for ML where microarchitectural assumptions can be calibrated with tests.
- Compositional and modular proofs. RocqStat enables per-module and per-operator timing contracts so teams can re-verify subcomponents (e.g., a quantized convolution kernel) without re-running a full WCET pass for the whole system.
- Traceability and reporting for audits. The tool produces machine-readable reports, evidence bundles (input vectors, PMU traces, configuration manifests), and human-readable justification — all essential for ISO 26262 and supplier audits.
- Integration-ready for CI/CD and verification toolchains. With the Vector acquisition, RocqStat’s analysis can now be embedded in VectorCAST and automated verification flows — enabling continuous timing proof for every build, an emerging requirement in 2026 automotive development.
What RocqStat helps you achieve — concretely
- Produce safe, auditable upper bounds for ML inference on ECU-class hardware.
- Reduce manual effort by re-using per-operator WCET proofs after model changes.
- Integrate WCET checks into CI to prevent regressions that could violate deadlines.
- Generate artifacts that support functional-safety deliverables (e.g., timing analysis reports aligned with ISO 26262 plans).
Practical advice: When to accept measured upper bounds vs. demand formal proofs
Not every ML workload needs a formal mathematical proof of WCET. Use this rule-of-thumb:
- Hard real-time safety function (SIL/ASIL-bound): Require formal WCET proofs with traceable artifacts and tool-supported analysis (use static+hybrid tooling such as RocqStat integrated into verification workflows).
- Best-effort or monitoring functions: Use rigorous measurement campaigns plus safety margins, but document the limitations and monitor in-field.
2026 trends that change how teams approach WCET for ML
Several trends in 2025–2026 shape the way teams must approach timing analysis:
- Consolidation of verification toolchains. The Vector acquisition of RocqStat signals consolidation: expect WCET, software testing, and static verification to be delivered together and embedded into automotive supplier workflows.
- Heterogeneous microcontroller platforms. RISC-V cores with tightly coupled ML accelerators and proprietary NPUs are common. Analysts must model accelerator semantics, DMA patterns, and shared memory unpredictability.
- Regulatory focus on ML transparency and timing evidence. Auditors increasingly expect determinism evidence for ML-in-the-loop functions — not just run-time monitors.
- Edge-first ML toolchains mature. Deterministic micro-runtimes (ONNX Runtime Micro determinism modes, Arm's Ethos deterministic profiles) reduce runtime nondeterminism but still require WCET proofs at the system level.
Case study (anonymized): 10x faster verification cycle with compositional WCET
A Tier-1 supplier deployed a quantized perception model on a dual-core ECU with a small NPU. Initially, full-system WCET runs took days and required repeated manual calibration. By refactoring the stack to produce operator contracts and using hybrid static-measurement runs with a RocqStat-based flow, the supplier reduced per-build verification time from days to hours. They also gained a reusable evidence bundle for certification that separated ML operator upgrades from system scheduling re-verification.
Limitations and pitfalls — what to watch for
- Overfitting models to timing tests. Don’t optimise only for a measured worst-case that could be invalidated by slight hardware changes or compiler updates. Keep proofs tied to configuration manifests.
- Hidden non-determinism. Background diagnostics, logging, or watchdogs can add latency — ensure these are included in the analysis or disabled during the critical window.
- Toolchain drift. Small changes in compiler optimizations can change WCETs. Always version-control toolchain and runtime artifacts and run WCET checks in CI.
Actionable next steps for engineering teams
- Inventory ML-inference targets and classify functions by safety criticality.
- Fix and record toolchain, compiler flags, and microarchitecture settings in a manifest per ECU.
- Adopt a hybrid analysis approach: run on-target measurement campaigns while configuring a static WCET toolchain such as RocqStat to produce conservative bounds.
- Implement compositional operator contracts so model retraining or pruning does not force full-system re-analysis.
- Embed WCET checks into CI/CD and attach evidence bundles to release artifacts for auditability.
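Embedding WCET checks into CI can start as a simple budget gate that compares the latest analysis report against per-task budgets. The report schema here is illustrative:

```python
def check_wcet_budget(report, budgets):
    """CI gate: return the tasks whose analyzed WCET exceeds their budget.
    `report` and `budgets` map task name -> time in microseconds (illustrative)."""
    return {t: w for t, w in report.items() if w > budgets.get(t, float("inf"))}

# Hypothetical per-build WCET report vs. fixed deadline budgets (µs).
report = {"perception_infer": 1850, "fusion": 400}
budgets = {"perception_infer": 2000, "fusion": 350}

bad = check_wcet_budget(report, budgets)
if bad:
    print(f"WCET budget exceeded: {bad}")
    # In a real pipeline: raise SystemExit(1) to fail the build.
```

Failing the build on a budget breach turns a timing regression into an ordinary, attributable CI failure instead of a field incident.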
Closing: Determinism is now a first-class requirement for ML at the edge
By 2026 the combination of increased ML adoption in safety-critical domains and tighter regulatory expectations means teams must deliver not just performance but provable determinism. WCET analysis — when done with a mix of rigorous static techniques, on-target measurement, and compositional thinking — turns timing risk into traceable engineering artifacts.
Tools like RocqStat, now aligned with Vector’s verification ecosystem, make it practical to produce auditable timing guarantees for ML inference on microcontrollers and ECUs. They shorten verification cycles, help satisfy safety standards, and enable teams to safely push ML to the edge.
Call to action
If you are piloting ML on ECUs or constrained MCUs and need deterministic timing guarantees, start by creating a timing manifest for one critical path and run a hybrid WCET analysis. Contact our team at smart-labs.cloud for a hands-on workshop: we’ll help you map operator-level contracts, set up measurement campaigns, and integrate WCET checks into your CI pipeline so you can deliver auditable timing evidence for certification and production.