
Continuous Verification for Safety-Critical ML: Integrating RocqStat into CI for WCET and Timing Analysis

smart labs
2026-02-07
11 min read

Tutorial: integrate RocqStat/VectorCAST into CI to automate WCET checks for embedded ML inference and enforce timing gates in PRs.

Stop shipping unverified timing for safety‑critical embedded ML

If your team is iterating on embedded ML models and agonizing over late-stage timing surprises, you’re not alone. Modern inference stacks — TinyML on MCUs, quantized CNNs on MCUs with NPUs, and real‑time perception loops — add complexity that breaks assumptions about latency. The fix: integrate WCET and timing verification into CI so every commit is automatically checked for worst‑case execution time regressions.

Why this matters in 2026: timing verification is mandatory for safety and scale

In late 2025 and early 2026 we've seen two converging trends: safety standards and automotive/industrial customers are demanding stronger timing evidence, and tool consolidation is making automation feasible. Vector Informatik's acquisition of StatInf's RocqStat (announced January 2026) signals that consolidation: Vector plans to integrate RocqStat into the VectorCAST toolchain, combining test automation with advanced timing analysis.

“Timing safety is becoming a critical …” — Vector announcement, Jan 2026

That means you can expect tighter toolchain integration and better CI support for automated WCET checks in the coming year. For teams building safety‑critical embedded ML, this is an opportunity: embed timing verification directly into DevOps and MLOps pipelines and catch regressions earlier. If you’re designing for low-latency testbeds or edge CI, see patterns in Edge Containers & Low-Latency Architectures for Cloud Testbeds to reduce noise in timing runs.

Overview: What we’ll build in this tutorial

This tutorial walks you through a practical CI automation pattern that performs WCET checks for embedded ML inference using RocqStat (now part of VectorCAST). You’ll learn how to:

  • Design a CI job flow: build → VectorCAST unit tests → RocqStat static WCET analysis → timing-gate enforcement.
  • Containerize and pin the environment for reproducible timing analysis.
  • Integrate with GitHub Actions/GitLab CI/Jenkins and handle licensing securely.
  • Parse and enforce WCET thresholds automatically and create actionable reports.

High‑level pipeline pattern

Make the WCET check a first‑class CI gate. At minimum, your pipeline should run these stages for any PR that modifies runtime code or model artifacts:

  1. Checkout + install pinned cross toolchain.
  2. Build firmware and generate inference binary (same flags as release).
  3. Run VectorCAST unit tests (API/host tests or target tests via hardware-in-the-loop).
  4. Invoke RocqStat timing analysis against the compiled binary (or object files) and target CPU model.
  5. Parse RocqStat output and compare the WCET against a threshold or SLA.
  6. Fail the job and annotate the PR if the analysis shows a regression or a violation.

Prerequisites and assumptions

  • You have VectorCAST and RocqStat licenses (RocqStat capabilities are being integrated into VectorCAST following the 2026 acquisition).
  • Your codebase cross‑compiles reproducibly and uses a pinned toolchain (GCC/toolchain version, linker scripts, and build flags recorded).
  • You maintain a target CPU model in RocqStat (or VectorCAST), including cache/branch predictor settings that match the target hardware or QEMU model.
  • CI runners can access license servers (use secure network design) or you use containerized images with licensed tools on self‑hosted runners. If you need help deciding where to run licensed tooling, see the On‑Prem vs Cloud decision matrix for guidance on self-hosted runners versus hosted CI trade-offs.

Step 1 — Containerize and pin the verification environment

Reproducibility is everything for timing. Differences in compiler optimization flags, library versions, or verification tool versions produce different WCETs. Use Docker (or OCI images) to pin the environment: refer to container pattern guidance in Edge Containers & Low-Latency Architectures when you design small, deterministic images.

FROM ubuntu:22.04

# Install the cross-toolchain, pinned to an exact package version
# (adjust to the version available in your apt mirror)
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc-arm-none-eabi=15:12.2-1 && \
    rm -rf /var/lib/apt/lists/*

# Install the VectorCAST/RocqStat CLI runtime and dependencies
# (licensed; replace with your vendor packaging step)
COPY vectorcast_cli /opt/vectorcast_cli
ENV PATH="/opt/vectorcast_cli/bin:$PATH"

# Add the project and set the workspace
WORKDIR /workspace

Keep a small image that includes only the build and verification runtime. For licensed commercial tools like VectorCAST/RocqStat you typically install a CLI package and point it to a license server or checkout tokens. Don’t bake private license files into images — use CI secrets to provide them at runtime. For governance and tool consolidation best practices, review the Tool Sprawl Audit.

Step 2 — Build reproducibly (same flags as release)

WCET depends critically on compiler flags and link layout. Make your CI build use the exact flags used for the target release (binary layout affects cache, branch alignment, and consequently WCET). Example Makefile excerpt:

CFLAGS += -O2 -fno-exceptions -fdata-sections -ffunction-sections
LDFLAGS += -Wl,--gc-sections -Ttarget_flash.ld

Store build metadata: commit SHA, toolchain version, and build flags in an artifact consumed by the timing analysis to guarantee traceability. For teams concerned with artifact storage and caching of build outputs, edge caching and artifact appliances like the ByteCache Edge Appliance can help reduce CI variability and speed up reproducible builds.
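
A minimal metadata-writer sketch (the file names, paths, and fields here are assumptions, not a vendor format):

#!/usr/bin/env python3
"""Record build inputs so every timing report can be traced to its exact build."""
import json
import subprocess

def run(cmd):
    # Capture a command's stdout as a stripped string.
    return subprocess.check_output(cmd, text=True).strip()

metadata = {
    "commit_sha": run(["git", "rev-parse", "HEAD"]),
    "toolchain": run(["arm-none-eabi-gcc", "--version"]).splitlines()[0],
    # Keep these in sync with the Makefile excerpt above.
    "cflags": "-O2 -fno-exceptions -fdata-sections -ffunction-sections",
    "linker_script": "target_flash.ld",
}

with open("build/build_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)

Upload build/build_metadata.json alongside the WCET report so auditors can pair each estimate with the exact toolchain and flags that produced it.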

Step 3 — Run VectorCAST unit tests as part of the pipeline

VectorCAST provides automated unit and integration testing; run these tests before timing analysis to ensure functional regressions aren’t mistaken for timing issues. A typical CLI invocation (vendor-specific) looks like:

# create a VectorCAST project and run tests via CLI
vectorcast create_project --name my_project --workspace /workspace
vectorcast build --project my_project --toolchain arm-none-eabi
vectorcast run_tests --project my_project --target-model qemu_cortex_m7

This both validates the functional behavior and populates coverage artifacts that can be fed into timing analysis workflows.

Step 4 — Configure and run RocqStat WCET analysis

RocqStat provides static and measurement‑aided WCET estimation by exploring control flow, microarchitectural behavior (cache, pipeline), and path constraints. Since the Vector acquisition, the product is exposed through the VectorCAST tool integration, allowing programmatic invocation from CI.

Key inputs to RocqStat:

  • Binary or object files for the inference target.
  • CPU/microarchitecture model: pipeline stages, cache sizes, timing for NPU accelerators if modeling offloaded execution (an illustrative model sketch follows this list).
  • CFG and source correlation (map files, symbol tables).
  • Assumptions about environmental stimuli: worst-case arrival times, DMA interference, interrupt behavior.
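
The CPU-model schema is vendor-defined, so the sketch below is purely illustrative (every field name is an assumption), but it shows the kind of microarchitectural detail worth versioning next to the code:

#!/usr/bin/env python3
"""Illustrative only: the real RocqStat/VectorCAST model schema is vendor-defined."""
import json

cpu_model = {
    "core": "arm_cortex_m7",
    "pipeline_stages": 6,  # Cortex-M7 has a six-stage, dual-issue pipeline
    "icache": {"size_kb": 16, "line_bytes": 32, "policy": "LRU"},
    "dcache": {"size_kb": 16, "line_bytes": 32, "policy": "LRU"},
    # Conservative service time for offloaded NPU kernels (hypothetical field).
    "npu_invoke_worst_case_us": 850,
}

with open("models/arm_cortex_m7.json", "w") as f:
    json.dump(cpu_model, f, indent=2)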

Example command (pseudo-CLI; vendor-specific parameters will vary):

# run RocqStat WCET analysis via VectorCAST integration
vectorcast rocqstat analyze \
  --project my_project \
  --binary build/inference.elf \
  --cpu-model models/arm_cortex_m7.json \
  --output reports/wcet_report.xml

RocqStat produces machine-readable outputs (XML/JSON) and human-friendly reports. Store both as build artifacts.

Step 5 — Enforce timing gates automatically

Make WCET checks deterministic: compare the reported WCET to a predefined SLA and fail the build if the result exceeds the allowed budget. Keep thresholds per target and per execution path (e.g., hard worst-case for the perception pipeline, tail-latency percentiles for soft real-time tasks).

Example minimal Python check that parses a RocqStat XML report and fails the job if WCET exceeds limit:

#!/usr/bin/env python3
"""Fail the CI job when the reported WCET exceeds the budget."""
import sys
import xml.etree.ElementTree as ET

THRESHOLD_MS = 10.0  # worst-case budget in milliseconds

tree = ET.parse('reports/wcet_report.xml')
root = tree.getroot()
# The XML path depends on the tool; adapt to the actual RocqStat schema.
wcet_elem = root.find('.//WCET')
if wcet_elem is None or wcet_elem.text is None:
    print('WCET element not found in report', file=sys.stderr)
    sys.exit(2)  # distinct exit code for a malformed report

wcet_ms = float(wcet_elem.text)  # assumes the report expresses WCET in milliseconds
print(f'WCET = {wcet_ms} ms, threshold = {THRESHOLD_MS} ms')
if wcet_ms > THRESHOLD_MS:
    print('WCET threshold violated', file=sys.stderr)
    sys.exit(1)  # non-zero exit fails the CI step

print('WCET check passed')

Integrate this check into CI. You can map results to JUnit XML or GitHub annotations for quick triage in PRs.
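
One convenient mapping is a one-test JUnit file that most CI UIs already render. A minimal sketch, assuming the wcet_ms and threshold values come from the check script above:

#!/usr/bin/env python3
"""Emit a one-test JUnit XML file so the CI UI can render the WCET gate result."""
import xml.etree.ElementTree as ET

def write_junit(wcet_ms, threshold_ms, path="reports/wcet_junit.xml"):
    failed = wcet_ms > threshold_ms
    suite = ET.Element("testsuite", name="wcet", tests="1",
                       failures="1" if failed else "0")
    case = ET.SubElement(suite, "testcase", classname="timing", name="inference_wcet")
    if failed:
        failure = ET.SubElement(case, "failure",
                                message=f"WCET {wcet_ms} ms exceeds budget {threshold_ms} ms")
        failure.text = "See reports/wcet_report.xml for the full RocqStat analysis."
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    write_junit(12.3, 10.0)  # example values; wire up to the parsed report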

CI Example: GitHub Actions workflow

Below is a condensed GitHub Actions example demonstrating the flow. Production workflows should use self‑hosted runners with access to licenses or a secure license proxy.

name: wcet-check

on:
  pull_request:
    paths:
      - 'src/**'
      - 'models/**'

jobs:
  wcet:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t my/verif-env:1.0 .
      - name: Build firmware
        run: |
          docker run --rm -v ${{ github.workspace }}:/workspace my/verif-env:1.0 make all
      - name: Run VectorCAST unit tests
        env:
          VCAST_LICENSE: ${{ secrets.VCAST_LICENSE }}
        run: |
          # -e forwards the license secret from the step env into the container
          docker run --rm -e VCAST_LICENSE -v ${{ github.workspace }}:/workspace my/verif-env:1.0 \
            vectorcast run_tests --project my_project
      - name: Run RocqStat WCET analysis
        env:
          ROCQSTAT_TOKEN: ${{ secrets.ROCQSTAT_TOKEN }}
        run: |
          docker run --rm -e ROCQSTAT_TOKEN -v ${{ github.workspace }}:/workspace my/verif-env:1.0 \
            vectorcast rocqstat analyze --project my_project --binary /workspace/build/inference.elf --output /workspace/reports/wcet_report.xml
      - name: Enforce WCET threshold
        run: docker run --rm -v ${{ github.workspace }}:/workspace my/verif-env:1.0 python3 /workspace/ci/check_wcet.py
      - name: Upload WCET report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: wcet-report
          path: reports/wcet_report.xml

Notes:

  • Use self-hosted runners for licensed tools if your vendor license cannot be containerized on GitHub-hosted runners. Guidance on choosing self-hosted vs hosted CI is summarized in On‑Prem vs Cloud for Fulfillment Systems: A Decision Matrix.
  • Store license tokens securely in your CI secret store and mount them as environment variables or via secure files.

Dealing with hardware-specific models and NPUs

Embedded ML often executes on specialized accelerators or shared buses. RocqStat’s modeling must account for offloads (e.g., NPU invocation latency, DMA contention). Two recommended approaches:

  1. Hybrid analysis: Use static WCET for host CPU code and measured worst-case for accelerator kernels (microbenchmarked worst-case on representative hardware). Feed measured kernel maxima into RocqStat as atomic block execution times.
  2. Abstract modeling: Model the NPU as an abstract device with conservative service times and model contention on the bus or memory subsystem.

Both approaches can be automated: include microbenchmarks and measurement artifacts in CI, then merge those numbers into the RocqStat run before enforcement. For teams integrating edge AI accelerators into production stacks, see higher-level operational patterns in Disruption Management in 2026: Edge AI and Real-Time Ancillaries.
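
In practice the merge step can be a small script that folds measured kernel maxima into the model file before the analysis runs. A minimal sketch, assuming hypothetical file names and an "atomic_blocks" field (the real vendor mechanism for annotating block execution times will differ):

#!/usr/bin/env python3
"""Fold measured worst-case NPU kernel latencies into the timing-analysis model."""
import json

# Microbenchmark maxima collected on representative hardware (a CI artifact),
# e.g. {"conv2d_q8": 412, "fc_q8": 97} in microseconds.
with open("reports/npu_microbench.json") as f:
    measured = json.load(f)

with open("models/arm_cortex_m7.json") as f:
    model = json.load(f)

# Treat each accelerator kernel as an atomic block with its measured maximum,
# padded by a safety margin to stay conservative.
MARGIN = 1.2
model["atomic_blocks"] = {
    kernel: round(worst_us * MARGIN, 1) for kernel, worst_us in measured.items()
}

with open("models/arm_cortex_m7_hybrid.json", "w") as f:
    json.dump(model, f, indent=2)
# Pass the hybrid model as --cpu-model in the RocqStat analysis step.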

Advanced strategies: regression tracking, per-path budgets, and MLOps hooks

Once you make WCET checks part of CI, evolve the system to provide real operational feedback:

  • Regression dashboards: Persist WCET results per commit in a time-series DB (Prometheus + Grafana or Elastic). Track regressions and compute trend slopes for early warnings; a minimal push sketch follows this list. For operational audit and decision planes, review Edge Auditability & Decision Planes.
  • Per-path budgets: Use RocqStat path reports to set budgets per critical path (e.g., sensor preprocessing, inference, post-processing). Gate PRs only for critical paths to reduce false positives.
  • MLOps integration: Tie model CI to timing checks: if a new quantized model increases WCET beyond a threshold, block promotion to staging and trigger an automated retrain/quantization step or flag for model optimization. These model-aware workflows align with the Edge‑First Developer Experience approach to integrating models and runtime constraints into CI.
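
For the dashboard feed, here is a minimal per-commit push sketch using the prometheus_client Pushgateway API (the metric name, labels, gateway address, and WCET value are assumptions; adapt to your conventions):

#!/usr/bin/env python3
"""Push the per-commit WCET to a Prometheus Pushgateway for trend dashboards."""
import os

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
wcet = Gauge("inference_wcet_ms",
             "Worst-case execution time of the inference path",
             labelnames=["commit", "target"], registry=registry)
# GITHUB_SHA is set by GitHub Actions; the WCET value comes from the report parser.
wcet.labels(commit=os.environ["GITHUB_SHA"], target="cortex_m7").set(12.3)

push_to_gateway("pushgateway.internal:9091", job="wcet", registry=registry)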

Security, licensing, and reproducibility considerations

Security and traceability are crucial for safety certification and audits. Follow these best practices:

  • Use signed artifacts for binaries and reports to ensure provenance.
  • Keep verification runners in an isolated, auditable network segment for license servers.
  • Store tool versions, build flags, and CPU model metadata with every analysis report for reproducibility in audits (ISO 26262, DO-178C). For regulatory and due-diligence approaches tailored to microfactories and regulated creators, see Regulatory Due Diligence for Microfactories and Creator-Led Commerce.
  • Rotate license tokens and limit access via CI role-based access control. Never hardcode license files in repos or public images.

Common pitfalls and how to avoid them

  • Mismatch between CI model and hardware: Ensure the CPU model in RocqStat matches the real device microarchitecture. Small differences in cache size or pipeline depth can change WCET estimates.
  • Non-reproducible builds: Pin compilers and toolchains; record build metadata and use container images for CI jobs.
  • Ignoring measurement-based evidence: Purely static analysis can be conservative; combine it with measured worst-case kernel latencies when modeling accelerators.
  • Overly broad gating: Gate only on critical paths to avoid blocking developers for non-critical regressions. Use severity levels. For thinking about tool governance and reducing sprawl of verification tooling, see Tool Sprawl Audit.

Case study (hypothetical): automotive perception pipeline

Context: an automotive Tier‑1 integrates a quantized CNN for pedestrian detection on a Cortex‑M7 + NPU. After adding a new post‑processing step, perception occasionally misses a real‑time deadline in worst‑case scenarios.

Solution implemented:

  1. Added a CI workflow that cross‑builds the firmware and runs VectorCAST unit tests.
  2. Configured RocqStat with a composite model: Cortex‑M7 pipeline + NPU invocation block modeled with measured worst‑case latency from a microbenchmark harness.
  3. Set per-path WCET budgets: sensor preprocessing 2 ms, inference end‑to‑end 15 ms (a versioned budget-file sketch follows the case study).
  4. Enabled automatic PR annotation with the WCET delta and blocked merges when a budget is exceeded.

Outcome: regressions were caught at PR time, developer feedback accelerated, and the project produced traceable timing evidence for ASIL‑D work items.
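
To reproduce the per-path setup from step 3, the budgets can live in a small file versioned with the code and drive the gate script. A minimal sketch, assuming a flat JSON layout and a hypothetical per-path report:

#!/usr/bin/env python3
"""Gate each critical path against its own versioned WCET budget."""
import json
import sys

# ci/wcet_budgets.json is versioned with the code, e.g.:
#   {"sensor_preprocessing": 2.0, "inference_end_to_end": 15.0}
with open("ci/wcet_budgets.json") as f:
    budgets_ms = json.load(f)

# Per-path WCET results extracted from the RocqStat path report (hypothetical layout).
with open("reports/wcet_paths.json") as f:
    results_ms = json.load(f)

violations = [
    f"{path}: {results_ms[path]} ms > {budget} ms budget"
    for path, budget in budgets_ms.items()
    if results_ms.get(path, 0.0) > budget
]

if violations:
    print("Per-path WCET budget violations:", *violations, sep="\n", file=sys.stderr)
    sys.exit(1)
print("All per-path WCET budgets met")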

Future predictions for 2026 and beyond

Based on recent tool consolidation (Vector + RocqStat) and industry momentum, expect these trends in 2026:

  • CI-native timing verification: Verification vendors will ship more robust CLI/REST APIs to run timing analysis in automated pipelines and in cloud CI environments. Patterns for low-latency, CI-native verification will follow the edge container and testbed trends covered in Edge Containers & Low-Latency Architectures.
  • Model-aware WCET: Tools will include better support for ML kernels and quantized operators, enabling per-op timing models that can be composed.
  • Regulatory codification: Timing evidence and toolchain traceability will be more explicitly required in safety certification documents. For practical steps to prepare audit artifacts, see Regulatory Due Diligence.
  • Standardized artifacts: Expect standard schemas for WCET reports (JSON/XML) so CI systems can consume and visualize timing metrics consistently across vendors.

Actionable takeaways

  • Start small: Add a timed CI job for a single critical path and expand coverage iteratively. If you need to choose where to run those jobs, the On‑Prem vs Cloud decision matrix helps weigh trade-offs.
  • Pin everything: Containerize tool versions, compiler toolchains, and CPU models for reproducible results. See edge container patterns for stable base images and fast CI runs.
  • Automate enforcement: Fail CI on WCET violations and provide clear PR annotations with deltas and report links.
  • Combine static + measured: For NPUs or accelerators, use microbenchmarks to supply conservative execution times to static analysis.
  • Track trends: Store WCET results per commit and visualize regressions to catch slow drifts before they become failures. For auditability and decision-plane designs, reference Edge Auditability & Decision Planes.

Quick checklist for implementation

  • Do you have vendor licenses and a plan for CI runner access?
  • Are your builds deterministic and reproducible (pinned toolchain)?
  • Do you have a CPU/microarchitecture model for RocqStat and any accelerator latency measurements?
  • Is your CI configured to store and expose WCET reports as artifacts and PR annotations? Consider artifact caching appliances like ByteCache if your artifacts are large.
  • Are WCET thresholds documented and versioned alongside code?

Final notes: getting buy‑in from teams and auditors

Integrating timing verification into CI shifts the team culture toward continuous safety. Start by demonstrating value on a single critical path, then generalize. Provide auditors with reproducible artifacts: build metadata, tool versions, CPU models, and signed WCET reports. The Vector + RocqStat trajectory makes it simpler to produce traceable, vendor-supported timing evidence for safety regulators. For program-level governance and audit readiness, consider the approaches in Regulatory Due Diligence.

Call to action

If you’re evaluating vendor options or need a turn‑key CI pattern for WCET checks on embedded ML, smart-labs.cloud has ready‑to-deploy templates, Docker images, and CI workflows tuned for VectorCAST + RocqStat integration. Contact us for a demo, get our sample repo with GitHub Actions and parsing scripts, or request a consulting workshop to instrument your inference pipeline for continuous timing verification.
