SuperFlow is built for mission-critical workloads. Every step in a running SuperFlow is journaled to durable storage, every successful step runs exactly once, and every in-flight run survives crashes, restarts, and deployments without losing work or replaying side effects. This page explains the guarantees SuperFlow makes, what they mean in practice, and how to configure them.
The guarantee in one line
Once a step in your SuperFlow completes successfully, it will never run again — even if the service restarts mid-flow.

That’s the property that makes SuperFlow safe for billing pipelines, customer-facing automations, multi-day approval flows, and anything else where you can’t afford to lose work or double-charge.
Why durability matters
Without a durable engine, a workflow that crashes halfway through has to either restart from scratch (re-running every step, double-charging customers, re-sending emails) or be manually nursed back to a consistent state (slow, error-prone, doesn’t scale). SuperFlow eliminates that choice. The engine writes a journal entry for every meaningful step — every LLM call, every tool call, every HTTP request, every code execution, every loop iteration. If the service is restarted while a SuperFlow is running:

- The run resumes from where it left off.
- Already-completed steps do not re-execute — their journaled outputs are used directly.
- Only the in-flight step (and any that come after it) actually run.
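The journal-and-replay behavior above can be sketched in a few lines. This is an illustrative model only, not the actual SuperFlow engine: `runStep` is a hypothetical name, and the in-memory `Map` stands in for durable storage.

```javascript
// Illustrative sketch of journal-based replay. The Map stands in for
// durable storage; runStep is a hypothetical name, not a SuperFlow API.

async function runStep(journal, stepId, effect) {
  if (journal.has(stepId)) {
    // Replay after a restart: reuse the recorded output,
    // never re-fire the side effect.
    return journal.get(stepId);
  }
  const result = await effect(); // first (and only) execution
  journal.set(stepId, result);   // journaled before being returned
  return result;
}
```

On restart, the engine walks the flow from the top; any step whose id is already in the journal returns instantly with its recorded output, so only the in-flight step and the steps after it actually execute.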
What’s durable
The full step-by-step lifecycle is captured:

- LLM calls — every model invocation is journaled. On replay, the recorded response is reused; the model is not called again.
- Tool calls — every tool invocation. Idempotent or not, a tool that already ran successfully won’t be re-fired.
- HTTP requests — request and response are captured. No double-POST on retries.
- Code nodes — JavaScript execution results are journaled.
- Pre- and post-processing — guardrails, memory writes, knowledge-base queries — each step recorded.
- Loop iterations — every iteration of a Loop body is recorded individually. A 1000-item loop that crashes on item 743 resumes at 743, not at 1.
- Sub-workflow calls — Execute Workflow runs are durably tracked.
- Approval waits — HITL approvals can wait days or weeks without consuming resources or losing state.
- Scheduled triggers — cron schedules run on crash-safe timers (more below).
Exactly-once side effects
The combination of “step result is journaled before being returned” and “completed steps are never re-executed” gives you exactly-once execution of each step from the engine’s point of view. In plain terms:

- A customer never gets charged twice because the workflow restarted.
- An email never gets sent twice because the LLM call after it failed.
- A downstream API never sees the same POST body twice in a single run.
Per-node retry on failure
For transient failures — a flaky external API, a model timeout, a momentary network blip — every retryable node can be configured with automatic retries. In a node’s configuration drawer, expand the Retry section and toggle Retry on failure on. Then set:

- Max attempts — how many times the engine will retry before giving up.
- Wait between attempts — backoff delay in milliseconds.
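The two settings map onto a loop like the following — a minimal sketch of fixed-delay retry, not the engine’s actual implementation (`withRetry` and its option names are hypothetical):

```javascript
// Illustrative sketch of per-node retry. maxAttempts and waitMs mirror the
// two settings in the node's Retry section; withRetry is a hypothetical name.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(effect, { maxAttempts, waitMs }) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await effect();                     // success: return immediately
    } catch (err) {
      lastError = err;                           // transient failure: remember it
      if (attempt < maxAttempts) await sleep(waitMs);
    }
  }
  throw lastError;                               // attempts exhausted: fail the node
}
```

Because each attempt happens inside a single step, a retried step that eventually succeeds is still journaled exactly once.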
Long waits cost nothing
Because state lives in durable storage and not in a hot-running process, a SuperFlow that’s waiting:

- For a human to approve something
- For a scheduled trigger to fire
- For a Wait node to elapse

costs nothing while it waits. It consumes no compute and holds no in-memory state, so it can wait for days or weeks and resume as soon as the wait ends.
Lifecycle controls are durable too
The Pause, Resume, and Terminate buttons on a running SuperFlow operate against durable state:

- Pause records the request in durable storage. The run stops at the next inter-node boundary and stays paused even if the service restarts.
- Resume picks the run up where it stopped — completed nodes stay completed, the next pending node starts fresh.
- Terminate cancels the run definitively and marks it cancelled in history. Any in-flight step is cancelled; nothing else runs.
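Pause-at-boundary semantics can be sketched like this. It is an illustrative model, not the real engine: `RunControl` stands in for the durable pause flag, which in SuperFlow survives restarts.

```javascript
// Illustrative sketch of pause-at-boundary. RunControl stands in for the
// durable pause flag; in the real engine this state survives restarts.

class RunControl {
  constructor() {
    this.paused = false;
  }
  requestPause() { this.paused = true; }  // recorded durably in the real engine
  resume() { this.paused = false; }
}

async function executeRun(nodes, control, journal) {
  for (const node of nodes) {
    if (control.paused) return "paused";  // stop only at inter-node boundaries
    if (!journal.has(node.id)) {
      journal.set(node.id, await node.run()); // completed nodes stay completed
    }
  }
  return "completed";
}
```

A node that is already executing when Pause is pressed finishes and is journaled; nothing after it starts. Resume simply calls `executeRun` again, and the journal skips every completed node.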
Crash-safe scheduled triggers
Cron-based schedules don’t rely on a “scheduler process” that has to stay up. Each schedule lives in durable storage and uses delayed self-sends — the schedule queues its next tick at creation, and that queued tick survives every kind of outage. What this means for you:

- A schedule will not “miss a tick” because the service was down at the scheduled moment. When the runtime comes back, queued ticks fire as soon as they’re due.
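The delayed self-send pattern can be sketched with a fixed-interval schedule. This is illustrative only: real cron expressions and the durable tick queue are simplified to an in-memory class with explicit clock values.

```javascript
// Illustrative sketch of delayed self-sends. The class is an in-memory
// stand-in for a durably stored schedule; nextDue is the queued tick.

class DurableSchedule {
  constructor(intervalMs, nowMs) {
    this.intervalMs = intervalMs;
    this.nextDue = nowMs + intervalMs; // first tick queued at creation
  }

  // Called when the runtime checks its queue at time nowMs. Returns how
  // many ticks should fire; ticks that came due during an outage are
  // still queued, so none are lost.
  drain(nowMs) {
    let fired = 0;
    while (this.nextDue <= nowMs) {
      this.nextDue += this.intervalMs; // self-send: queue the next tick first
      fired++;
    }
    return fired;
  }
}
```

Because the next tick is queued before the current one does any work, a crash mid-tick never loses the schedule.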
Replay every past run
Every run that has ever executed — successful, failed, paused, cancelled — is preserved with its full per-node output. Open the History drawer in the editor to:

- Inspect what every node received and emitted on any past run.
- Re-run the exact same input through the latest version of the SuperFlow.
- Re-run from a specific node to retry only the tail of a failed run (without re-paying for the steps that already succeeded).
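Re-running from a specific node can be pictured as seeding a fresh journal with the recorded outputs of every node before the restart point, so only the tail executes again. The function and data shapes below are hypothetical illustrations, not a SuperFlow API.

```javascript
// Illustrative sketch of "re-run from a node". pastRun maps node ids to
// recorded outputs; nodeOrder is a simplified linear execution order.

function seedJournalForRerun(pastRun, nodeOrder, restartFrom) {
  const seeded = new Map();
  for (const nodeId of nodeOrder) {
    if (nodeId === restartFrom) break;          // stop at the restart point
    if (pastRun.has(nodeId)) {
      seeded.set(nodeId, pastRun.get(nodeId));  // reuse recorded outputs
    }
  }
  return seeded;                                // tail nodes re-execute fresh
}
```

Everything before the restart point replays from the journal at no cost; everything from that node on runs for real.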
Observability
In addition to per-node JSON outputs in History, SuperFlow emits OpenTelemetry traces for every run end-to-end:

- The HTTP request that triggered the run is the root span.
- Each node, each LLM call, each tool call, each pre/post-processing module gets its own span.
- Token counts, model names, providers, durations, and costs are attached as span attributes.
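As a concrete illustration, the attributes on a single LLM-call span might look like the object below. The attribute keys and values are assumptions for illustration, not SuperFlow’s documented schema.

```javascript
// Hypothetical attributes on one LLM-call span. Keys and values are
// illustrative only, not SuperFlow's documented attribute names.

const llmSpanAttributes = {
  "llm.provider": "openai",   // provider
  "llm.model": "gpt-4o",      // model name
  "llm.tokens.prompt": 512,   // token counts
  "llm.tokens.completion": 128,
  "duration.ms": 1840,        // wall-clock duration
  "cost.usd": 0.0042,         // cost attribution
};
```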
Mission-critical checklist
If you’re putting a SuperFlow on a production path, here’s a short checklist:

- Configure retries on every node that calls an external system (HTTP Request, AI Agent, Tool nodes). 3 attempts with exponential backoff is a reasonable default.
- Use Wait for Approval for any step with irreversible real-world side effects (payments, customer-facing emails, deletions). Pair it with a clear approval message that contains the relevant upstream context.
- Set retry attempts generously on nodes that talk to flaky systems — you can’t branch around a permanent failure, so retries are your last line of defense before the run errors out.
- Lock down webhook secrets. If your trigger is a webhook, rotate the secret on team changes and never commit it to source.
- Watch executions in History while rolling out a new SuperFlow. Live monitoring + replay together catch issues fast.
- Schedule with a buffer. For cron-based workloads, leave room for retries and downstream slowness so a slow run doesn’t overlap with the next tick.
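The “exponential backoff” mentioned in the checklist can be computed with a helper like this. It is a sketch: the base delay and cap are arbitrary illustrative values, not SuperFlow defaults.

```javascript
// Hypothetical helper: delay before retry attempt n, doubling each time,
// capped so late attempts don't produce very long waits. Base and cap
// are illustrative values, not SuperFlow defaults.

function backoffMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}
```

With these values, attempts wait 500 ms, 1 s, 2 s, and so on, up to the 30 s cap.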
What this enables
The combined result is that you can build SuperFlows that:

- Pay invoices.
- Send customer-facing communication.
- Run for days or weeks waiting on humans.
- Survive restarts, deployments, and infrastructure events without operator intervention.
- Are observable end-to-end without setting up tracing yourself.