Integration Runtime

Designing safe production workflows for integrations.

Type: Conceptual

Role: Product Designer

Duration: 2 weeks

Focus: Platform UX

  • Product design ⦿ Platform UX ⦿ Reliability systems ⦿ Failure recovery ⦿ SaaS

What's this about

Integrations don't fail loudly. They degrade quietly.

Auditing tools like Zapier, Workato, and Make revealed the same pattern — failures buried in logs or collapsed into a generic "something went wrong" state. Admins had data but no way to reason about it under pressure.

The real risk: when admins can't diagnose quickly, their first instinct is to edit configuration. That's the most common way a single failure becomes three.

What I tried first

Notification model

Rejected

The problem

Surface failures as notifications with severity levels. Familiar, low-friction. The problem: notifications are optimised for awareness, not action. Admins still had to navigate to config to understand the situation. Under pressure, that extra step is where mistakes happen.

The Shift

Before: Failures buried in logs or raw traces
After: Failures as structured, diagnosable objects

Before: Generic error states with no context
After: Execution state always visible and current

Before: Recovery and config on the same surface
After: Recovery scoped to the active issue only

Before: Admins guessing what's safe to touch
After: Safe actions determined by system state

Design Decisions

  1. State as the entry point

Surface execution state first — Running, Degraded, Paused, Rate-limited. State governs which actions appear. Unsafe options are never shown, not just disabled.

Cost: Removed flexibility some admins want — like force-resuming a paused integration. Confidence matters more than speed in a failure scenario.
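
A minimal sketch of how "state governs which actions appear" could be modelled. The four states come from the design above; the action names are hypothetical placeholders, not the product's real actions.

```python
from enum import Enum


class ExecutionState(Enum):
    RUNNING = "Running"
    DEGRADED = "Degraded"
    PAUSED = "Paused"
    RATE_LIMITED = "Rate-limited"


# Assumption: these action names are illustrative only.
# Note that a paused integration offers no "force_resume" at all.
SAFE_ACTIONS: dict[ExecutionState, tuple[str, ...]] = {
    ExecutionState.RUNNING: ("view_activity", "pause"),
    ExecutionState.DEGRADED: ("view_failure", "start_recovery", "pause"),
    ExecutionState.PAUSED: ("view_failure", "start_recovery"),
    ExecutionState.RATE_LIMITED: ("view_limits",),
}


def visible_actions(state: ExecutionState) -> tuple[str, ...]:
    """Return only the actions that are safe in this state.

    Unsafe actions are absent from the result rather than marked as
    disabled, so the interface has nothing to render for them.
    """
    return SAFE_ACTIONS[state]
```

Because the lookup is the only source of actions, adding a new state forces an explicit decision about what is safe in it.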

  2. Failures as structured objects

Each failure surfaces: what failed, why, what's affected, and whether data loss occurred. No log access required to understand the incident.

Cost: Structuring failures meant making assumptions about failure types. Novel or compound failures may not fit cleanly — a known gap that needs a fallback in production.
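
One way the structured failure could look as a data type, as a sketch. The four fields mirror the list above (what failed, why, what's affected, data loss). The failure kinds and the example values are assumptions; the `UNCLASSIFIED` kind stands in for the fallback that novel or compound failures would need.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    # Assumption: hypothetical categories, not taken from the case study.
    AUTH_EXPIRED = "auth_expired"
    RATE_LIMITED = "rate_limited"
    EVENT_DELIVERY = "event_delivery"
    UNCLASSIFIED = "unclassified"  # fallback for failures that fit no known type


@dataclass(frozen=True)
class Failure:
    what_failed: str
    why: str
    affected: tuple[str, ...]
    data_loss: Optional[bool]  # None means "not yet known"
    kind: FailureKind = FailureKind.UNCLASSIFIED


def summarize(failure: Failure) -> str:
    """Render the four facts an admin needs, without opening any logs."""
    loss_text = {
        True: "Data loss occurred",
        False: "No data loss",
        None: "Data loss not yet known",
    }[failure.data_loss]
    affected = ", ".join(failure.affected) or "nothing else"
    return f"{failure.what_failed} failed: {failure.why}. Affected: {affected}. {loss_text}."
```

Keeping `data_loss` optional lets the surface say "not yet known" instead of guessing.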

  3. Constrained recovery paths

Recovery flows are issue-specific. Reconnecting auth doesn't expose trigger config. Retrying events doesn't allow editing actions. Each path addresses exactly one failure.

Cost: Power users wanted more control. The constraint held — most errors during incidents come from over-intervention, not under-action.
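
A sketch of issue-scoped recovery, assuming each path declares its own steps and the only fields it may touch. The issue keys, step names, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPath:
    issue: str
    steps: tuple[str, ...]
    editable_fields: frozenset[str]  # everything else stays out of reach


# Assumption: illustrative paths only; the real product may define others.
RECOVERY_PATHS: dict[str, RecoveryPath] = {
    "auth_expired": RecoveryPath(
        issue="auth_expired",
        steps=("reconnect_account", "verify_connection"),
        editable_fields=frozenset({"credentials"}),
    ),
    "event_delivery": RecoveryPath(
        issue="event_delivery",
        steps=("select_failed_events", "retry_events"),
        editable_fields=frozenset(),  # retrying never edits anything
    ),
}


def can_edit(issue: str, field_name: str) -> bool:
    """A field is editable only if the active issue's path lists it."""
    path = RECOVERY_PATHS.get(issue)
    return path is not None and field_name in path.editable_fields
```

With this shape, reconnecting auth cannot reach trigger config, and retrying events cannot reach actions, because neither field is listed on the path.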

  4. Configuration read-only by default

Production config defaults to inspection. An explicit action is required to enter edit mode, visually separated from all recovery flows.

Cost: Adds one extra step for legitimate config changes. A small cost that creates a clear break between fixing a failure and changing how the system works.
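
A minimal sketch of "inspection by default, explicit action to edit". The class and method names are assumptions made for illustration.

```python
from types import MappingProxyType
from typing import Any, Mapping


class ProductionConfig:
    """Config that can always be inspected but only edited on purpose."""

    def __init__(self, values: dict[str, Any]) -> None:
        self._values = dict(values)
        self._editing = False

    def inspect(self) -> Mapping[str, Any]:
        # MappingProxyType is a read-only view: writes through it raise TypeError.
        return MappingProxyType(self._values)

    def enter_edit_mode(self, confirmed_by: str) -> None:
        # The explicit step: someone has to put their name on the change.
        if not confirmed_by:
            raise ValueError("Edit mode requires an identified admin.")
        self._editing = True

    def exit_edit_mode(self) -> None:
        self._editing = False

    def set(self, key: str, value: Any) -> None:
        if not self._editing:
            raise PermissionError("Config is read-only; enter edit mode first.")
        self._values[key] = value
```

Recovery flows would only ever receive the result of `inspect()`, so they have no way to write to config.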

Outcomes

No guesswork

Execution state and failure cause visible before any action is taken

Scoped recovery

Each fix flow addresses one issue — config stays untouched

Auto-contained

Repeated failures trigger system pause before data is at risk

Full audit

Every action logged immutably — system and human
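
The last two outcomes can be sketched together: a pause that triggers after repeated failures, with every system and human action recorded. The threshold of 3 and all names are assumptions. The log here is only append-only in process; real immutability would need a storage-level guarantee.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    actor: str   # "system" or a human identifier
    action: str


class Integration:
    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold  # assumed value
        self.consecutive_failures = 0
        self.paused = False
        self._audit: list[AuditEntry] = []

    @property
    def audit_log(self) -> tuple[AuditEntry, ...]:
        # Callers get a tuple copy, so they cannot alter recorded history.
        return tuple(self._audit)

    def record(self, actor: str, action: str) -> None:
        self._audit.append(AuditEntry(datetime.now(timezone.utc), actor, action))

    def report_result(self, ok: bool) -> None:
        """Count consecutive failures and pause once the threshold is hit."""
        if self.paused:
            return
        if ok:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        self.record("system", "run_failed")
        if self.consecutive_failures >= self.failure_threshold:
            self.paused = True
            self.record("system", "auto_paused")
```

A success resets the counter, so only an unbroken run of failures triggers the pause.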

The hard part

The most impactful decisions weren't about what to add — they were about what to remove. Every option that felt helpful in a calm moment became a liability the moment pressure was high.

Learnings

Remove options, don't just disable them

Disabled unsafe actions still create cognitive load. If it's not safe, it shouldn't exist on the surface at all.

State is a better primitive than events

Events tell you something happened. State tells you what's true right now. Design for state.

Constrained paths build confidence

Admins made fewer mistakes when choices were scoped. Less freedom in a crisis is a feature, not a limitation.

Friction is sometimes the right answer

The config edit step adds one click. That one click prevents the most common incident escalation pattern.

What I'd improve

This works for known failures that arrive one at a time. It breaks down when several occur at once: when everything goes wrong together, there's no single path forward. I'd add a fallback: pause everything and make the system safe by default.
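
A sketch of that fallback rule, assuming failures are identified by simple string keys. The return values are hypothetical labels for what the surface would do next.

```python
def choose_response(active_failures: list[str]) -> str:
    """Pick a scoped recovery path for one failure, or fall back to a safe pause."""
    if not active_failures:
        return "none"
    if len(active_failures) == 1:
        # A single known failure keeps its own constrained recovery path.
        return f"recover:{active_failures[0]}"
    # Compound failure: no single path applies, so default to safety.
    return "pause_all"
```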


vizuraja@gmail.com
