# Future Plans

This document showcases advanced workflow features that are planned for future releases. These enhancements will significantly expand Stavily’s automation capabilities, enabling more complex, resilient, and observable workflows.

The example below demonstrates several upcoming features working together in a comprehensive CPU remediation workflow with escalation and recovery capabilities.

```yaml
name: "Advanced CPU Remediation with Escalation and Recovery"
description: "Comprehensive remediation workflow featuring looping, approvals, compensation, and observability"
version: "2.0.0"
created_by: "stavily-system"
tags: ["remediation", "cpu", "ops", "monitoring"]

metadata:
  revision: 3
  environment: "production"
  category: "infrastructure"
  ui:
    color: "#4B9EFF"
    icon: "cpu"

agents:
  pools:
    - pool_name: "prod-servers"
      agent_regex: "prod-.*"
      auto_install_plugins: true
      on_plugin_missing: "skip_agent"
  single_agent: "fallback-agent-001"

secrets:
  - name: "ssh_key"
    type: "vault"
    ref: "vault:infra/ssh_ops_key"
  - name: "email_smtp_creds"
    type: "env"
    ref: "SMTP_CREDS"

variables:
  cpu_threshold: 90
  alert_recipients: ["[email protected]", "[email protected]"]

triggers:
  agents:
    single_agent: "trigger-agent"
  steps:
    - name: "cpu-alert"
      plugin: "prometheus-trigger-v1.1.0"
      config:
        metric: "cpu_usage_percent"
        condition: "> {{ variables.cpu_threshold }}"
        duration: "2m"
    - name: "manual-trigger"
      plugin: "manual-approval-trigger"
      config:
        message: "Run manually for testing or escalation"
      enabled: false

actions:
  agents:
    pools:
      - pool_name: "actions-pool"
        agent_regex: "actions-.*"
  steps:
    - name: "analyze-processes"
      plugin: "python-script-action-v2.1.0"
      config:
        script: "analyze_cpu_processes.py"
        timeout: "30s"
      error_handling:
        on_failure: "continue"
        max_retries: 3
        retry_delay: "15s"
    - name: "analyze-all-servers"
      plugin: "remote-analyzer-v3.0.0"
      for_each: "{{ agents.pools[0].agent_regex }}"
      parallel: true
      config:
        script: "remote_cpu_check.py"
    - name: "approval"
      plugin: "manual-approval-action-v1.0.0"
      depends_on: ["analyze-processes"]
      type: "manual"
      config:
        approvers: ["[email protected]"]
        message: "CPU remediation requires approval to kill processes."
    - name: "kill-high-cpu-processes"
      plugin: "system-command-action-v1.2.0"
      depends_on: ["approval"]
      condition: "analyze-processes.output.high_cpu_count > 3"
      config:
        command: "pkill -f high_cpu_process"
        timeout: "10s"
      error_handling:
        on_failure: "invoke-compensation"
    - name: "restart-service"
      plugin: "service-management-action-v1.5.0"
      depends_on: ["kill-high-cpu-processes"]
      condition: "kill-high-cpu-processes.status == 'success'"
      config:
        service: "webapp"
        action: "restart"
    - name: "subworkflow-escalation"
      plugin: "subworkflow-caller-v1.0.0"
      depends_on: ["restart-service"]
      condition: "restart-service.status == 'failed'"
      config:
        workflow_id: "incident-escalation-v2"
        parameters:
          severity: "high"
          category: "cpu"
  compensation:
    - name: "rollback-service"
      plugin: "service-management-action-v1.5.0"
      config:
        service: "webapp"
        action: "rollback"

outputs:
  agents:
    single_agent: "output-agent"
  steps:
    - name: "send-summary"
      plugin: "formatted-email-output-v1.5.0"
      depends_on: ["restart-service"]
      config:
        recipients: "{{ variables.alert_recipients }}"
        template: "cpu_remediation_summary"
        attachments:
          - path: "logs/workflow-{{ workflow.id }}.txt"

monitoring:
  timeout: "15m"
  heartbeat_interval: "30s"
  progress_tracking: true
  log_level: "debug"
  metrics:
    enabled: true
    export_to: "prometheus"

observability:
  tracing: true
  record_logs: true
  ui_annotations:
    highlight_steps: ["kill-high-cpu-processes", "subworkflow-escalation"]

permissions:
  owner: "devops-team"
  allow_run_by: ["ops", "admins"]
  read_only_for: ["auditors"]
```
```mermaid
graph TD
    A[Workflow Trigger] --> B{Parallel Execution Engine}
    B --> C[For-Each Iterator]
    B --> D[Sequential Steps]

    C --> E[Parallel Step 1]
    C --> F[Parallel Step 2]
    C --> G[Parallel Step N]

    D --> H[Sequential Step 1]
    D --> I[Sequential Step 2]

    E --> J{Error Check}
    F --> J
    G --> J
    H --> K{Condition Check}
    I --> K

    J -->|Failure| L[Compensation Handler]
    K -->|Failure| L

    L --> M[Rollback Actions]
    M --> N[Recovery Workflow]

    J -->|Success| O[Continue]
    K -->|Success| O

    O --> P[Observability Layer]
    P --> Q[Metrics Export]
    P --> R[Tracing]
    P --> S[UI Annotations]
```

## For-Each Loops

The `for_each` directive enables iteration over a collection, executing a step once per item with different parameters.

Example: `for_each: "{{ agents.pools[0].agent_regex }}"` runs the step once for each agent matching the pool's regex.

Benefits:

  • Process multiple items concurrently
  • Dynamic scaling based on agent pools
  • Reduced execution time for batch operations
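Beyond agent pools, the same directive could iterate over an explicit list. A hypothetical sketch (the static service list and the `{{ item }}` per-iteration variable are illustrative assumptions, not part of the finalized DSL):

```yaml
# Hypothetical: run one restart per service in a static list
- name: "restart-each-service"
  plugin: "service-management-action-v1.5.0"
  for_each: ["webapp", "worker", "scheduler"]  # one execution per list item
  config:
    service: "{{ item }}"  # assumed current-item variable, analogous to {{ variables.* }}
    action: "restart"
```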

## Parallel Execution

The `parallel: true` flag allows steps to run concurrently, significantly reducing total execution time for independent operations.

Use Cases:

  • Multi-server deployments
  • Independent validation checks
  • Parallel data processing tasks
```yaml
# Example: Parallel server analysis
- name: "analyze-all-servers"
  plugin: "remote-analyzer-v3.0.0"
  for_each: "{{ agents.pools[0].agent_regex }}"
  parallel: true
  config:
    script: "remote_cpu_check.py"
```
| Category | Feature | Status | Description / Implementation Notes |
|---|---|---|---|
| Core DSL | Workflow metadata (id, name, description, version) | ✅ Done | Already implemented — clear and semantic. |
| | Agent definitions (workflow/type/plugin level) | ✅ Done | Fully implemented and flexible with regex, pools, single agent. |
| | Conditional execution (condition, depends_on) | ✅ Done | Present and working. |
| | Error handling (on_failure, retries) | ✅ Done | Already defined with retry logic. |
| | Plugin architecture (versioned plugins) | ✅ Done | Excellent — version pinning adds consistency. |
| Advanced Execution | Loops / For-Each | 🧩 In Progress | Example included (for_each). Needs parser and scheduler support for concurrent runs. |
| | Parallel Execution (parallel: true) | 🧩 In Progress | Needs execution engine concurrency support. |
| | Sub-workflows / Nested workflows | 🧩 In Progress | DSL supports via subworkflow-caller — backend orchestration logic to implement. |
| | Manual Approval Step | ✅ Done | Implemented as manual-approval-action. |
| | Compensation / Rollback | 🧩 In Progress | DSL supports compensation block; needs engine hooks. |
| Data / Context | Variable & templating system | ✅ Done | Implemented ({{ variables.* }} syntax). |
| | Outputs referencing & propagation | ✅ Done | Already supported (e.g. analyze-processes.output.*). |
| | Global and step-level environment variables | 🚧 To Do | Should add .env injection or env: per step. |
| | Type-safe variables / schema | 🚧 To Do | Would improve validation and autocompletion. |
| Secrets / Security | Secrets injection from vault/env | 🧩 In Progress | DSL design defined; need integration with secret backends. |
| | Permissions & RBAC per workflow | 🧩 In Progress | DSL supports via permissions block; requires enforcement logic. |
| | Signed workflows / integrity check | 🚧 To Do | Could add checksum or digital signature for verified deployment. |
| Observability | Monitoring (timeout, heartbeat, progress) | ✅ Done | Already implemented. |
| | Structured logging & tracing | 🧩 In Progress | Partially in observability — needs backend support. |
| | Metrics export (Prometheus) | 🧩 In Progress | DSL supports it; exporter to be implemented. |
| Usability | Tags, metadata, UI hints | ✅ Done | Present (metadata, ui_annotations). |
| | Validation & static analysis | 🚧 To Do | YAML linter / validator should check for circular dependencies. |
| | Visual DAG support | 🚧 To Do | Auto-generate DAG visualization from dependencies. |
| DevOps / Platform | Version control / rollback | 🧩 In Progress | Version field exists; implement history & rollback mechanism. |
| | Plugin marketplace / registry | 🚧 To Do | Backend feature for discoverability and updates. |
| | Cron / schedule triggers | 🚧 To Do | Extend triggers with cron or time-based scheduling. |
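As one sketch of how the planned cron triggers might extend the existing triggers block (the cron-trigger plugin name and its schedule/timezone fields are assumptions, not a committed design):

```yaml
# Hypothetical: time-based trigger alongside the existing trigger step shape
triggers:
  steps:
    - name: "nightly-cpu-audit"
      plugin: "cron-trigger-v1.0.0"  # assumed plugin name
      config:
        schedule: "0 2 * * *"        # standard cron syntax: daily at 02:00
        timezone: "UTC"              # assumed field for explicit zone handling
```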