Now in open beta — v0.1 OSS

AI Infrastructure
as Code.

Define your entire AI stack — models, memory, agents, pipelines, and observability — in a single declarative file. Deploy anywhere. Version everything.

Star on GitHub
See how it works
aiinfra.yaml
# cloudemon.io — define your full AI stack
apiVersion: aiinfra/v1
kind: Stack
 
models:
  - name: primary-llm
    provider: anthropic
    model_id: claude-sonnet-4-6
    cost_limits:
      max_monthly_usd: 500
 
memory:
  - name: knowledge-base
    backend: pgvector  # or pinecone · weaviate · qdrant
 
agents:
  - name: support-agent
    model: primary-llm
    memory: [knowledge-base]
    tools: [web-search, crm-lookup]
 
$ cloudemon apply --env production
✓ created: model:primary-llm
✓ created: vectorstore:knowledge-base
✓ created: agent:support-agent
Apply complete: 3 created · 0 errors
Works with your entire AI stack
Anthropic
OpenAI
AWS
GCP
Azure
Pinecone
Weaviate
Qdrant
LangChain
OpenAI
AI Research
Vercel
Cloud Platform
Cursor
AI Dev Tools
Databricks
Data + AI
ElevenLabs
Voice AI
Harvey AI
Legal AI
Scale AI
AI Data Platform
Notion
Productivity AI
Runway
Generative Video
Glean
Enterprise AI
Writer
Enterprise LLM
Mistral AI
Open LLMs
Linear
Dev Tooling
Together AI
Inference Cloud
Replicate
Model Hosting
Modal
Serverless AI
Adept AI
AI Agents
W&B
MLOps Platform
Fireworks AI
Fast Inference
Cohere
Enterprise NLP
Hugging Face
AI Community
Baseten
Model Infra
LangChain
AI Framework
2.4K+
GitHub stars
340+
Teams in beta
18+
Provider plugins
99.9%
Cloud uptime SLA

From YAML to production
in minutes.

Cloudemon reads your stack definition, computes a diff against the real world, and provisions only what changed — just like Terraform, but built for AI.

01
Define your stack
Declare every AI resource — models, memory, agents, pipelines — in YAML or the Python SDK. Version-controlled, reviewable, repeatable.
02
Preview the plan
Run cloudemon plan to see exactly what will be created, updated, or destroyed — before a single API call is made.
03
Apply to any environment
Promote dev → staging → production with a single flag. Per-environment overrides let you use cheaper models in dev automatically.
04
Observe & iterate
Tracing, cost alerts, and eval regression gates are first-class primitives. Block a deploy automatically if quality regresses.
aiinfra.yaml
# environments block — override per env
environments:
  dev:
    overrides:
      models:
        primary-llm:
          model_id: claude-haiku-4-5
  production:
    approvals_required: true
    approvers:
      - eng-lead@acme.com
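With the dev override above in place, step 02 of the workflow would surface the model swap before anything runs. A sketch of what `cloudemon plan` output might look like — the diff symbols and summary line are illustrative, not actual CLI output:

```
$ cloudemon plan --env dev
~ update: model:primary-llm
    model_id: claude-sonnet-4-6 → claude-haiku-4-5
Plan: 0 to create · 1 to update · 0 to destroy
```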

Every AI resource,
first-class.

Cloudemon introduces a new resource model for AI-native infrastructure — purpose-built primitives that general-purpose IaC tools don't understand.

🧠
Models
Declare LLM endpoints with fallbacks, version pins, temperature, and monthly cost guardrails.
anthropic openai cohere mistral
💾
Memory
Provision vector stores and KV caches, wired automatically to your agents.
pgvector pinecone weaviate qdrant
🔧
Tools
HTTP APIs, built-in tools, and MCP servers declared as typed, versioned resources.
http builtin mcp
🤖
Agents
Orchestration config that wires models, memory, tools, and system prompts into a deployable unit.
model-ref fallback streaming
⚙️
Pipelines
Scheduled ingestion, eval regression, and batch transform jobs as first-class declared resources.
schedule on_deploy eval-gate
🚀
Serving
Auto-scaled REST, WebSocket, and gRPC endpoints with auth and CORS — provisioned on your cloud.
rest websocket autoscale
📊
Observability
Tracing, structured logging, cost alerts, and eval dashboards wired into every resource automatically.
langfuse arize cloudwatch
🌍
Environments
Per-environment overrides for models, serving scale, and cost limits. Promotion gates with approvals.
dev staging production
+
Plugin API
Build custom providers.
Community-extensible.
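Tools and Serving are the two resource types above without a snippet elsewhere on this page. A hypothetical sketch of how they might be declared — the field names and the endpoint URL are illustrative assumptions, not taken from the docs; `crm-lookup` and `support-agent` match the earlier stack definition:

```yaml
tools:
  - name: crm-lookup
    type: http                      # or: builtin · mcp
    endpoint: https://crm.example.internal/api/lookup  # hypothetical URL
    version: v2

serving:
  - name: support-api
    type: rest                      # or: websocket · grpc
    agent: support-agent
    autoscale:
      min_replicas: 1
      max_replicas: 10
```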

Block bad deploys
automatically.

Define quality thresholds in your pipeline. Cloudemon runs your eval suite before every deploy and blocks it if scores regress — no manual intervention needed.

Eval-gated deployments
Set thresholds on faithfulness, relevance, and tool accuracy. A deploy that misses a threshold never reaches production.
🔁
Golden dataset integration
Point your pipeline at a JSONL eval set in S3. Cloudemon runs it on every on_deploy trigger.
🔔
Regression alerts
Notify Slack or PagerDuty when an eval score drops below baseline, even between scheduled deploys.
aiinfra.yaml — eval pipeline on_deploy
pipelines:
  - name: eval-regression
    trigger:
      type: on_deploy
    steps:
      - name: run-evals
        type: eval
        agent: support-agent
        dataset: s3://evals/golden.jsonl
        metrics:
          - name: faithfulness
            threshold: 0.85
          - name: answer_relevance
            threshold: 0.80
        on_failure: block_deploy

# ─ eval run ──────────────────────────
  ✓ faithfulness:      0.91 / 0.85
  ✓ answer_relevance:  0.88 / 0.80
  Deploy unblocked. Proceeding.
          

Guard rails on
every token.

Declare monthly spend limits per model. Get Slack or email alerts before you hit your cap. Never get surprised by a runaway eval loop again.

💰
Per-model cost limits
Set a hard monthly USD cap at the model level. Cloudemon monitors token usage and fires alerts at your defined threshold.
🔀
Cheap-in-dev by default
Environment overrides let you automatically use Haiku in dev and Sonnet in production — without changing code.
📈
Cost dashboard
Token spend by model, by agent, by environment — visualized in the Cloudemon Cloud tier or exported to CloudWatch.
cost guardrails
models:
  - name: primary-llm
    provider: anthropic
    model_id: claude-sonnet-4-6
    cost_limits:
      max_monthly_usd: 500
      alert_threshold_pct: 80

observability:
  alerts:
    - name: cost-spike
      metric: hourly_token_cost_usd
      threshold: 20
      channel: slack
      webhook: ${{ secrets.SLACK_WEBHOOK }}

# ─ alert fired ─────────────────────
  ⚡ Cost alert: $400 / $500 (80%)
     Sent to #platform-alerts
          

Built for AI.
Not bolted on.

General-purpose IaC tools don't understand models, evals, or token costs. Cloudemon does.

Without Cloudemon
  • Bespoke Terraform + shell scripts per team
  • No version control on agent configs or prompts
  • Manual eval runs before each deployment
  • No visibility into token spend until the bill arrives
  • Prod deploy breaks because dev used a different model
  • Vector store provisioned differently across environments
  • New teammate takes days to understand the stack
With Cloudemon
  • Single YAML file defines the entire stack
  • Every change tracked in Git, reviewed in PRs
  • Eval gates block bad deploys automatically
  • Cost limits and alerts on every model
  • Environment overrides — Haiku in dev, Sonnet in prod
  • One command to reproduce any env anywhere
  • New teammates productive in minutes, not days

Start free.
Scale with your team.

The full open-source CLI is always free. Cloudemon Cloud adds managed state, a web UI, and team collaboration.

Open Source
Community
The full CLI, forever free and open source
$0 / always
  • Full YAML + Python SDK
  • All CLI commands (plan / apply / diff / destroy)
  • All provider plugins (AWS, GCP, Anthropic, OpenAI…)
  • Local state backend
  • Community Discord support
  • Apache 2.0 license
Download CLI
Enterprise
Private cloud, RBAC, audit logs, SLA
Custom
  • Everything in Teams
  • Self-hosted deployment option
  • RBAC + fine-grained permissions
  • Full audit log export
  • Dedicated Slack channel
  • 99.9% uptime SLA
  • Custom provider development
Contact sales →

Your AI stack deserves
version control.

Join hundreds of teams already managing their AI infrastructure as code.

Apache 2.0 open source
YAML + Python + TypeScript SDKs
Works with any cloud
No vendor lock-in