Now in open beta — v0.1 OSS

AI Infrastructure
as Code.

Define your entire AI stack — models, memory, agents, pipelines, and observability — in a single declarative file. Deploy anywhere. Version everything.

Star on GitHub
See how it works
aiinfra.yaml
# cloudemon.io — define your full AI stack
apiVersion: aiinfra/v1
kind: Stack
 
models:
  - name: primary-llm
    provider: anthropic
    model_id: claude-sonnet-4-6
    cost_limits:
      max_monthly_usd: 500
 
memory:
  - name: knowledge-base
    backend: pgvector  # or pinecone · weaviate · qdrant
 
agents:
  - name: support-agent
    model: primary-llm
    memory: [knowledge-base]
    tools: [web-search, crm-lookup]
 
$ cloudemon apply --env production
✓ created: model:primary-llm
✓ created: vectorstore:knowledge-base
✓ created: agent:support-agent
Apply complete: 3 created · 0 errors
Works with your entire AI stack
Anthropic
OpenAI
AWS
GCP
Azure
Pinecone
Weaviate
Qdrant
LangChain
OpenAI
AI Research
Vercel
Cloud Platform
Cursor
AI Dev Tools
Databricks
Data + AI
ElevenLabs
Voice AI
Harvey AI
Legal AI
Scale AI
AI Data Platform
Notion
Productivity AI
Runway
Generative Video
Glean
Enterprise AI
Writer
Enterprise LLM
Mistral AI
Open LLMs
Linear
Dev Tooling
Together AI
Inference Cloud
Replicate
Model Hosting
Modal
Serverless AI
Adept AI
AI Agents
W&B
MLOps Platform
Fireworks AI
Fast Inference
Cohere
Enterprise NLP
Hugging Face
AI Community
Baseten
Model Infra
LangChain
AI Framework
2.4K+
GitHub stars
340+
Teams in beta
18+
Provider plugins
99.9%
Cloud uptime SLA

From YAML to production
in minutes.

Cloudemon reads your stack definition, computes a diff against the real world, and provisions only what changed — just like Terraform, but built for AI.

01
Define your stack
Declare every AI resource — models, memory, agents, pipelines — in YAML or the Python SDK. Version-controlled, reviewable, repeatable.
02
Preview the plan
Run cloudemon plan to see exactly what will be created, updated, or destroyed — before a single API call is made.
03
Apply to any environment
Promote dev → staging → production with a single flag. Per-environment overrides let you use cheaper models in dev automatically.
04
Observe & iterate
Tracing, cost alerts, and eval regression gates are first-class primitives. Block a deploy automatically if quality regresses.
aiinfra.yaml
# environments block — override per env
environments:
  dev:
    overrides:
      models:
        primary-llm:
          model_id: claude-haiku-4-5
  production:
    approvals_required: true
    approvers:
      - eng-lead@acme.com
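With the dev override above in place, step 02 of the workflow would surface the model swap before anything runs. A sketch of what `cloudemon plan` output might look like — the diff symbols and summary line are illustrative, not actual CLI output:

```
$ cloudemon plan --env dev
~ update: model:primary-llm
    model_id: claude-sonnet-4-6 → claude-haiku-4-5
Plan: 0 to create · 1 to update · 0 to destroy
```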

Every AI resource,
first-class.

Cloudemon introduces a new resource model for AI-native infrastructure — purpose-built primitives that general-purpose IaC tools don't understand.

🧠
Models
Declare LLM endpoints with fallbacks, version pins, temperature, and monthly cost guardrails.
anthropic openai cohere mistral
💾
Memory
Provision vector stores and KV caches, wired automatically to your agents.
pgvector pinecone weaviate qdrant
🔧
Tools
HTTP APIs, built-in tools, and MCP servers declared as typed, versioned resources.
http builtin mcp
🤖
Agents
Orchestration config that wires models, memory, tools, and system prompts into a deployable unit.
model-ref fallback streaming
⚙️
Pipelines
Scheduled ingestion, eval regression, and batch transform jobs as first-class declared resources.
schedule on_deploy eval-gate
🚀
Serving
Auto-scaled REST, WebSocket, and gRPC endpoints with auth and CORS — provisioned on your cloud.
rest websocket autoscale
📊
Observability
Tracing, structured logging, cost alerts, and eval dashboards wired into every resource automatically.
langfuse arize cloudwatch
🌍
Environments
Per-environment overrides for models, serving scale, and cost limits. Promotion gates with approvals.
dev staging production
+
Plugin API
Build custom providers.
Community-extensible.
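Tools and Serving are the two resource types above without a snippet elsewhere on this page. A hypothetical sketch of how they might be declared — the field names and the endpoint URL are illustrative assumptions, not taken from the docs; `crm-lookup` and `support-agent` match the earlier stack definition:

```yaml
tools:
  - name: crm-lookup
    type: http                      # or: builtin · mcp
    endpoint: https://crm.example.internal/api/lookup  # hypothetical URL
    version: v2

serving:
  - name: support-api
    type: rest                      # or: websocket · grpc
    agent: support-agent
    autoscale:
      min_replicas: 1
      max_replicas: 10
```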

Block bad deploys
automatically.

Define quality thresholds in your pipeline. Cloudemon runs your eval suite before every deploy and blocks it if scores regress — no manual intervention needed.

Eval-gated deployments
Set thresholds on faithfulness, relevance, and tool accuracy. A deploy that misses a threshold never reaches production.
🔁
Golden dataset integration
Point your pipeline at a JSONL eval set in S3. Cloudemon runs it on every on_deploy trigger.
🔔
Regression alerts
Notify Slack or PagerDuty when an eval score drops below baseline, even between scheduled deploys.
aiinfra.yaml — eval pipeline on_deploy
pipelines:
  - name: eval-regression
    trigger:
      type: on_deploy
    steps:
      - name: run-evals
        type: eval
        agent: support-agent
        dataset: s3://evals/golden.jsonl
        metrics:
          - name: faithfulness
            threshold: 0.85
          - name: answer_relevance
            threshold: 0.80
        on_failure: block_deploy

# ─ eval run ──────────────────────────
  ✓ faithfulness:      0.91 / 0.85
  ✓ answer_relevance:  0.88 / 0.80
  Deploy unblocked. Proceeding.
          

Guard rails on
every token.

Declare monthly spend limits per model. Get Slack or email alerts before you hit your cap. Never get surprised by a runaway eval loop again.

💰
Per-model cost limits
Set a hard monthly USD cap at the model level. Cloudemon monitors token usage and fires alerts at your defined threshold.
🔀
Cheap-in-dev by default
Environment overrides let you automatically use Haiku in dev and Sonnet in production — without changing code.
📈
Cost dashboard
Token spend by model, by agent, by environment — visualized in the Cloudemon Cloud tier or exported to CloudWatch.
cost guardrails
models:
  - name: primary-llm
    provider: anthropic
    model_id: claude-sonnet-4-6
    cost_limits:
      max_monthly_usd: 500
      alert_threshold_pct: 80

observability:
  alerts:
    - name: cost-spike
      metric: hourly_token_cost_usd
      threshold: 20
      channel: slack
      webhook: ${{ secrets.SLACK_WEBHOOK }}

# ─ alert fired ─────────────────────
  ⚡ Cost alert: $400 / $500 (80%)
     Sent to #platform-alerts
          

Built for AI.
Not bolted on.

General-purpose IaC tools don't understand models, evals, or token costs. Cloudemon does.

Without Cloudemon
  • Bespoke Terraform + shell scripts per team
  • No version control on agent configs or prompts
  • Manual eval runs before each deployment
  • No visibility into token spend until the bill arrives
  • Prod deploy breaks because dev used a different model
  • Vector store provisioned differently across environments
  • New teammate takes days to understand the stack
With Cloudemon
  • Single YAML file defines the entire stack
  • Every change tracked in Git, reviewed in PRs
  • Eval gates block bad deploys automatically
  • Cost limits and alerts on every model
  • Environment overrides — Haiku in dev, Sonnet in prod
  • One command to reproduce any env anywhere
  • New teammates productive in minutes, not days

Start free.
Scale with your team.

The full open-source CLI is always free. Cloudemon Cloud adds managed state, a web UI, and team collaboration.

Open Source
Community
The full CLI, forever free and open source
$0 / always
  • Full YAML + Python SDK
  • All CLI commands (plan / apply / diff / destroy)
  • All provider plugins (AWS, GCP, Anthropic, OpenAI…)
  • Local state backend
  • Community Discord support
  • Apache 2.0 license
Download CLI
Enterprise
Private cloud, RBAC, audit logs, SLA
Custom
  • Everything in Teams
  • Self-hosted deployment option
  • RBAC + fine-grained permissions
  • Full audit log export
  • Dedicated Slack channel
  • 99.9% uptime SLA
  • Custom provider development
Contact sales →

Your AI stack deserves
version control.

Join hundreds of teams already managing their AI infrastructure as code.

Apache 2.0 open source
YAML + Python + TypeScript SDKs
Works with any cloud
No vendor lock-in