# Operations runbook

Day-to-day operation and incident handling for AiCordCloud.

## Service lifecycle

* Start: run the process manager's start command
* Restart: use a rolling restart when more than one instance is running
* Stop: only during maintenance windows
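
A rolling restart can be sketched as below. This is a minimal illustration, not the AiCordCloud tooling: `rolling_restart`, `restart`, and `is_healthy` are hypothetical names, and the caller is assumed to supply callables that wrap whatever process manager and health check are actually in use.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    instances: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
) -> List[str]:
    """Restart instances one at a time, waiting for each to become healthy.

    `restart` and `is_healthy` are hypothetical hooks supplied by the caller.
    Returns the names of the instances that were restarted and recovered.
    """
    names = list(instances)
    if len(names) < 2:
        # With a single instance there is nothing left to serve traffic.
        raise ValueError("rolling restart needs at least two instances")

    recovered: List[str] = []
    for name in names:
        restart(name)
        deadline = time.monotonic() + timeout_s
        while not is_healthy(name):
            if time.monotonic() >= deadline:
                # Halt the rollout so the remaining instances keep serving.
                raise RuntimeError(f"{name} did not become healthy; halting")
            time.sleep(poll_s)
        recovered.append(name)
    return recovered
```

Halting at the first instance that fails to recover leaves the untouched instances running, which is the point of restarting one at a time.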

## Daily checks

1. Health endpoint status
2. Error rate in logs
3. p95 latency trend
4. Queue pressure
5. Upstream fallback ratio
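
Three of the checks above reduce to simple arithmetic over a day's samples. The sketch below is illustrative only: the function names are hypothetical, treating HTTP status 500 and above as an error is an assumption, and the p95 uses the nearest-rank method, which may differ from what a given monitoring stack reports.

```python
from typing import Sequence


def error_rate(status_codes: Sequence[int]) -> float:
    """Share of responses with status >= 500 (assumed error definition)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def p95(latencies_ms: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def fallback_ratio(fallback_requests: int, total_requests: int) -> float:
    """Share of requests served by an upstream fallback."""
    if total_requests == 0:
        return 0.0
    return fallback_requests / total_requests
```

Comparing each value with the previous day's figure gives the trend that the p95 check asks for.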

## Weekly checks

1. API key rotation policy audit
2. Dependency update window
3. Backup validation (configs, env templates, docs)
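
The key rotation audit can be partly automated by listing keys older than the policy allows. This is a sketch under assumptions: `keys_past_rotation` is a hypothetical helper, the key inventory is assumed to be available as a mapping from key ID to timezone-aware creation time, and the maximum age comes from your own policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def keys_past_rotation(
    created_at: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the IDs of keys older than max_age, sorted for stable output.

    `created_at` maps a key ID to its timezone-aware creation time
    (a hypothetical inventory format).
    """
    current = now or datetime.now(timezone.utc)
    return sorted(
        key_id
        for key_id, created in created_at.items()
        if current - created > max_age
    )
```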

## Incident procedure

1. Classify severity (P1/P2/P3)
2. Capture symptoms and timeline
3. Mitigate first (failover, temporary limits)
4. Run a root cause analysis (RCA) once the service is stable
5. Publish an internal incident summary
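
One way to keep severity, symptoms, and timeline together is a small incident record that can render the internal summary. The sketch below is an assumption, not an AiCordCloud API: `Severity`, `Incident`, `log`, and `summary` are hypothetical names, and the criteria for each severity level are left to your own policy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional, Tuple


class Severity(Enum):
    P1 = 1
    P2 = 2
    P3 = 3


@dataclass
class Incident:
    title: str
    severity: Severity
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)

    def log(self, note: str, at: Optional[datetime] = None) -> None:
        """Append a timestamped note (symptom, mitigation, RCA finding)."""
        self.timeline.append((at or datetime.now(timezone.utc), note))

    def summary(self) -> str:
        """Render a plain-text summary with events in chronological order."""
        lines = [f"[{self.severity.name}] {self.title}"]
        for ts, note in sorted(self.timeline, key=lambda entry: entry[0]):
            lines.append(f"{ts.isoformat()} {note}")
        return "\n".join(lines)
```

Logging mitigation steps as they happen means the timeline is already in place when the RCA starts.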
