ideogram-incident-runbook

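End-to-end incident response for Ideogram integration failures: real-time fault detection, severity-based response decisions, automated remediation by error type, multi-channel status communication, and structured evidence archiving with postmortem analysis, so that service is restored quickly and stability keeps improving.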

Quick Install

Run this command in your terminal to install the Skill into Claude in one step:

npx skills add jeremylongshore/claude-code-plugins-plus-skills --skill "ideogram-incident-runbook"

Ideogram Incident Runbook

Overview

Rapid incident response for Ideogram API outages, auth failures, rate limiting, and degraded generation quality. Covers triage, immediate remediation, fallback activation, and postmortem process.

Severity Levels

| Level | Definition | Response Time | Example |
|-------|------------|---------------|---------|
| P1 | API unreachable or all requests failing | < 15 min | 401 on valid key, 500 on all requests |
| P2 | Degraded quality or performance | < 1 hour | P95 latency > 30s, high 429 rate |
| P3 | Minor impact, workaround exists | < 4 hours | Occasional safety rejections, slow downloads |
| P4 | No user impact | Next business day | Monitoring gaps, stale cache |
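
The severity table can be encoded as a small classifier so that alerts carry a consistent level. This is a minimal sketch under stated assumptions: the `HealthSnapshot` fields are hypothetical names, and the 0.2 cutoff for a "high" 429 rate is an assumed value, since the runbook does not give a number.

```typescript
// Hypothetical metrics snapshot; field names are illustrative only.
interface HealthSnapshot {
  errorRate: number;         // fraction of requests failing (401/5xx), 0..1
  p95LatencySeconds: number; // 95th percentile generation latency
  rate429: number;           // fraction of requests answered with 429, 0..1
  userImpact: boolean;       // whether any user-facing effect is observed
}

type Severity = "P1" | "P2" | "P3" | "P4";

// Assumed cutoff for a "high" 429 rate; tune it to your own traffic.
const HIGH_429_RATE = 0.2;

function classifySeverity(s: HealthSnapshot): Severity {
  if (s.errorRate >= 1) return "P1"; // all requests failing
  if (s.p95LatencySeconds > 30 || s.rate429 > HIGH_429_RATE) return "P2";
  if (s.userImpact) return "P3";     // minor impact, workaround exists
  return "P4";                       // no user impact
}

// Response-time targets copied from the severity table.
const responseTime: Record<Severity, string> = {
  P1: "< 15 min",
  P2: "< 1 hour",
  P3: "< 4 hours",
  P4: "Next business day",
};
```

A monitoring job could call `classifySeverity` on each metrics window and look up `responseTime` to fill in the paging deadline.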

Quick Triage (Run These First)

set -uo pipefail  # no -e: a failed check must not abort the remaining checks

echo "=== IDEOGRAM TRIAGE ==="

# 1. Test API connectivity and auth
echo -n "API status: "
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://api.ideogram.ai/generate \
  -H "Api-Key: $IDEOGRAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_request":{"prompt":"triage test","model":"V_2_TURBO","magic_prompt_option":"OFF"}}'
echo ""

# 2. Test V3 endpoint
echo -n "V3 status: "
curl -s -o /dev/null -w "%{http_code}" \
  -X POST https://api.ideogram.ai/v1/ideogram-v3/generate \
  -H "Api-Key: $IDEOGRAM_API_KEY" \
  -F "prompt=triage test" -F "rendering_speed=FLASH"
echo ""

# 3. Check DNS resolution
echo -n "DNS: "
nslookup api.ideogram.ai 2>/dev/null | grep -A1 "Name:" | tail -1 || echo "lookup failed"

# 4. Measure latency
echo -n "Latency: "
curl -s -o /dev/null -w "%{time_total}s" \
  -X POST https://api.ideogram.ai/generate \
  -H "Api-Key: $IDEOGRAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_request":{"prompt":"latency test","model":"V_2_TURBO","magic_prompt_option":"OFF"}}'
echo ""

Decision Tree

Is api.ideogram.ai returning errors?
├─ YES: What status code?
│   ├─ 401 → Key revoked or misconfigured. See "Auth Failure" below.
│   ├─ 402 → Credits exhausted. Top up immediately.
│   ├─ 422 → Safety filter. Prompt issue, not outage.
│   ├─ 429 → Rate limited. Reduce concurrency.
│   ├─ 500/503 → Ideogram outage. Enable fallback.
│   └─ Timeout → Network or Ideogram performance issue.
├─ NO: Are images generating but quality is bad?
│   ├─ YES → Check model version, style params, magic_prompt setting.
│   └─ NO → Check image download (URLs may have expired).
└─ Not sure: Run triage script above.
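
The status-code branch of the tree above maps naturally onto a lookup function. The following is a sketch only: `Probe` and `triageAction` are hypothetical names, and the returned strings paraphrase the tree rather than come from any Ideogram API.

```typescript
// Result of a single probe request: either an HTTP status or a timeout.
type Probe = { status: number } | { timedOut: true };

function triageAction(probe: Probe): string {
  if ("timedOut" in probe) {
    return "Timeout: network or Ideogram performance issue";
  }
  switch (probe.status) {
    case 401:
      return "Key revoked or misconfigured: follow the auth failure steps";
    case 402:
      return "Credits exhausted: top up immediately";
    case 422:
      return "Safety filter: prompt issue, not an outage";
    case 429:
      return "Rate limited: reduce concurrency";
    case 500:
    case 503:
      return "Ideogram outage: enable fallback";
    default:
      // 2xx means the API itself is fine; look at quality or downloads instead.
      return probe.status >= 200 && probe.status < 300
        ? "API healthy: check image quality or expired download URLs"
        : "Unrecognized status: run the triage script";
  }
}
```

Feeding it the status code printed by the triage script gives the on-call engineer the matching branch without reading the tree by hand.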

Immediate Actions

401 — Authentication Failure

set -euo pipefail
# Verify key is set (never echo the key itself)
key="${IDEOGRAM_API_KEY:-}"
echo "Key present: $([ -n "$key" ] && echo YES || echo NO)"
echo "Key length: ${#key}"

# If key was rotated, update everywhere:
# 1. Ideogram dashboard: create new key
# 2. Update secret manager / env vars
# 3. Restart affected services

# Kubernetes
kubectl create secret generic ideogram-secrets \
  --from-literal=api-key="$NEW_KEY" \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/ideogram-service

429 — Sustained Rate Limiting

set -euo pipefail
# Reduce concurrency immediately
kubectl set env deployment/ideogram-service IDEOGRAM_CONCURRENCY=3

# If sustained, contact Ideogram support for a limit increase
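
Setting `IDEOGRAM_CONCURRENCY` only helps if the service actually caps in-flight requests. The runbook does not show how the service does this, so the following is a sketch of one common approach: a promise-based limiter. `ConcurrencyLimiter` is a hypothetical class, not part of any Ideogram SDK.

```typescript
// Minimal limiter: at most `limit` tasks run at once; the rest wait in FIFO order.
class ConcurrencyLimiter {
  private active = 0;
  private readonly queue: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new Error("limit must be a positive integer");
    }
    this.limit = limit;
  }

  get running(): number {
    return this.active;
  }

  get waiting(): number {
    return this.queue.length;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.active++;
        Promise.resolve()
          .then(task)            // also catches synchronous throws from task
          .then(resolve, reject) // forward the task outcome to the caller
          .then(() => {
            // Release the slot and start the next queued task, if any.
            this.active--;
            const next = this.queue.shift();
            if (next) next();
          });
      };
      if (this.active < this.limit) start();
      else this.queue.push(start);
    });
  }
}
```

A service might build the limiter once at startup from the environment variable (for example with a limit of 3) and wrap every generation call in `limiter.run(() => ...)`.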

500/503 — Ideogram Outage

set -euo pipefail
# Enable fallback mode (return placeholder images)
kubectl set env deployment/ideogram-service IDEOGRAM_FALLBACK=true
kubectl rollout restart deployment/ideogram-service

# Monitor for resolution, then disable fallback
# kubectl set env deployment/ideogram-service IDEOGRAM_FALLBACK=false

402 — Credits Exhausted

1. Log in to ideogram.ai > Settings > API Beta
2. Check current balance
3. Increase auto top-up amount
4. Or manually add credits
5. Verify generation works again

Fallback Implementation

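const FALLBACK_ENABLED = process.env.IDEOGRAM_FALLBACK === "true";

// Placeholder response shaped like a normal generation result.
function fallbackResponse() {
  return {
    data: [{
      url: `https://placehold.co/1024x1024/333/fff?text=${encodeURIComponent("Image unavailable")}`,
      seed: 0,
      resolution: "1024x1024",
      is_image_safe: true,
      fallback: true,
    }],
  };
}

// generateImage is the application's existing Ideogram client call, defined elsewhere.
async function generateWithFallback(prompt: string, options: any = {}) {
  if (FALLBACK_ENABLED) {
    return fallbackResponse();
  }

  try {
    return await generateImage(prompt, options);
  } catch (err: any) {
    if (err.status >= 500) {
      console.error("Ideogram 5xx -- serving fallback");
      // Return the placeholder directly. Calling generateWithFallback again
      // here would retry the failing API in an endless loop.
      return fallbackResponse();
    }
    throw err;
  }
}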

Communication Templates

Internal (Slack)

P[X] INCIDENT: Ideogram Integration
Status: INVESTIGATING / MITIGATED / RESOLVED
Impact: [e.g., Image generation unavailable for users]
Cause: [e.g., API returning 500, or key revoked]
Action: [e.g., Fallback enabled, monitoring for resolution]
Next update: [time]
Owner: @[name]
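
To keep updates uniform across incidents, the template can be filled in by a small formatter. This is a sketch only: `IncidentUpdate` and `formatIncidentUpdate` are hypothetical names, and posting the string to Slack is left to whatever client your team already uses.

```typescript
type IncidentStatus = "INVESTIGATING" | "MITIGATED" | "RESOLVED";

interface IncidentUpdate {
  severity: 1 | 2 | 3 | 4;
  status: IncidentStatus;
  impact: string;
  cause: string;
  action: string;
  nextUpdate: string; // free-form time, e.g. "14:30 UTC"
  owner: string;      // chat handle without the leading "@"
}

// Renders the internal Slack template line by line.
function formatIncidentUpdate(u: IncidentUpdate): string {
  return [
    `P${u.severity} INCIDENT: Ideogram Integration`,
    `Status: ${u.status}`,
    `Impact: ${u.impact}`,
    `Cause: ${u.cause}`,
    `Action: ${u.action}`,
    `Next update: ${u.nextUpdate}`,
    `Owner: @${u.owner}`,
  ].join("\n");
}
```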

Postmortem Template

## Incident: Ideogram [Type]
**Date:** YYYY-MM-DD | **Duration:** Xh Ym | **Severity:** P[1-4]

### Summary
[1-2 sentences]

### Timeline
- HH:MM - First alert triggered
- HH:MM - Triage started
- HH:MM - Fallback enabled
- HH:MM - Root cause identified
- HH:MM - Resolved

### Root Cause
[Technical explanation]

### Action Items
- [ ] [Fix] - Owner - Due date
- [ ] [Prevention] - Owner - Due date
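
The header line of the template can be derived from the first-alert and resolution timestamps instead of being computed by hand. The helper names below are hypothetical, and the date is taken in UTC, which is an assumption about how your team records timelines.

```typescript
// Formats the elapsed time between two instants as "Xh Ym".
function formatDuration(start: Date, end: Date): string {
  const totalMinutes = Math.round((end.getTime() - start.getTime()) / 60000);
  if (totalMinutes < 0) throw new Error("end precedes start");
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours}h ${minutes}m`;
}

// Renders the first two lines of the postmortem template.
function renderPostmortemHeader(
  type: string,
  start: Date,
  end: Date,
  severity: 1 | 2 | 3 | 4,
): string {
  const date = start.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return [
    `## Incident: Ideogram ${type}`,
    `**Date:** ${date} | **Duration:** ${formatDuration(start, end)} | **Severity:** P${severity}`,
  ].join("\n");
}
```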

Error Handling

| Issue | Detection | Mitigation |
|-------|-----------|------------|
| Total API outage | Health check fails | Enable fallback images |
| Key revoked | 401 on valid config | Rotate key immediately |
| Credits depleted | 402 responses | Top up, pause batch jobs |
| Rate limit flood | Sustained 429 | Reduce concurrency to 3 |
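
The "health check fails" detection in the first row can be made less noisy by requiring several consecutive failures before enabling fallback. This is a sketch under stated assumptions: `OutageDetector` is a hypothetical class, and the default threshold of 3 is an assumed value, not one taken from the runbook.

```typescript
// Tracks consecutive failed health probes and signals when to enable fallback.
class OutageDetector {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // status is the HTTP code from a probe, or null if no response arrived.
  // Returns true once fallback should be on.
  record(status: number | null): boolean {
    const failed = status === null || status >= 500;
    // Any healthy probe resets the streak.
    this.consecutiveFailures = failed ? this.consecutiveFailures + 1 : 0;
    return this.shouldFallback;
  }

  get shouldFallback(): boolean {
    return this.consecutiveFailures >= this.threshold;
  }
}
```

Only 5xx responses and missing responses count as failures here. A 401, 402, or 429 needs its own mitigation from the table above, not fallback images.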

Output

  • Incident identified and categorized by severity
  • Immediate remediation applied
  • Fallback activated if needed
  • Stakeholders notified with template
  • Evidence collected for postmortem

Next Steps

For data handling patterns, see ideogram-data-handling.