Building Production Systems with Claude Code: Best Practices and Real-World Automation

Updated Feb 6, 2026

The Real Test: Can Claude Code Run Unsupervised?

Most AI coding tools shine during interactive sessions — you ask, they answer, you iterate. But production systems need to run at 3 AM when you’re asleep. They need to handle errors gracefully, skip work when quotas are exhausted, and log everything for post-mortem analysis.

This is where the tildalice.io auto-posting system gets interesting. It’s been publishing blog posts via cron for weeks now, using Claude Code CLI to generate content hourly between midnight and 7 AM KST. No human supervision. The system calls Claude Opus, falls back to Sonnet if quota runs out, and exits cleanly if both models are exhausted. It logs every decision, notifies Slack on success or failure, and maintains a JSON history to avoid topic repetition.
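The hourly midnight-to-7-AM schedule maps directly onto a single crontab entry. A minimal sketch — the script path, log path, and interpreter location are illustrative assumptions, and the server clock is assumed to be set to KST:

```shell
# m h  dom mon dow  command
# Top of every hour from 00:00 through 07:00 (8 runs/day), server time = KST.
# /opt/autoposter/auto_post.py is a hypothetical path, not the real one.
0 0-7 * * * /usr/bin/python3 /opt/autoposter/auto_post.py >> /var/log/autoposter.log 2>&1
```

Redirecting both stdout and stderr into one log file is what makes the post-mortem analysis mentioned above possible.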

Let me show you how it works — and more importantly, where the naive approach breaks down.

Photo: close-up view of modern automation machinery in an industrial setting (KJ Brix, Pexels)

Approach 1: Direct API Calls (The Obvious Choice That Fails)

Your first instinct might be: “Just use the Anthropic API directly. Send a prompt, get JSON back, parse it, publish to WordPress.” Clean, simple, no CLI wrapper.

Here’s what that looks like:

import anthropic
import json

client = anthropic.Anthropic(api_key="sk-ant-...")

def generate_post_api(topic, category):
    prompt = f"""Write a technical blog post about {topic}.
    Category: {category}
    Return JSON with title, content (Markdown), tags, excerpt, slug."""

    message = client.messages.create(
        model="claude-opus-4-5-20251101",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}]
    )

    return json.loads(message.content[0].text)

This works. For about three days.

Then you hit the API rate limits. The anthropic Python SDK raises anthropic.RateLimitError, which you can catch. But here’s the problem: you’re paying per token. If your blog posts average 2500 words and you’re publishing 8 times a day, you’re burning through credits fast. At $15 per million input tokens and $75 per million output tokens (Opus 4.5 pricing as of late 2025), each 3000-word post costs roughly $0.50-$0.75. Multiply by 240 posts/month and you’re looking at $120-$180/month just for content generation.

And that’s assuming you never regenerate, never test prompts, never iterate.
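If you do go the API route, catching anthropic.RateLimitError deserves a backoff loop rather than a bare try/except. A minimal sketch — with_backoff is my name for the helper, not part of any SDK:

```python
import time

def with_backoff(fn, retries=3, retryable=(Exception,), base_delay=1.0):
    """Call fn(), retrying with exponential backoff on retryable exceptions."""
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise  # out of retries: propagate to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage would look like `with_backoff(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,))` — the generation call from above wrapped in a lambda.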

Approach 2: Claude Code CLI with OAuth (The Counterintuitive Winner)

Claude Code CLI uses your Claude.ai subscription (Pro or Max plan), not API credits. If you’re already paying $20-$100/month for the web interface, you get Claude Code access at no extra cost. The catch? Usage limits are enforced per-plan, not per-token. Opus might exhaust after 50-100 heavy requests in a 24-hour window, but Sonnet is far more generous.

Here’s the production implementation:

import subprocess
import json
import sys

class UsageExhaustedError(Exception):
    """Raised when both Opus and Sonnet quotas are exhausted."""
    pass

def claude_request(prompt, model="opus"):
    """Call Claude Code CLI with prompt. Returns raw text response."""
    cmd = [
        "claude",
        "-p", prompt,        # non-interactive "print" mode
        "--model", model
    ]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=300  # 5min max, some posts take time
    )

    # Check for quota exhaustion in stderr
    if "usage limit" in result.stderr.lower() or \
       "quota exceeded" in result.stderr.lower():
        if model == "opus":
            # Fallback to Sonnet
            print("[WARN] Opus exhausted, falling back to Sonnet", file=sys.stderr)
            return claude_request(prompt, model="sonnet")
        else:
            # Both models exhausted
            raise UsageExhaustedError("Both Opus and Sonnet quotas exhausted")

    if result.returncode != 0:
        raise RuntimeError(f"Claude CLI failed: {result.stderr}")

    return result.stdout.strip()

def generate_post(topic, category):
    """Generate blog post via Claude Code CLI."""
    # Actual prompt is ~200 lines with style rules, rhythm constraints,
    # banned phrases, etc. Truncated here for brevity.
    prompt = f"""You are TildAlice, writing about {topic}.
    Category: {category}
    [...full instructions from CLAUDE.md...]
    Return JSON only: {{"title": "...", "content": "...", ...}}
    """

    try:
        response = claude_request(prompt)
        # Claude sometimes wraps JSON in markdown fences
        if response.startswith("```json"):
            response = response.split("```json\n", 1)[1].rsplit("```", 1)[0]
        return json.loads(response)
    except UsageExhaustedError:
        print("[INFO] Claude quota exhausted, skipping post")
        sys.exit(0)  # Exit cleanly so cron doesn't alert
    except json.JSONDecodeError as e:
        # Log the raw response for debugging
        print(f"[ERROR] Failed to parse JSON: {e}", file=sys.stderr)
        print(f"Raw response: {response[:500]}...", file=sys.stderr)
        raise

Notice the error handling. When Opus exhausts, we immediately retry with Sonnet — same prompt, same context, just a different model. If Sonnet also fails, we exit with code 0 (not an error) so cron doesn’t send panicked emails at 4 AM.

This approach costs $20-$100/month (your existing subscription) instead of $120-$180/month in API fees. The tradeoff? You can’t programmatically check quota limits in advance. You only know you’ve hit the wall when the CLI fails.
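The fence-stripping in generate_post handles the tagged-json case; a slightly more defensive variant also accepts bare fences without the json tag. A sketch — parse_json_response is my name, not the production function:

```python
import json
import re

def parse_json_response(text):
    """Parse JSON that may be wrapped in markdown code fences."""
    text = text.strip()
    # Match an optional leading ```json (or plain ```) fence and trailing fence.
    m = re.match(r"^```(?:json)?\s*\n(.*?)\n?```$", text, flags=re.DOTALL)
    if m:
        text = m.group(1)
    return json.loads(text)
```

Either way, keeping the raw response around for logging (as the except block above does) is what makes parse failures debuggable at 4 AM.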

Production Hardening: The Details That Matter

Here’s what breaks in production that never breaks during development:

1. Topic Uniqueness Across Time

You can’t just pick random topics forever. After 100 posts, you’ll start repeating. The system tracks the last 30 topics and refuses to generate duplicates:

def pick_topic(posted_log):
    """Generate a unique topic not in the most recent 30 posts."""
    # Assumes `import random` and a module-level TOPICS dict shaped like
    # {category: {"keywords": [...]}, ...}
    recent_topics = [entry["topic"].lower()
                     for entry in posted_log[-30:]]

    for attempt in range(10):
        category = random.choice(list(TOPICS.keys()))
        keywords = TOPICS[category]["keywords"]

        prompt = f"""Generate ONE unique technical topic about {category}.
        Use these keywords for inspiration: {keywords}
        Recent topics (DO NOT REPEAT): {recent_topics}
        Return just the topic string, nothing else."""

        topic = claude_request(prompt, model="sonnet").strip()

        if topic.lower() not in recent_topics:
            return topic, category

    # After 10 attempts, give up (shouldn't happen with 8 diverse categories)
    raise RuntimeError("Failed to generate unique topic after 10 attempts")

Why Sonnet for topic generation? Because it’s a lightweight task. Save Opus quota for the actual blog post.
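One caveat: the check above is an exact lowercase match, so "KV Cache Quantization" and "Quantizing the KV Cache" both pass. A near-duplicate check via token overlap would catch rephrasings — a sketch under that assumption, with the 0.6 threshold picked arbitrarily:

```python
def jaccard(a, b):
    """Token-set Jaccard similarity of two topic strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def is_near_duplicate(topic, recent, threshold=0.6):
    """True if topic overlaps heavily with any recent topic."""
    return any(jaccard(topic, r) >= threshold for r in recent)
```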

2. KaTeX Math Rendering on the Server

Claude loves writing LaTeX equations (as instructed). But WordPress doesn’t render them natively. The naive solution is a client-side KaTeX plugin, but that adds 150KB+ to every page load and breaks if the CDN is down.

Server-side rendering is faster and more reliable:

import re
import subprocess
import sys

def render_math_server_side(html):
    """Replace LaTeX $$...$$ display and $...$ inline math with KaTeX-rendered HTML."""
    # Display equations: $$...$$
    def replace_display(match):
        latex = match.group(1).strip()
        # Call katex CLI (installed via npm globally)
        result = subprocess.run(
            ["katex", "--display-mode"],
            input=latex,
            capture_output=True,
            text=True
        )
        if result.returncode != 0:
            print(f"[WARN] KaTeX failed on: {latex[:50]}...", file=sys.stderr)
            return match.group(0)  # Keep original if render fails
        return f'<div class="math-display">{result.stdout}</div>'

    html = re.sub(r'\$\$(.+?)\$\$', replace_display, html, flags=re.DOTALL)

    # Inline equations: $...$
    def replace_inline(match):
        latex = match.group(1).strip()
        result = subprocess.run(
            ["katex"],
            input=latex,
            capture_output=True,
            text=True
        )
        if result.returncode != 0:
            return match.group(0)
        return f'<span class="math-inline">{result.stdout}</span>'

    html = re.sub(r'\$(.+?)\$', replace_inline, html)

    return html

This runs after Markdown-to-HTML conversion but before WordPress publishing. The rendered HTML is cached in the post content, so math renders even if JavaScript is disabled.

3. Graceful Degradation in Low-Memory Environments

The Oracle Cloud Free Tier server has 1GB RAM. Running Claude Code CLI + Python + WordPress + MySQL simultaneously can trigger OOM kills. The solution? Offload heavy computation to subprocesses and clean up aggressively:

import gc

def publish_series(topic, count):
    """Publish multi-episode series with memory management."""
    series_plan = generate_series_plan(topic, count)
    titles = series_plan["titles"]
    published = []

    for i, episode_title in enumerate(titles, 1):
        try:
            post_data = generate_series_post(
                topic, episode_title, i, count, titles
            )
        except UsageExhaustedError:
            print(f"[INFO] Quota exhausted after {i-1}/{count} episodes")
            break  # Keep what we published, skip the rest

        # Publish immediately (don't accumulate in memory)
        post_id = publish_to_wordpress(
            post_data["title"],
            post_data["content"],
            # ... other fields
        )
        published.append(post_id)

        # Force garbage collection after each episode
        gc.collect()

    return published

If the quota runs out halfway through a 5-part series, the system publishes the first 2-3 episodes and exits. The remaining episodes aren’t lost — the series plan is deterministic, so re-running with the same topic will pick up where it left off (though in practice, you’d manually trigger the remaining episodes).
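Because the plan is deterministic, working out what’s left is just an ordered set difference between the planned titles and the posted_log entries. A sketch — remaining_episodes is a hypothetical helper, not in the production script:

```python
def remaining_episodes(plan_titles, published_titles):
    """Episodes from the series plan not yet published, preserving plan order."""
    done = {t.lower() for t in published_titles}
    return [t for t in plan_titles if t.lower() not in done]
```

Feeding this back into publish_series on a re-run would give the manual resume described above.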

The Monitoring Layer: Slack Notifications and Structured Logs

Production systems need observability. Every post triggers a Slack notification:

import requests

def notify_slack(title, url, category, status="success", error=None):
    """Send Slack notification to #blog-status channel."""
    color = "#36a64f" if status == "success" else "#ff0000"

    payload = {
        "channel": "#blog-status",
        "attachments": [{
            "color": color,
            "title": title,
            "title_link": url if status == "success" else None,
            "fields": [
                {"title": "Category", "value": category, "short": True},
                {"title": "Status", "value": status.upper(), "short": True}
            ],
            "footer": "TildAlice Auto-Poster",
            "ts": int(time.time())
        }]
    }

    if error:
        payload["attachments"][0]["fields"].append({
            "title": "Error",
            "value": f"```{error[:200]}```",
            "short": False
        })

    requests.post(
        "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
        json=payload
    )

And every execution appends to posted_log.json:

[
  {
    "timestamp": "2026-02-04T03:00:00Z",
    "topic": "Optimizing Transformer Inference with KV Cache Quantization",
    "category": "AI/Deep Learning",
    "post_id": 187,
    "url": "https://tildalice.io/kv-cache-quantization/",
    "model_used": "opus"
  },
  {
    "timestamp": "2026-02-04T04:00:00Z",
    "topic": "Polars vs Pandas: When Query Optimization Beats Vectorization",
    "category": "Data Analysis",
    "post_id": 188,
    "url": "https://tildalice.io/polars-vs-pandas-query-opt/",
    "model_used": "sonnet"
  }
]

This log feeds into weekly analytics (total posts, category distribution, model usage) and serves as the source of truth for topic uniqueness checks.
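The weekly analytics fall out of a couple of Counters over those entries. A minimal sketch, assuming only the fields shown in the log excerpt above:

```python
from collections import Counter

def summarize_log(entries):
    """Aggregate posted_log.json entries into weekly-report numbers."""
    return {
        "total_posts": len(entries),
        "by_category": Counter(e["category"] for e in entries),
        "by_model": Counter(e["model_used"] for e in entries),
    }
```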

What I’d Do Differently Next Time

The current system works, but it’s not perfect. A few sharp edges I haven’t solved:

  1. No retry logic for transient WordPress API failures. If the REST API times out mid-publish, the post is lost. Should implement exponential backoff with max 3 retries.

  2. Math rendering fails silently. If KaTeX can’t parse a formula (unbalanced braces, unsupported commands), it falls back to raw LaTeX without alerting me. I only notice when skimming posts manually.

  3. Series navigation is brittle. If I manually delete an episode from WordPress admin, the “Previous / Next” links break. Should add a health check that scans published posts and rebuilds nav links on mismatch.

  4. Claude Code CLI doesn’t expose token counts. The API returns usage metadata (input/output tokens), but the CLI doesn’t. I can’t track cost-per-post or predict when I’ll hit quota limits. My best guess is Opus exhausts around 50-80 heavy requests per day, but that’s purely empirical.
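The nav-rebuild health check in item 3 is mostly bookkeeping: given the surviving episode IDs in order, recompute each post’s neighbours. A sketch of the core step — rebuild_nav is hypothetical, and pushing the links back to WordPress is left out:

```python
def rebuild_nav(episode_ids):
    """Map each surviving post id to its prev/next neighbours in series order."""
    nav = {}
    for i, pid in enumerate(episode_ids):
        nav[pid] = {
            "prev": episode_ids[i - 1] if i > 0 else None,
            "next": episode_ids[i + 1] if i < len(episode_ids) - 1 else None,
        }
    return nav
```

Deleting an episode and re-running this against the remaining IDs is exactly the "rebuild on mismatch" behaviour the health check needs.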

But here’s the thing: none of these issues have caused downtime. The system degrades gracefully. Math rendering falls back to LaTeX. Series nav breaks only for deleted posts (rare). WordPress timeouts are infrequent enough that I haven’t prioritized retries.

In production, “good enough to run unsupervised” beats “theoretically perfect.”

The Real Cost Comparison: API vs CLI Over 6 Months

Let’s run the numbers. Assume 8 posts/day, 30 days/month, 2500 words/post.

API Approach (Opus 4.5):
– Input tokens: ~500 (prompt) per post
– Output tokens: ~3500 (2500 words ≈ 3500 tokens, roughly)
– Cost per post: (500 × $15 + 3500 × $75) / 1,000,000 = $0.27
– Monthly cost: $0.27 × 240 = $64.80
– 6-month cost: $64.80 × 6 = $388.80
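The arithmetic above can be checked in a few lines — the per-million prices are the Opus 4.5 figures quoted earlier, and the function name is mine:

```python
def api_cost_per_post(in_tokens, out_tokens, in_price=15.0, out_price=75.0):
    """Dollar cost of one post at per-million-token prices."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

per_post = api_cost_per_post(500, 3500)  # $0.27
monthly = per_post * 240                 # $64.80
```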

CLI Approach (Max Plan):
– Monthly subscription: $100 (includes Claude.ai Pro features)
– 6-month cost: $100 × 6 = $600

Wait — the API is cheaper?

Not quite. The $100/month Max plan also gives you:
– Unlimited web interface usage (research, code review, prototyping)
– Access to Claude Code for interactive development
– Priority access during peak hours
– Extended context windows (200K tokens vs API’s default)

If you’re only running automated batch jobs and never touching the web interface, the API wins. But if you’re a developer who uses Claude.ai daily for other work, the CLI approach bundles automation into your existing subscription at zero marginal cost.

For tildalice.io, I’m already paying for Max to write code interactively. The auto-poster is a free rider.

Use Claude Code CLI for Automation If You Already Subscribe

Here’s the decision tree:

  • Pay for API credits if: You’re building a commercial service, need guaranteed uptime SLAs, require fine-grained token tracking, or don’t use Claude.ai for anything else.

  • Use CLI automation if: You already subscribe to Pro/Max, can tolerate occasional quota exhaustion, and want to avoid per-token billing.

The CLI’s biggest strength is cost predictability. Your monthly bill never changes, no matter how many posts you generate. The API’s strength is programmatic control — you can inspect headers, track exact token usage, and integrate with enterprise billing systems.

For personal projects, side hustles, and internal tools, the CLI is hard to beat. For customer-facing SaaS products, the API’s reliability guarantees are worth paying for.

What I’m curious about next: can you chain multiple Claude Code agents in a pipeline? Right now, I call Claude once per post. But what if topic generation, outline drafting, content writing, and SEO optimization were separate agents with specialized prompts? Would that improve quality, or just burn through quota faster?

I haven’t tested it. Maybe that’s Part 4.

Claude Code Series (3/3)
