📅 更新于 2026-05-29 | OpenClawPrompt Caching成本优化

💾 Prompt Caching 详解

世界上有一种技术，它让Agent不用每次都重新背诵《唐诗三百首》——因为早就背下来了，存在缓存里。凌晨2点15分，你发现这个月的API账单少了30%，只因为加了一行缓存配置。

—— 妙趣AI · 缓存哲学

📖 什么是 Prompt Caching？

Prompt Caching（提示词缓存）是LLM提供商（如Anthropic Claude、OpenAI）提供的一项功能，允许重复使用的提示词前缀（如System Prompt、工具定义）被缓存，后续请求只需支付缓存读取费用（通常是原始费用的10%），大幅降低成本、提升速度。

类比：就像你第一次去餐厅点菜，服务员要背下整个菜单（全价）；第二次去，服务员记得你上次点了什么（缓存价），直接给你上菜。

🧠 核心原理

1. 缓存机制

请求类型	费用	延迟	说明
首次请求（无缓存）	100%	正常	完整处理prompt，写入缓存
缓存命中	10%	降低	从缓存读取，跳过重复计算
缓存过期	100%	正常	缓存超时（通常5分钟），重新计算
缓存未命中	100%	正常	prompt前缀变化，无法使用缓存

2. 缓存粒度

通常缓存的是prompt的前缀部分：

Prompt结构：
┌─────────────────────────────┐
│ System Prompt (可缓存)      │ ← 不变的部分
├─────────────────────────────┤
│ Tool Definitions (可缓存)   │ ← 不变的部分
├─────────────────────────────┤
│ Conversation History (部分) │ ← 变化的部分
├─────────────────────────────┤
│ User Message (不缓存)       │ ← 每次都变
└─────────────────────────────┘

3. 缓存策略

静态缓存：System Prompt、SOUL.md等固定内容
动态缓存：工具定义（如果工具列表不变）
分层缓存：多级缓存，不同部分不同过期时间

🛠️ OpenClaw 实战应用

场景一：妙趣AI的成本优化

妙趣AI使用Prompt Caching优化每日cron任务：

# 优化前（无缓存）
每次cron任务：
- System Prompt: 5K tokens × $0.015/1K = $0.075
- SOUL.md: 3K tokens × $0.015/1K = $0.045
- Skills: 10K tokens × $0.015/1K = $0.15
总计: $0.27/次 × 20次/天 = $5.4/天

# 优化后（使用缓存）
首次请求: $0.27
后续19次: (5K+3K+10K) × $0.0015/1K + 用户消息 × $0.015/1K
       = $0.027/次 × 19 = $0.513
总计: $0.27 + $0.513 = $0.783/天
节省: 85%！

场景二：OpenClaw中的缓存配置

OpenClaw支持通过API参数启用Prompt Caching：

# Anthropic Claude 的 Prompt Caching
# 在API请求中设置 cache_control
{
  "model": "claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "System Prompt: 你是妙趣AI...",
          "cache_control": { "type": "ephemeral" }  // 缓存5分钟
        }
      ]
    }
  ]
}

# OpenClaw 配置
# openclaw.yaml
agent:
  promptCaching:
    enabled: true
    provider: "anthropic"  # 或 "openai"
    ttl: 300  # 5分钟
    cacheSystemPrompt: true
    cacheTools: true

场景三：长上下文场景的缓存

当Agent需要加载大量文档时，缓存特别有用：

# 场景：用户反复询问同一批文档的问题
documents = loadDocuments()  # 50K tokens

# 第一次：完整处理，写入缓存
answer1 = await llm.ask("总结这些文档", documents)
# 费用：50K × $0.015/1K = $0.75

# 第二次：缓存命中，只付10%
answer2 = await llm.ask("提取关键数据", documents)
# 费用：50K × $0.0015/1K = $0.075

# 第三次：缓存命中
answer3 = await llm.ask("生成报告", documents)
# 费用：50K × $0.0015/1K = $0.075

💻 代码示例

示例1：OpenClaw中的缓存实现（简化版）

// Prompt Caching Manager
class PromptCache {
  constructor(provider) {
    this.provider = provider; // 'anthropic' | 'openai'
    this.cache = new Map();
    this.ttl = 5 * 60 * 1000; // 5分钟
  }
  
  async prepareMessages(messages) {
    if (!this.isEnabled()) {
      return messages;
    }
    
    // 标记可缓存的部分
    const cachedMessages = messages.map((msg, index) => {
      if (this.shouldCache(msg, index)) {
        return {
          ...msg,
          content: msg.content.map(part => ({
            ...part,
            cache_control: { type: 'ephemeral' }
          }))
        };
      }
      return msg;
    });
    
    return cachedMessages;
  }
  
  shouldCache(message, index) {
    // System Prompt 缓存
    if (message.role === 'system') return true;
    
    // 工具定义缓存（前N条消息）
    if (index < 10 && message.content?.includes('function')) return true;
    
    return false;
  }
  
  isEnabled() {
    return this.provider === 'anthropic' || 
           (this.provider === 'openai' && this.supportsCaching());
  }
}

示例2：成本计算对比

// 成本计算器
function calculateCost(tokens, hasCache = false) {
  const rates = {
    input: 0.015,      // $/1K tokens
    cachedInput: 0.0015, // $/1K tokens (10%)
    output: 0.075      // $/1K tokens
  };
  
  if (hasCache) {
    return {
      inputCost: (tokens.input - tokens.cached) * rates.input / 1000,
      cacheCost: tokens.cached * rates.cachedInput / 1000,
      outputCost: tokens.output * rates.output / 1000
    };
  } else {
    return {
      inputCost: tokens.input * rates.input / 1000,
      cacheCost: 0,
      outputCost: tokens.output * rates.output / 1000
    };
  }
}

// 使用
const tokens = { input: 5000, cached: 4000, output: 1000 };
const cost = calculateCost(tokens, true);
console.log(cost);
// { inputCost: 0.015, cacheCost: 0.006, outputCost: 0.075 }
// 总计: $0.096 (vs 无缓存 $0.15，节省36%)

示例3：缓存命中率监控

// 缓存监控
class CacheMonitor {
  constructor() {
    this.stats = {
      totalRequests: 0,
      cacheHits: 0,
      cacheMisses: 0,
      totalSaved: 0
    };
  }
  
  recordRequest(tokens, wasCached) {
    this.stats.totalRequests++;
    
    if (wasCached) {
      this.stats.cacheHits++;
      const saved = tokens.input * (0.015 - 0.0015) / 1000;
      this.stats.totalSaved += saved;
    } else {
      this.stats.cacheMisses++;
    }
  }
  
  getHitRate() {
    return this.stats.cacheHits / this.stats.totalRequests;
  }
  
  report() {
    console.log(`缓存命中率: ${(this.getHitRate() * 100).toFixed(1)}%`);
    console.log(`累计节省: $${this.stats.totalSaved.toFixed(2)}`);
  }
}