DeepSeek Anthropic-compatible API 第二轮 HTTP 400：为什么 thinking blocks 不能被一刀切剥掉？

多轮 Agent 会话里，消息历史不是简单的 role + text。当模型启用 reasoning / thinking mode 后，上一轮 assistant 的响应里可能包含 thinking 或 redacted_thinking content block。下一轮请求是否要把这些 block 回传，取决于具体 provider 的协议。

NousResearch/hermes-agent issue #22313 记录的就是一个典型兼容层问题：Hermes 在使用 DeepSeek 的 Anthropic-compatible Messages API 时，把第三方 endpoint 的 thinking blocks 全部从 message history 中剥掉，结果第二轮开始 DeepSeek 返回 HTTP 400：

HTTP 400: The `content[].thinking` in the thinking mode must be passed back to the API.

这个问题的关键不在“能不能调用 DeepSeek”，而在 Anthropic-compatible API 的消息历史语义并不完全相同。兼容的是入口形状，不代表所有 provider 对 thinking block、signature、reasoning continuity 的要求都一样。

触发条件：DeepSeek `/anthropic` + reasoning_effort + 多轮会话

issue 中的配置大致是：

model:
  default: deepseek-v4-pro
  provider: deepseek
  base_url: https://api.deepseek.com/anthropic
agent:
  reasoning_effort: medium

第一轮请求可以正常返回。问题通常出现在第二轮或后续轮次，尤其是带工具调用的 Agent 会话：

第 1 轮：assistant 返回文本 + thinking blocks
        ↓
Hermes 保存历史
        ↓
第 2 轮：Hermes 转换 messages
        ↓
convert_messages_to_anthropic() 剥掉 thinking blocks
        ↓
DeepSeek 认为 reasoning mode 下缺少应回传的 content[].thinking
        ↓
HTTP 400

所以它不是单轮 chat completion 的问题，而是 multi-turn message replay 的问题。

根因：把所有非 anthropic.com endpoint 都当成同一类第三方

issue 指向的核心函数是：

agent/anthropic_adapter.py
convert_messages_to_anthropic()

旧逻辑大致是：

如果 endpoint 不是 anthropic.com，就认为是 third-party；
对 third-party endpoint，剥掉所有 thinking / redacted_thinking blocks。

这样做有它的背景：Anthropic 的 thinking block 可能带有 Anthropic 自己的签名，第三方 provider 无法验证这些签名。把签名块原样发给第三方，可能会造成协议错误或安全问题。

但问题在于：DeepSeek 自己生成的 unsigned thinking blocks，在 reasoning mode 的多轮历史中需要被保留。全部剥掉就破坏了 DeepSeek 的上下文连续性。

换句话说，旧逻辑把两件事混在一起了：

不能把 Anthropic 签名 thinking block 发给第三方；
DeepSeek 自己生成的 unsigned thinking block 需要回传给 DeepSeek。

这两个规则并不矛盾，但不能用“全部删除”来实现。

为什么 Kimi 已经有特殊分支，DeepSeek 也需要？

issue 提到 Kimi /coding endpoint 已有类似处理分支：

_is_kimi branch

原因是 Kimi 的某些兼容端点也要求多轮历史中保留 thinking blocks。

DeepSeek 的 /anthropic endpoint 属于同一类问题：

它使用 Anthropic-compatible message shape；
但 provider 对 thinking block replay 有自己的要求；
不能简单套用所有 third-party endpoint 的默认剥离策略。

这类兼容层代码通常需要 provider-aware branch，而不是只判断：

host == anthropic.com ? keep : strip

更稳的判断应该是：

Anthropic official endpoint：按 Anthropic 规则处理 signed thinking；
DeepSeek Anthropic-compatible endpoint：保留 DeepSeek-native unsigned thinking；
Kimi coding endpoint：保留 Kimi 需要的 thinking；
其它 third-party endpoint：根据是否支持 thinking 决定剥离或保留。

修复方向：识别 DeepSeek endpoint，保留 unsigned thinking

issue 中给出的修复思路是增加一个 endpoint detection：

def _is_deepseek_anthropic_endpoint(base_url: str | None) -> bool:
    normalized = _normalize_base_url_text(base_url)
    if not normalized:
        return False
    normalized = normalized.rstrip("/").lower()
    return "api.deepseek.com" in normalized and "anthropic" in normalized

然后在 convert_messages_to_anthropic() 中增加 DeepSeek 分支：

如果 content block 不是 thinking 类型：保留；
如果是 signed Anthropic thinking：剥离；
如果是 DeepSeek-native unsigned thinking：保留。

issue 后续评论提到，main 分支中已经有 _is_deepseek_anthropic_endpoint()，并通过 _preserve_unsigned_thinking 保留 DeepSeek + Kimi 所需的 thinking blocks。

这说明最终修复思路不是“永远保留 thinking”，而是更细：

保留 provider 自己生成、下一轮必须回传的 unsigned thinking；
不要把外部 provider 无法验证的 signed thinking 混发过去。

Workaround：临时关闭 reasoning_effort

issue 中给出的绕过方式是：

agent:
  reasoning_effort: ''

也就是在使用 DeepSeek Anthropic-compatible endpoint 时关闭 thinking mode。

这样做能绕开 content[].thinking 必须回传的问题，因为没有 reasoning mode，就不会产生需要 replay 的 thinking blocks。

但这只是降级方案，代价也明显：

reasoning 能力下降；
复杂工具任务的稳定性可能下降；
无法验证多轮 thinking continuity；
不能覆盖需要 reasoning_effort 的真实生产场景。

长期来看，更好的做法还是升级到包含 DeepSeek endpoint detection 和 unsigned thinking preservation 的 Hermes 版本。

排查清单：HTTP 400 thinking mode must be passed back

如果你遇到类似错误：

HTTP 400: The `content[].thinking` in the thinking mode must be passed back to the API.

可以按这个顺序排查：

1. 是否启用了 reasoning_effort

检查配置里是否有：

agent:
  reasoning_effort: medium

或其它非空 reasoning 设置。

2. 是否使用 DeepSeek Anthropic-compatible endpoint

例如：

https://api.deepseek.com/anthropic

如果是 OpenAI-compatible endpoint，错误形态可能不同；这里讨论的是 Anthropic-compatible Messages API。

3. 错误是否出现在第二轮以后

如果第一轮正常、第二轮开始 400，通常说明 message history replay 或 adapter conversion 出问题。

4. 检查 Hermes 版本是否包含 DeepSeek endpoint detection

关注类似函数或逻辑：

_is_deepseek_anthropic_endpoint()
_preserve_unsigned_thinking

如果当前版本没有这些逻辑，升级 Hermes 可能比改配置更直接。

5. 临时关闭 reasoning_effort 验证

如果关闭 reasoning 后不再报错，基本可以确认问题集中在 thinking block replay。

为什么不能简单“保留所有 thinking blocks”？

这听起来是个自然想法：既然 DeepSeek 要 thinking，那全部保留不就好了？

问题是不同 provider 的 thinking block 可能包含不同验证机制。

例如：

Anthropic signed thinking block：可能带 signature / data；
DeepSeek native thinking block：可能是 unsigned；
Kimi thinking block：又有自己的兼容要求。

如果把 Anthropic 签名块转发给 DeepSeek，DeepSeek 并不能验证 Anthropic 的签名，甚至可能拒绝请求。

所以兼容层要做的是分类处理：

不是 thinking：保留；
当前 provider 自己生成且要求 replay 的 unsigned thinking：保留；
其它 provider 的 signed thinking：剥离；
不支持 thinking replay 的 endpoint：剥离或降级。

这比简单 keep-all 或 strip-all 都更安全。

对 OpenAI-compatible / Anthropic-compatible 中转的启发

很多工具会同时接入 DeepSeek、Kimi、Claude、OpenAI-compatible gateway、Anthropic-compatible endpoint，甚至通过 DeepAI API 中转站统一管理 Base URL、API Key 和模型路由。

这种集中管理很适合解决多模型接入、密钥切换、路由和成本控制问题。但在 Agent 侧，还要特别关注 provider metadata 和消息格式差异：

reasoning_effort 是否支持；
thinking blocks 是否需要 replay；
signed / unsigned thinking 如何处理；
tool call schema 是否一致；
streaming delta 字段是否一致；
usage 字段是否完整；
context length 是否准确。

DeepSeek 这个 issue 的教训是：只要涉及 reasoning 多轮历史，兼容层就不能只看 endpoint 形状，还要看 provider 对历史消息内容的验证规则。

FAQ

为什么 DeepSeek 第一轮能用，第二轮才报 HTTP 400？

因为第一轮还没有需要回放的 assistant thinking history。第二轮开始，Hermes 会把上一轮 assistant message 转成历史消息；如果 thinking blocks 被剥掉，DeepSeek 就会拒绝。

`content[].thinking must be passed back` 是什么意思？

在 reasoning mode 下，DeepSeek 要求上一轮生成的 thinking block 在后续请求中回传，以保持推理上下文连续。

为什么 Hermes 原来要剥离 thinking blocks？

为了避免把 Anthropic 官方 signed thinking blocks 发给第三方 endpoint。第三方通常无法验证 Anthropic 签名。

正确修复是保留所有 thinking 吗？

不是。正确做法是 provider-aware：DeepSeek/Kimi 这类需要 replay 的 endpoint 保留 unsigned thinking，同时剥离不适合该 provider 的 signed thinking。

临时解决办法是什么？

可以关闭 reasoning mode，例如设置 reasoning_effort: ''。但长期建议升级到包含 DeepSeek endpoint detection 和 unsigned thinking preservation 的版本。

总结

#22313 的核心可以概括为：

Hermes 把 DeepSeek Anthropic-compatible endpoint 当成普通 third-party endpoint，剥掉了多轮 reasoning 必须回传的 thinking blocks，导致第二轮请求被 DeepSeek HTTP 400 拒绝。

修复方向是：

识别 api.deepseek.com/anthropic；
保留 DeepSeek-native unsigned thinking；
剥离第三方无法验证的 signed thinking；
让 reasoning_effort 下的 message history replay 符合 provider 要求。

对 Agent 工具来说，Anthropic-compatible 不只是 URL 和字段名兼容。真正稳定的多轮 reasoning，还需要兼容层理解每个 provider 对 thinking、signature、tool call 和历史消息的细节要求。