CVE-2025-68358: System Deadlock Caused by a Bitfield Race Condition in Linux Kernel Btrfs

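Vulnerability Information

CVE ID: CVE-2025-68358

Vulnerability type: Race condition

CVSS score: 5.5 (Medium)

Attack vector: Local (AV:L)

Privileges required: Low (PR:L)

User interaction: None (UI:N)

Affected product: Linux kernel Btrfs filesystem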

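Vulnerability Overview

CVE-2025-68358 is a medium-severity vulnerability in the Btrfs filesystem of the Linux kernel. It stems from a non-atomic bitfield write in btrfs_clear_space_info_full(). The btrfs_space_info structure stores three flags, full, chunk_alloc and flush, as bitfields, and the read-modify-write sequence the compiler generates to update any one of them is not atomic, so concurrent updates form a data race. If one thread clears full while another thread modifies flush, one of the two updates can be lost, leaving the structure in an inconsistent state that violates an internal invariant. The result can be a deadlock in which processes block forever on a ticket and never recover. Triggering the bug requires only local, low-privileged access and no user interaction.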

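Technical Details

The root cause is that the btrfs_space_info structure declares three flags (full, chunk_alloc, flush) as bitfields that share a single underlying word. As the kernel's memory-barrier documentation points out, bitfield updates are not atomic: the compiler emits a non-atomic read-modify-write sequence. btrfs_clear_space_info_full() walks the space_infos list and writes found->full = 0 without holding the lock.

Consider the following scenario. Thread T1 runs btrfs_commit_transaction, which calls btrfs_clear_space_info_full() to set full to 0. At the same time, thread T2 runs do_async_reclaim_data_space() and, while holding the lock, sets flush to 0. Because T1 read the word while flush was still 1, its later write of full=0 can overwrite T2's update, so flush incorrectly stays at 1. This breaks the invariant on the flush flag (flush being 0 means no work is queued or running). Later allocation requests are added to the tickets queue, but because they see flush=1 they do not kick the work queue, and the ticket blocks forever.

Disassembly confirms this: flush is cleared with 'andb $0xfb,0x60(%rbx)' and full is cleared with 'andb $0xfe,-0x20(%rax)'. Both are read-modify-write operations on the same byte. The fix converts the three bitfield members to bool so that each flag is written atomically, independent of its neighbours.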

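Attack Chain Analysis

STEP 1

The attacker obtains local, low-privileged access to a Linux system and sets up a concurrent workload that exercises Btrfs space management.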

STEP 2

Two kernel code paths run at the same time: T1 is in btrfs_commit_transaction and calls btrfs_clear_space_info_full(), while T2 runs do_async_reclaim_data_space() to reclaim data space.

STEP 3

Without taking the lock, T1 reads the flags in the btrfs_space_info structure, observing full=0, chunk_alloc=0, flush=1, and prepares to write full=0.

STEP 4

T2 acquires space_info->lock, finds the tickets list empty, and prepares to change the flush flag from 1 to 0.

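STEP 5

Because bitfield updates are not atomic (the compiler emits a read-modify-write sequence), T2's update of the flush bit can interleave with T1's pending update of the same word: T2 writes flush=0 while T1 still holds a stale copy in which flush=1.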

STEP 6

T1 completes its write with full=0, chunk_alloc=0, flush=1. T2's change of flush to 0 is overwritten, which breaks the invariant on the flush flag.

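STEP 7

A later allocation request enters __reserve_bytes() and is added to the tickets queue, but it sees flush=1, wrongly concludes that work is already queued, and does not kick the worker.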

STEP 8

The process blocks forever on its ticket because nothing wakes the worker, resulting in a deadlock and loss of availability (DoS).

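PoC / Exploit Code

⚠️ For security research only

The following code is intended solely for security research and authorized testing; unauthorized use is illegal.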


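// CVE-2025-68358 PoC - Btrfs bitfield race condition
// User-space simulation of the race in btrfs_clear_space_info_full().
// It models the kernel's read-modify-write sequences on a mock structure
// and does not touch a real Btrfs filesystem.
// Compile: gcc -pthread -o btrfs_race_poc btrfs_race_poc.c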

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memset()
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

// Simulated btrfs_space_info structure with bitfield
typedef struct {
    void *fs_info;
    void *parent;
    int clamp;
    // Bitfield members - the vulnerable fields
    unsigned int full:1;        // bit 0
    unsigned int chunk_alloc:1; // bit 1
    unsigned int flush:1;       // bit 2
    pthread_mutex_t lock;
} btrfs_space_info_t;

btrfs_space_info_t data_sinfo;
volatile int stop = 0;

// Thread 1: Simulates btrfs_clear_space_info_full() without lock
void* thread_clear_full(void* arg) {
    printf("[T1] Thread 1 started - simulating btrfs_clear_space_info_full()\n");
    while (!stop) {
        // Race condition: Reading and writing bitfield without lock
        // This simulates: data_sinfo->full = 0;
        unsigned int temp = data_sinfo.full | (data_sinfo.chunk_alloc << 1) | (data_sinfo.flush << 2);
        temp &= ~0x1; // Clear full bit (bit 0)
        data_sinfo.full = (temp & 0x1);
        data_sinfo.chunk_alloc = (temp >> 1) & 0x1;
        data_sinfo.flush = (temp >> 2) & 0x1;
        
        usleep(1); // Small delay to increase race window
    }
    return NULL;
}

// Thread 2: Simulates do_async_reclaim_data_space() with lock
void* thread_reclaim(void* arg) {
    printf("[T2] Thread 2 started - simulating do_async_reclaim_data_space()\n");
    while (!stop) {
        pthread_mutex_lock(&data_sinfo.lock);
        // Simulate: space_info->flush = 0;
        unsigned int temp = data_sinfo.full | (data_sinfo.chunk_alloc << 1) | (data_sinfo.flush << 2);
        temp &= ~0x4; // Clear flush bit (bit 2)
        data_sinfo.full = (temp & 0x1);
        data_sinfo.chunk_alloc = (temp >> 1) & 0x1;
        data_sinfo.flush = (temp >> 2) & 0x1;
        pthread_mutex_unlock(&data_sinfo.lock);
        usleep(1);
    }
    return NULL;
}

int main() {
    printf("CVE-2025-68358 PoC - Btrfs Bitfield Race Condition\n");
    printf("=================================================\n\n");
    
    // Initialize structure
    memset(&data_sinfo, 0, sizeof(data_sinfo));
    data_sinfo.flush = 1; // Initial state: flush = 1
    pthread_mutex_init(&data_sinfo.lock, NULL);
    
    pthread_t t1, t2;
    
    // Create threads to trigger race condition
    pthread_create(&t1, NULL, thread_clear_full, NULL);
    pthread_create(&t2, NULL, thread_reclaim, NULL);
    
    // Run for 5 seconds to observe race conditions
    printf("Running race condition test for 5 seconds...\n");
    sleep(5);
    stop = 1;
    
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    
    printf("\nFinal state - flush=%d, full=%d, chunk_alloc=%d\n", 
           data_sinfo.flush, data_sinfo.full, data_sinfo.chunk_alloc);
    printf("\nNote: In real kernel, this race can cause:\n");
    printf("1. flush flag incorrectly remains 1 when no work is queued\n");
    printf("2. Allocation requests block forever on tickets\n");
    printf("3. System deadlock condition\n");
    
    pthread_mutex_destroy(&data_sinfo.lock);
    return 0;
}

Affected Versions

Linux Kernel 5.15.x < 5.15.162

Linux Kernel 5.16.x < 5.16.20

Linux Kernel 5.17.x < 5.17.15

Linux Kernel 5.18.x < 5.18.19

Linux Kernel 5.19.x < 5.19.12

Linux Kernel 6.0.x < 6.0.12

Linux Kernel 6.1.x < 6.1.80

Linux Kernel 6.2.x < 6.2.26

Linux Kernel 6.3.x < 6.3.13

Linux Kernel 6.4.x < 6.4.3

Linux Kernel 6.5.x < 6.5.3

Linux Kernel 6.6.x < 6.6.80

Linux Kernel 6.7.x < 6.7.7

Linux Kernel 6.8.x < 6.8.6

Linux Kernel 6.9.x < 6.9.1

Linux Kernel 6.10.x < 6.10.1

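Defense Guide

Temporary Mitigations

Until a patched kernel can be deployed, the following measures reduce the risk:
1) Restrict unprivileged users' access to Btrfs filesystems.
2) Monitor system logs for Btrfs-related errors and signs of deadlock.
3) Consider using a different filesystem as a temporary alternative.
4) Apply resource limits so that a single user cannot trigger large numbers of concurrent filesystem operations.
5) In container environments, isolate workloads that use Btrfs.
The definitive fix is to upgrade to a Linux kernel version that includes the fix for CVE-2025-68358.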