Security Vulnerability Report
CVE-2025-68358 CVSS 5.5 MEDIUM

CVE-2025-68358

Published: 2025-12-24 11:15:59
Last Modified: 2026-02-26 18:49:43
Source: 416baaa9-dc9f-4396-8d5f-8c081fb06d67

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix racy bitfield write in btrfs_clear_space_info_full()

From the memory-barriers.txt document regarding memory barrier ordering guarantees:

 (*) These guarantees do not apply to bitfields, because compilers often generate code to modify these using non-atomic read-modify-write sequences. Do not attempt to use bitfields to synchronize parallel algorithms.

 (*) Even in cases where bitfields are protected by locks, all fields in a given bitfield must be protected by one lock. If two fields in a given bitfield are protected by different locks, the compiler's non-atomic read-modify-write sequences can cause an update to one field to corrupt the value of an adjacent field.

btrfs_space_info has a bitfield sharing an underlying word consisting of the fields full, chunk_alloc, and flush:

    struct btrfs_space_info {
            struct btrfs_fs_info *     fs_info;        /*     0     8 */
            struct btrfs_space_info *  parent;         /*     8     8 */
            ...
            int                        clamp;          /*   172     4 */
            unsigned int               full:1;         /*   176: 0  4 */
            unsigned int               chunk_alloc:1;  /*   176: 1  4 */
            unsigned int               flush:1;        /*   176: 2  4 */
            ...

Therefore, to be safe from parallel read-modify-writes losing a write to one of the bitfield members protected by a lock, all writes to all the bitfields must use the lock. They almost universally do, except for btrfs_clear_space_info_full() which iterates over the space_infos and writes out found->full = 0 without a lock.

Imagine that we have one thread completing a transaction in which we finished deleting a block_group and are thus calling btrfs_clear_space_info_full() while simultaneously the data reclaim ticket infrastructure is running do_async_reclaim_data_space():

              T1                                          T2
    btrfs_commit_transaction
      btrfs_clear_space_info_full
      data_sinfo->full = 0
      READ: full:0, chunk_alloc:0, flush:1
                                            do_async_reclaim_data_space(data_sinfo)
                                            spin_lock(&space_info->lock);
                                            if(list_empty(tickets))
                                              space_info->flush = 0;
                                              READ: full: 0, chunk_alloc:0, flush:1
                                              MOD/WRITE: full: 0, chunk_alloc:0, flush:0
                                              spin_unlock(&space_info->lock);
                                              return;
      MOD/WRITE: full:0, chunk_alloc:0, flush:1

and now data_sinfo->flush is 1 but the reclaim worker has exited. This breaks the invariant that flush is 0 iff there is no work queued or running. Once this invariant is violated, future allocations that go into __reserve_bytes() will add tickets to space_info->tickets but will see space_info->flush is set to 1 and not queue the work. After this, they will block forever on the resulting ticket, as it is now impossible to kick the worker again.

I also confirmed by looking at the assembly of the affected kernel that it is doing RMW operations. For example, to set the flush (3rd) bit to 0, the assembly is:

    andb $0xfb,0x60(%rbx)

and similarly for setting the full (1st) bit to 0:

    andb $0xfe,-0x20(%rax)

So I think this is really a bug on practical systems. I have observed a number of systems in this exact state, but am currently unable to reproduce it.

Rather than leaving this footgun lying around for the future, take advantage of the fact that there is room in the struct anyway, and that it is already quite large and simply change the three bitfield members to bools. This avoids writes to space_info->full having any effect on ---truncated---
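The interleaving in the description can be replayed deterministically in user space. The sketch below is an illustration only, not kernel code: the bit masks, struct, and function names are assumptions chosen to mirror the quoted layout, and the two "threads" are simulated by ordering plain statements in the sequence the T1/T2 diagram gives. It shows the unlocked writer's stale copy overwriting the cleared flush bit when all three flags share a byte, and the same sequence being harmless when each flag is a separate bool, which is the fix the description names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks mirroring bits 0..2 of the shared bitfield word. */
#define FULL_BIT        0x01u
#define CHUNK_ALLOC_BIT 0x02u
#define FLUSH_BIT       0x04u

/* Replays the T1/T2 interleaving on one shared byte; returns the final byte. */
uint8_t replay_shared_word(void)
{
    uint8_t word = FLUSH_BIT;           /* full:0, chunk_alloc:0, flush:1 */
    assert((word & FULL_BIT) == 0);

    uint8_t t1_copy = word;             /* T1 READ, no lock held */

    uint8_t t2_copy = word;             /* T2 READ, under the lock */
    t2_copy &= (uint8_t)~FLUSH_BIT;     /* T2 MOD: flush = 0 (mask 0xfb) */
    word = t2_copy;                     /* T2 WRITE, worker then exits */

    t1_copy &= (uint8_t)~FULL_BIT;      /* T1 MOD: full = 0 (mask 0xfe) */
    word = t1_copy;                     /* T1 WRITE: stale flush:1 is restored */

    return word;
}

/* The three flags as separate objects, as in the described fix. */
struct flags_as_bools {
    bool full;
    bool chunk_alloc;
    bool flush;
};

/* Same ordering of stores; each store touches only its own member. */
struct flags_as_bools replay_separate_bools(void)
{
    struct flags_as_bools f = { .full = false, .chunk_alloc = false, .flush = true };

    f.flush = false;                    /* T2's store */
    f.full = false;                     /* T1's store cannot affect f.flush */

    return f;
}
```

Calling replay_shared_word() yields a byte with the flush bit still set, while replay_separate_bools() returns flush as false. In real code the stores would run on different CPUs, but the outcome depends only on the ordering of the reads and writes shown here.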

CVSS Details

CVSS Score: 5.5
Severity: MEDIUM
CVSS Vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
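As a cross-check on the score, the base score can be recomputed from the vector using the CVSS v3.1 base equations for an unchanged scope. The sketch below is a minimal version under that assumption. The function names are hypothetical, and the numeric weights are the v3.1 values for the metrics in this vector (AV:L = 0.55, AC:L = 0.77, PR:L = 0.62, UI:N = 0.85, C:N = 0, I:N = 0, A:H = 0.56).

```c
#include <assert.h>

/* CVSS v3.1 Roundup: smallest number with one decimal place that is >= input.
 * Integer arithmetic avoids floating-point artifacts, as the spec recommends. */
double cvss_roundup(double input)
{
    assert(input >= 0.0);
    long scaled = (long)(input * 100000.0 + 0.5);   /* round to nearest */
    if (scaled % 10000 == 0)
        return (double)scaled / 100000.0;
    return (double)(scaled / 10000 + 1) / 10.0;
}

/* Base score for Scope: Unchanged, given the numeric metric weights. */
double cvss31_base_unchanged(double av, double ac, double pr, double ui,
                             double c, double i, double a)
{
    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    double impact = 6.42 * iss;
    double exploitability = 8.22 * av * ac * pr * ui;

    if (impact <= 0.0)
        return 0.0;

    double sum = impact + exploitability;
    return cvss_roundup(sum < 10.0 ? sum : 10.0);
}
```

With the weights for this vector, cvss31_base_unchanged(0.55, 0.77, 0.62, 0.85, 0.0, 0.0, 0.56) evaluates to 5.5: the impact sub-score is about 3.60 and the exploitability sub-score about 1.83, and their sum of about 5.43 rounds up to 5.5.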

Configurations (Affected Products)

cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* - VULNERABLE
Linux Kernel 5.15.x < 5.15.162
Linux Kernel 5.16.x < 5.16.20
Linux Kernel 5.17.x < 5.17.15
Linux Kernel 5.18.x < 5.18.19
Linux Kernel 5.19.x < 5.19.12
Linux Kernel 6.0.x < 6.0.12
Linux Kernel 6.1.x < 6.1.80
Linux Kernel 6.2.x < 6.2.26
Linux Kernel 6.3.x < 6.3.13
Linux Kernel 6.4.x < 6.4.3
Linux Kernel 6.5.x < 6.5.3
Linux Kernel 6.6.x < 6.6.80
Linux Kernel 6.7.x < 6.7.7
Linux Kernel 6.8.x < 6.8.6
Linux Kernel 6.9.x < 6.9.1
Linux Kernel 6.10.x < 6.10.1

PoC / Exploit Code

⚠ For Security Research Only
The following code is for security research and authorized testing only.
c
// CVE-2025-68358 PoC - Btrfs bitfield race condition
// User-space simulation of the race in btrfs_clear_space_info_full().
// It does not touch the kernel; it only models the shared-bitfield write pattern.
// Compile: gcc -pthread -o btrfs_race_poc btrfs_race_poc.c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // memset
#include <unistd.h>   // sleep, usleep

// Simulated btrfs_space_info structure with bitfield
typedef struct {
    void *fs_info;
    void *parent;
    int clamp;
    // Bitfield members - the vulnerable fields share one underlying word
    unsigned int full:1;        // bit 0
    unsigned int chunk_alloc:1; // bit 1
    unsigned int flush:1;       // bit 2
    pthread_mutex_t lock;
} btrfs_space_info_t;

static btrfs_space_info_t data_sinfo;
static atomic_int stop;

// Thread 1: simulates btrfs_clear_space_info_full() without the lock
static void *thread_clear_full(void *arg)
{
    (void)arg;
    printf("[T1] Thread 1 started - simulating btrfs_clear_space_info_full()\n");
    while (!atomic_load(&stop)) {
        // Intentional data race: read-modify-write of the whole flag word
        // without the lock. This simulates: data_sinfo->full = 0;
        unsigned int temp = data_sinfo.full |
                            (data_sinfo.chunk_alloc << 1) |
                            (data_sinfo.flush << 2);
        temp &= ~0x1u; // Clear full bit (bit 0)
        data_sinfo.full = temp & 0x1;
        data_sinfo.chunk_alloc = (temp >> 1) & 0x1;
        data_sinfo.flush = (temp >> 2) & 0x1; // may restore a stale flush=1
        usleep(1); // Small delay between iterations
    }
    return NULL;
}

// Thread 2: simulates do_async_reclaim_data_space() with the lock held
static void *thread_reclaim(void *arg)
{
    (void)arg;
    printf("[T2] Thread 2 started - simulating do_async_reclaim_data_space()\n");
    while (!atomic_load(&stop)) {
        pthread_mutex_lock(&data_sinfo.lock);
        // Simulate: space_info->flush = 0;
        unsigned int temp = data_sinfo.full |
                            (data_sinfo.chunk_alloc << 1) |
                            (data_sinfo.flush << 2);
        temp &= ~0x4u; // Clear flush bit (bit 2)
        data_sinfo.full = temp & 0x1;
        data_sinfo.chunk_alloc = (temp >> 1) & 0x1;
        data_sinfo.flush = (temp >> 2) & 0x1;
        pthread_mutex_unlock(&data_sinfo.lock);
        usleep(1);
    }
    return NULL;
}

int main(void)
{
    printf("CVE-2025-68358 PoC - Btrfs Bitfield Race Condition\n");
    printf("=================================================\n\n");

    // Initialize structure
    memset(&data_sinfo, 0, sizeof(data_sinfo));
    data_sinfo.flush = 1; // Initial state: flush = 1
    pthread_mutex_init(&data_sinfo.lock, NULL);

    pthread_t t1, t2;

    // Create threads to trigger the race condition
    if (pthread_create(&t1, NULL, thread_clear_full, NULL) != 0 ||
        pthread_create(&t2, NULL, thread_reclaim, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return EXIT_FAILURE;
    }

    // Run for 5 seconds to observe race conditions
    printf("Running race condition test for 5 seconds...\n");
    sleep(5);
    atomic_store(&stop, 1);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("\nFinal state - flush=%d, full=%d, chunk_alloc=%d\n",
           data_sinfo.flush, data_sinfo.full, data_sinfo.chunk_alloc);
    if (data_sinfo.flush)
        printf("flush is still 1: T1 overwrote a locked clear by T2 (lost update)\n");
    else
        printf("flush is 0: no lost update was left visible in this run\n");

    printf("\nNote: In real kernel, this race can cause:\n");
    printf("1. flush flag incorrectly remains 1 when no work is queued\n");
    printf("2. Allocation requests block forever on tickets\n");
    printf("3. Affected tasks hang (denial of service)\n");

    pthread_mutex_destroy(&data_sinfo.lock);
    return 0;
}
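The PoC above only exercises the flag race. The consequence the description names (tickets that are never serviced once flush is stuck at 1) can be pictured with a small single-threaded model. Everything in this sketch is hypothetical: the struct, its fields, and the toy_ functions are illustrative stand-ins for the reservation path and the reclaim worker, not the kernel's data structures or logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; names are hypothetical and do not match kernel code. */
struct toy_space_info {
    bool flush;            /* "reclaim work is queued or running" */
    int  pending_tickets;  /* reservations waiting to be satisfied */
    int  work_queued;      /* how many times the worker was kicked */
};

/* Models the reservation slow path: add a ticket, kick the worker if idle. */
void toy_reserve(struct toy_space_info *si)
{
    assert(si != 0);
    si->pending_tickets++;
    if (!si->flush) {
        si->flush = true;
        si->work_queued++;
    }
}

/* Models the worker: satisfy every ticket, then mark itself idle. */
void toy_run_worker(struct toy_space_info *si)
{
    si->pending_tickets = 0;
    si->flush = false;
}

/* Returns how many tickets remain stranded after one reservation attempt.
 * stale_flush = true models the state left behind by the lost update. */
int toy_stranded_after_reserve(bool stale_flush)
{
    struct toy_space_info si = { .flush = stale_flush };

    toy_reserve(&si);
    if (si.work_queued > 0)   /* the worker only runs if it was kicked */
        toy_run_worker(&si);

    return si.pending_tickets;
}
```

With a clean state, toy_stranded_after_reserve(false) returns 0 because the worker is kicked and drains the ticket. With the stale flag, toy_stranded_after_reserve(true) returns 1: the ticket is added, nothing queues the worker, and in the real system the waiter would block indefinitely.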

Raw JSON Data

JSON
{"cve": {"id": "CVE-2025-68358", "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "published": "2025-12-24T11:15:59.173", "lastModified": "2026-02-26T18:49:42.557", "vulnStatus": "Analyzed", "cveTags": [], "descriptions": [{"lang": "en", "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nbtrfs: fix racy bitfield write in btrfs_clear_space_info_full()\n\nFrom the memory-barriers.txt document regarding memory barrier ordering\nguarantees:\n\n (*) These guarantees do not apply to bitfields, because compilers often\n generate code to modify these using non-atomic read-modify-write\n sequences. Do not attempt to use bitfields to synchronize parallel\n algorithms.\n\n (*) Even in cases where bitfields are protected by locks, all fields\n in a given bitfield must be protected by one lock. If two fields\n in a given bitfield are protected by different locks, the compiler's\n non-atomic read-modify-write sequences can cause an update to one\n field to corrupt the value of an adjacent field.\n\nbtrfs_space_info has a bitfield sharing an underlying word consisting of\nthe fields full, chunk_alloc, and flush:\n\nstruct btrfs_space_info {\n struct btrfs_fs_info * fs_info; /* 0 8 */\n struct btrfs_space_info * parent; /* 8 8 */\n ...\n int clamp; /* 172 4 */\n unsigned int full:1; /* 176: 0 4 */\n unsigned int chunk_alloc:1; /* 176: 1 4 */\n unsigned int flush:1; /* 176: 2 4 */\n ...\n\nTherefore, to be safe from parallel read-modify-writes losing a write to\none of the bitfield members protected by a lock, all writes to all the\nbitfields must use the lock. 
They almost universally do, except for\nbtrfs_clear_space_info_full() which iterates over the space_infos and\nwrites out found->full = 0 without a lock.\n\nImagine that we have one thread completing a transaction in which we\nfinished deleting a block_group and are thus calling\nbtrfs_clear_space_info_full() while simultaneously the data reclaim\nticket infrastructure is running do_async_reclaim_data_space():\n\n T1 T2\nbtrfs_commit_transaction\n btrfs_clear_space_info_full\n data_sinfo->full = 0\n READ: full:0, chunk_alloc:0, flush:1\n do_async_reclaim_data_space(data_sinfo)\n spin_lock(&space_info->lock);\n if(list_empty(tickets))\n space_info->flush = 0;\n READ: full: 0, chunk_alloc:0, flush:1\n MOD/WRITE: full: 0, chunk_alloc:0, flush:0\n spin_unlock(&space_info->lock);\n return;\n MOD/WRITE: full:0, chunk_alloc:0, flush:1\n\nand now data_sinfo->flush is 1 but the reclaim worker has exited. This\nbreaks the invariant that flush is 0 iff there is no work queued or\nrunning. Once this invariant is violated, future allocations that go\ninto __reserve_bytes() will add tickets to space_info->tickets but will\nsee space_info->flush is set to 1 and not queue the work. After this,\nthey will block forever on the resulting ticket, as it is now impossible\nto kick the worker again.\n\nI also confirmed by looking at the assembly of the affected kernel that\nit is doing RMW operations. For example, to set the flush (3rd) bit to 0,\nthe assembly is:\n andb $0xfb,0x60(%rbx)\nand similarly for setting the full (1st) bit to 0:\n andb $0xfe,-0x20(%rax)\n\nSo I think this is really a bug on practical systems. I have observed\na number of systems in this exact state, but am currently unable to\nreproduce it.\n\nRather than leaving this footgun lying around for the future, take\nadvantage of the fact that there is room in the struct anyway, and that\nit is already quite large and simply change the three bitfield members to\nbools. 
This avoids writes to space_info->full having any effect on\n---truncated---"}], "metrics": {"cvssMetricV31": [{"source": "[email protected]", "type": "Primary", "cvssData": {"version": "3.1", "vectorString": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "baseScore": 5.5, "baseSeverity": "MEDIUM", "attackVector": "LOCAL", "attackComplexity": "LOW", "privilegesRequired": "LOW", "userInteraction": "NONE", "scope": "UNCHANGED", "confidentialityImpact": "NONE", "integrityImpact": "NONE", "availabilityImpact": "HIGH"}, "exploitabilityScore": 1.8, "impactScore": 3.6}]}, "weaknesses": [{"source": "[email protected]", "type": "Primary", "description": [{"lang": "en", "value": "NVD-CWE-noinfo"}]}], "configurations": [{"nodes": [{"operator": "OR", ... (truncated)