* [PATCH RFC 0/1] fix for large folio split race in page cache
@ 2026-03-05 18:34 Chris J Arges
2026-03-05 18:34 ` [PATCH RFC 1/1] mm/filemap: handle large folio split race in page cache lookups Chris J Arges
0 siblings, 1 reply; 3+ messages in thread
From: Chris J Arges @ 2026-03-05 18:34 UTC (permalink / raw)
To: willy, akpm, william.kucharski
Cc: linux-fsdevel, linux-mm, linux-kernel, kernel-team, Chris J Arges
In production we've seen crashes on 6.18.7+ with the signature below.
These machines are under high memory pressure, use XFS filesystems, and
journalctl was usually the comm when we oopsed.
After some crash-dump analysis we determined that this was a race
condition. We tried to create a self-contained reproducer for the
issue, but unfortunately were unable to do so. We will apply this patch
internally as a mitigation, but full validation will take time
(confirming that the crashes stop over a longer observation window). We
are looking for feedback on whether this is a valid fix or whether there
are other approaches we should look into.
An earlier email I posted with some analysis is here:
https://lore.kernel.org/lkml/aYN3JC_Kdgw5G2Ik@861G6M3/T/#u
Thanks,
--chris
Call Trace:
```
aops:xfs_address_space_operations ino:5000126 dentry name(?):"system@d737aaecce5449038a638f9e18bbf5f5-0000000004e06fa7-00064"
flags: 0xeffff8000001ad(locked|waiters|referenced|uptodate|lru|active|node=3|zone=2|lastcpupid=0x1ffff)
raw: 00effff8000001ad ffaa3c6b85b73ec8 ffaa3c6b85b73e08 ff4e378b0e95dea8
raw: 000000000000737a 0000000000000000 00000002ffffffff ff4e379527691b00
page dumped because: VM_BUG_ON_FOLIO(!folio_contains(folio, index))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:3519!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 49159 Comm: journalctl Kdump: loaded Tainted: G W O 6.18.7-cloudflare-2026.1.15 #1 PREEMPT(voluntary)
Tainted: [W]=WARN, [O]=OOT_MODULE
Hardware name: MiTAC TC55-B8051-G12/S8051GM, BIOS V1.08 09/16/2025
RIP: 0010:filemap_fault+0xa61/0x1410
Code: 48 8b 4c 24 10 4c 8b 44 24 08 48 85 c9 0f 84 82 fa ff ff 49 89 cd e9 bc f9 ff ff 48 c7 c6 20 44 d0 86 4c 89 c7 e8 3f 1c 04 00 <0f> 0b 48 8d 7b 18 4c 89 44 24 08 4c 89 1c 24 e8 0b 97 e3 ff 4c 8b
RSP: 0000:ff6fd043bed0fcb0 EFLAGS: 00010246
RAX: 0000000000000043 RBX: ff4e378b0e95dea8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ff4e375cef81c4c0
RBP: 000000000000737b R08: 0000000000000000 R09: ff6fd043bed0fb48
R10: ff4e37b4ecc3ffa8 R11: 0000000000000003 R12: 0000000000000000
R13: ff4e375c4fa17680 R14: ff4e378b0e95dd38 R15: ff6fd043bed0fde8
FS: 00007f6c5b8b4980(0000) GS:ff4e375d67864000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6c48b7b050 CR3: 0000005065d34006 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
? mod_memcg_state+0x80/0x1c0
__do_fault+0x31/0xd0
do_fault+0x2e6/0x710
__handle_mm_fault+0x7b3/0xe50
? srso_alias_return_thunk+0x5/0xfbef5
? anon_pipe_write+0x27e/0x670
handle_mm_fault+0xaa/0x2a0
do_user_addr_fault+0x208/0x660
exc_page_fault+0x77/0x170
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x7f6c5b67c3dc
Code: e2 ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 55 41 54 55 53 48 83 ec 18 48 85 ff 0f 84 bd 01 00 00 48 85 f6 0f 84 d4 01 00 00 <48> 8b 5e 08 48 89 cd 48 85 db 74 60 48 83 fb 0f 0f 86 86 00 00 00
RSP: 002b:00007ffe78c072e0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 000000000737b048 RCX: 000000000737b048
RDX: 0000000000000003 RSI: 00007f6c48b7b048 RDI: 000055bc3b28dee0
RBP: 000055bc3b28dee0 R08: 0000000000000010 R09: 000055bc3b28df18
R10: 0000000000000001 R11: 00007f6c5b679fa0 R12: 0000000000000003
R13: 00007ffe78c07450 R14: 00007ffe78c07450 R15: 00007f6c48b7b048
</TASK>
```
Chris J Arges (1):
mm/filemap: handle large folio split race in page cache lookups
mm/filemap.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
--
2.43.0
* [PATCH RFC 1/1] mm/filemap: handle large folio split race in page cache lookups
2026-03-05 18:34 [PATCH RFC 0/1] fix for large folio split race in page cache Chris J Arges
@ 2026-03-05 18:34 ` Chris J Arges
2026-03-05 19:24 ` Matthew Wilcox
0 siblings, 1 reply; 3+ messages in thread
From: Chris J Arges @ 2026-03-05 18:34 UTC (permalink / raw)
To: willy, akpm, william.kucharski
Cc: linux-fsdevel, linux-mm, linux-kernel, kernel-team, Chris J Arges
We have been hitting VM_BUG_ON_FOLIO(!folio_contains(folio, index)) in
production environments. These machines are using XFS with large folio
support enabled and are under high memory pressure.
From reading the code it seems plausible that folio splits due to memory
reclaim are racing with filemap_fault() serving mmap page faults.
The existing code checks for truncation (folio->mapping != mapping) and
retries, but there does not appear to be equivalent handling for the
split case. The result is:
kernel BUG at mm/filemap.c:3519!
VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio)
This RFC patch extends the existing truncation retry checks to also
cover the case where the folio no longer contains the target index.
Fixes: e292e6d644ce ("filemap: Convert filemap_fault to folio")
Signed-off-by: Chris J Arges <carges@cloudflare.com>
---
mm/filemap.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 6cd7974d4ada..334d3f700beb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1954,13 +1954,13 @@ struct folio *__filemap_get_folio_mpol(struct address_space *mapping,
folio_lock(folio);
}
- /* Has the page been truncated? */
- if (unlikely(folio->mapping != mapping)) {
+ /* Has the page been truncated or split? */
+ if (unlikely(folio->mapping != mapping) ||
+ unlikely(!folio_contains(folio, index))) {
folio_unlock(folio);
folio_put(folio);
goto repeat;
}
- VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
}
if (fgp_flags & FGP_ACCESSED)
@@ -2179,10 +2179,9 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
if (!folio_trylock(folio))
goto put;
if (folio->mapping != mapping ||
- folio_test_writeback(folio))
+ folio_test_writeback(folio) ||
+ !folio_contains(folio, xas.xa_index))
goto unlock;
- VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index),
- folio);
} else {
nr = 1 << xas_get_order(&xas);
base = xas.xa_index & ~(nr - 1);
@@ -3570,13 +3569,13 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
goto out_retry;
- /* Did it get truncated? */
- if (unlikely(folio->mapping != mapping)) {
+ /* Did it get truncated or split? */
+ if (unlikely(folio->mapping != mapping) ||
+ unlikely(!folio_contains(folio, index))) {
folio_unlock(folio);
folio_put(folio);
goto retry_find;
}
- VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
/*
* We have a locked folio in the page cache, now we need to check
--
2.43.0
* Re: [PATCH RFC 1/1] mm/filemap: handle large folio split race in page cache lookups
2026-03-05 18:34 ` [PATCH RFC 1/1] mm/filemap: handle large folio split race in page cache lookups Chris J Arges
@ 2026-03-05 19:24 ` Matthew Wilcox
0 siblings, 0 replies; 3+ messages in thread
From: Matthew Wilcox @ 2026-03-05 19:24 UTC (permalink / raw)
To: Chris J Arges
Cc: akpm, william.kucharski, linux-fsdevel, linux-mm, linux-kernel,
kernel-team
On Thu, Mar 05, 2026 at 12:34:33PM -0600, Chris J Arges wrote:
> We have been hitting VM_BUG_ON_FOLIO(!folio_contains(folio, index)) in
> production environments. These machines are using XFS with large folio
> support enabled and are under high memory pressure.
>
> From reading the code it seems plausible that folio splits due to memory
> reclaim are racing with filemap_fault() serving mmap page faults.
>
> The existing code checks for truncation (folio->mapping != mapping) and
> retries, but there does not appear to be equivalent handling for the
> split case. The result is:
>
> kernel BUG at mm/filemap.c:3519!
> VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio)
This didn't occur to me as a possibility because filemap_get_entry()
is _supposed_ to take care of it. But if this patch fixes it, then
we need to understand why it works.
folio_split() needs to be sure that it's the only one holding a reference
to the folio. To that end, it calculates the expected refcount of the
folio, and freezes it (sets the refcount to 0 if the refcount is the
expected value). Once filemap_get_entry() has incremented the refcount,
freezing will fail.
But of course, we can race. filemap_get_entry() can load a folio first,
the entire folio_split can happen, then it calls folio_try_get() and
succeeds, but it no longer covers the index we were looking for. That's
what the xas_reload() is trying to prevent -- if the index is for a
folio which has changed, then the xas_reload() should come back with a
different folio and we goto repeat.
So how did we get through this with a reference to the wrong folio?