From: Lance Yang <lance.yang@linux.dev>
To: Ackerley Tng <ackerleytng@google.com>,
	Deepanshu Kartikey <kartikey406@gmail.com>
Cc: baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	linux-mm@kvack.org, npache@redhat.com,
	linux-kernel@vger.kernel.org, Liam.Howlett@oracle.com,
	syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com,
	ryan.roberts@arm.com, stable@vger.kernel.org, ziy@nvidia.com,
	dev.jain@arm.com, i@maskray.me, baohua@kernel.org,
	shy828301@gmail.com, akpm@linux-foundation.org, david@kernel.org
Subject: Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
Date: Sun, 22 Feb 2026 12:10:06 +0800
Message-ID: <448363f3-34d6-4d36-b827-9b81023230ec@linux.dev>
In-Reply-To: <CAEvNRgGLAnZkfPZt32-wyCaefu-tvG9WcX3zq1Xe7fsTabZqmA@mail.gmail.com>



On 2026/2/16 06:48, Ackerley Tng wrote:
> Lance Yang <lance.yang@linux.dev> writes:
> 
>> On 2026/2/14 08:15, Deepanshu Kartikey wrote:
>>> file_thp_enabled() incorrectly allows THP for files on anonymous inodes
>>> (e.g. guest_memfd and secretmem). These files are created via
>>> alloc_file_pseudo(), which does not call get_write_access() and leaves
>>> inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
>>> true, they appear as read-only regular files when
>>> CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
>>> collapse.
>>>
>>> Anonymous inodes can never pass the inode_is_open_for_write() check
>>> since their i_writecount is never incremented through the normal VFS
>>> open path. The right thing to do is to exclude them from THP eligibility
>>> altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
>>> filesystem files (e.g. shared libraries), not for pseudo-filesystem
>>> inodes.
>>>
>>> For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
>>> large folios in the page cache via the collapse path, but the
>>> guest_memfd fault handler does not support large folios. This triggers
>>> WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().
>>>
>>> For secretmem, collapse_file() tries to copy page contents through the
>>> direct map, but secretmem pages are removed from the direct map. This
>>> can result in a kernel crash:
>>
>> Good catch, thanks!
>>
>> For secretmem, file_thp_enabled() can incorrectly return true
>> (i_writecount=0, S_ISREG=1), so the mapping becomes eligible for file
>> THP collapse ...
>>
>> However, if any folio is dirty, collapse bails out early with
>> SCAN_PAGE_DIRTY_OR_WRITEBACK, as secretmem doesn't support normal
>> writeback, IIUC.
>>
> 
> Yup! In the reproducers [1] I had to try to avoid setting the dirty flag
> on the pages.
> 
> [1] https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com
> 
>>>
>>>       BUG: unable to handle page fault for address: ffff88810284d000
>>>       RIP: 0010:memcpy_orig+0x16/0x130
>>>       Call Trace:
>>>        collapse_file
>>>        hpage_collapse_scan_file
>>>        madvise_collapse
>>>
>>> Secretmem is not affected by the crash on upstream as the memory failure
>>> recovery handles the failed copy gracefully, but it still triggers
>>> confusing false memory failure reports:
>>>
>>>       Memory failure: 0x106d96f: recovery action for clean unevictable
>>>       LRU page: Recovered
>>
>> Right. On my setup, that would hit SCAN_COPY_MC in
>> hpage_collapse_scan_file() rather than a hard crash.
>>
> 
> Deepanshu, were you able to trigger a hard crash on some earlier kernel?
> I only saw this false memory failure log.

On a setup where memory failure recovery works, we can trigger a panic by
disabling recovery:

echo 0 > /proc/sys/vm/memory_failure_recovery
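
To confirm the setting before and after that write, the value can be read
back. A minimal user-space sketch follows; the helper names
parse_sysctl_int() and read_sysctl_int() are made up for illustration and
are not part of any kernel or libc API. Per the vm sysctl documentation,
1 means "attempt recovery" and 0 means "always panic on a memory failure".

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of a sysctl file that holds a
 * single integer. Returns 0 and stores the value in *out on success,
 * -1 if no number could be parsed.
 */
int parse_sysctl_int(const char *text, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(text, &end, 10);
	if (errno || end == text)
		return -1;
	*out = (int)v;
	return 0;
}

/*
 * Hypothetical helper: read an integer sysctl such as
 * /proc/sys/vm/memory_failure_recovery.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_sysctl_int(const char *path, int *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_sysctl_int(buf, out);
	fclose(f);
	return ret;
}
```

Calling read_sysctl_int("/proc/sys/vm/memory_failure_recovery", &v) from a
small test program should leave v at 0 once recovery has been disabled as
above.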

With recovery disabled, memory_failure() panics instead of recovering:

[  117.608411] Kernel panic - not syncing: Memory failure on page 1024d6
[  117.609490] CPU: 4 UID: 0 PID: 168 Comm: kworker/4:1 Not tainted 6.19.0 #83 PREEMPT(full)
[  117.610817] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[  117.612121] Workqueue: events memory_failure_work_func
[  117.612978] Call Trace:
[  117.613401]  <TASK>
[  117.613766]  dump_stack_lvl+0x60/0x90
[  117.614382]  dump_stack+0x14/0x1a
[  117.614940]  vpanic+0x1a6/0x470
[  117.615476]  panic+0xc0/0xc0
[  117.615967]  ? __pfx_panic+0x10/0x10
[  117.616571]  ? update_cfs_rq_load_avg+0x5f/0x5a0
[  117.617336]  ? dequeue_entities+0x250/0x1e30
[  117.618043]  memory_failure.cold+0x2d/0x2d
[  117.618725]  ? __pfx_memory_failure+0x10/0x10
[  117.619451]  ? __raw_spin_lock_irqsave+0x8d/0xf0
[  117.620215]  ? __switch_to+0x3e9/0xb60
[  117.620841]  memory_failure_work_func+0x150/0x200
[  117.621621]  process_one_work+0x63d/0xf50
[  117.622292]  worker_thread+0x517/0xd90
[  117.622915]  ? __pfx_worker_thread+0x10/0x10
[  117.623629]  kthread+0x369/0x460
[  117.624169]  ? __pfx_kthread+0x10/0x10
[  117.624796]  ret_from_fork+0x33a/0x660
[  117.625422]  ? __pfx_ret_from_fork+0x10/0x10
[  117.626126]  ? switch_fpu+0x19/0x1f0
[  117.626728]  ? __switch_to+0x3e9/0xb60
[  117.627354]  ? __pfx_kthread+0x10/0x10
[  117.627978]  ret_from_fork_asm+0x1a/0x30
[  117.628633]  </TASK>
[  117.629316] Kernel Offset: disabled
[  117.629902] ---[ end Kernel panic - not syncing: Memory failure on page 1024d6 ]---

