From: Lance Yang <lance.yang@linux.dev>
To: Ackerley Tng <ackerleytng@google.com>,
Deepanshu Kartikey <kartikey406@gmail.com>
Cc: baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
linux-mm@kvack.org, npache@redhat.com,
linux-kernel@vger.kernel.org, Liam.Howlett@oracle.com,
syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com,
ryan.roberts@arm.com, stable@vger.kernel.org, ziy@nvidia.com,
dev.jain@arm.com, i@maskray.me, baohua@kernel.org,
shy828301@gmail.com, akpm@linux-foundation.org, david@kernel.org
Subject: Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes
Date: Sun, 22 Feb 2026 12:10:06 +0800
Message-ID: <448363f3-34d6-4d36-b827-9b81023230ec@linux.dev>
In-Reply-To: <CAEvNRgGLAnZkfPZt32-wyCaefu-tvG9WcX3zq1Xe7fsTabZqmA@mail.gmail.com>
On 2026/2/16 06:48, Ackerley Tng wrote:
> Lance Yang <lance.yang@linux.dev> writes:
>
>> On 2026/2/14 08:15, Deepanshu Kartikey wrote:
>>> file_thp_enabled() incorrectly allows THP for files on anonymous inodes
>>> (e.g. guest_memfd and secretmem). These files are created via
>>> alloc_file_pseudo(), which does not call get_write_access() and leaves
>>> inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
>>> true, they appear as read-only regular files when
>>> CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
>>> collapse.
>>>
>>> Anonymous inodes can never pass the inode_is_open_for_write() check
>>> since their i_writecount is never incremented through the normal VFS
>>> open path. The right thing to do is to exclude them from THP eligibility
>>> altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
>>> filesystem files (e.g. shared libraries), not for pseudo-filesystem
>>> inodes.
>>>
>>> For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
>>> large folios in the page cache via the collapse path, but the
>>> guest_memfd fault handler does not support large folios. This triggers
>>> WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().
>>>
>>> For secretmem, collapse_file() tries to copy page contents through the
>>> direct map, but secretmem pages are removed from the direct map. This
>>> can result in a kernel crash:
>>
>> Good catch, thanks!
>>
>> For secretmem, file_thp_enabled() can incorrectly return true
>> (i_writecount=0, S_ISREG=1), so the mapping becomes eligible for file
>> THP collapse ...
>>
>> However, if any folio is dirty, collapse bails out early with
>> SCAN_PAGE_DIRTY_OR_WRITEBACK, as secretmem doesn't support normal
>> writeback, IIUC.
>>
>
> Yup! In the reproducers [1] I had to try to avoid setting the dirty flag
> on the pages.
>
> [1] https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com
>
>>>
>>> BUG: unable to handle page fault for address: ffff88810284d000
>>> RIP: 0010:memcpy_orig+0x16/0x130
>>> Call Trace:
>>> collapse_file
>>> hpage_collapse_scan_file
>>> madvise_collapse
>>>
>>> Secretmem is not affected by the crash on upstream as the memory failure
>>> recovery handles the failed copy gracefully, but it still triggers
>>> confusing false memory failure reports:
>>>
>>> Memory failure: 0x106d96f: recovery action for clean unevictable
>>> LRU page: Recovered
>>
>> Right. On my setup, that would hit SCAN_COPY_MC in
>> hpage_collapse_scan_file() rather than a hard crash.
>>
>
> Deepanshu, were you able to trigger a hard crash on some earlier kernel?
> I only saw this false memory failure log.

On a setup where memory failure recovery works, we can still trigger a
panic by disabling recovery first:

    echo 0 > /proc/sys/vm/memory_failure_recovery

With recovery disabled, we hit the following panic:

[ 117.608411] Kernel panic - not syncing: Memory failure on page 1024d6
[ 117.609490] CPU: 4 UID: 0 PID: 168 Comm: kworker/4:1 Not tainted 6.19.0 #83 PREEMPT(full)
[ 117.610817] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[ 117.612121] Workqueue: events memory_failure_work_func
[ 117.612978] Call Trace:
[ 117.613401] <TASK>
[ 117.613766] dump_stack_lvl+0x60/0x90
[ 117.614382] dump_stack+0x14/0x1a
[ 117.614940] vpanic+0x1a6/0x470
[ 117.615476] panic+0xc0/0xc0
[ 117.615967] ? __pfx_panic+0x10/0x10
[ 117.616571] ? update_cfs_rq_load_avg+0x5f/0x5a0
[ 117.617336] ? dequeue_entities+0x250/0x1e30
[ 117.618043] memory_failure.cold+0x2d/0x2d
[ 117.618725] ? __pfx_memory_failure+0x10/0x10
[ 117.619451] ? __raw_spin_lock_irqsave+0x8d/0xf0
[ 117.620215] ? __switch_to+0x3e9/0xb60
[ 117.620841] memory_failure_work_func+0x150/0x200
[ 117.621621] process_one_work+0x63d/0xf50
[ 117.622292] worker_thread+0x517/0xd90
[ 117.622915] ? __pfx_worker_thread+0x10/0x10
[ 117.623629] kthread+0x369/0x460
[ 117.624169] ? __pfx_kthread+0x10/0x10
[ 117.624796] ret_from_fork+0x33a/0x660
[ 117.625422] ? __pfx_ret_from_fork+0x10/0x10
[ 117.626126] ? switch_fpu+0x19/0x1f0
[ 117.626728] ? __switch_to+0x3e9/0xb60
[ 117.627354] ? __pfx_kthread+0x10/0x10
[ 117.627978] ret_from_fork_asm+0x1a/0x30
[ 117.628633] </TASK>
[ 117.629316] Kernel Offset: disabled
[ 117.629902] ---[ end Kernel panic - not syncing: Memory failure on page 1024d6 ]---