From: "Salunke, Hrushikesh" <hsalunke@amd.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>, <surenb@google.com>,
<mhocko@suse.com>, <jackmanb@google.com>, <hannes@cmpxchg.org>,
<ziy@nvidia.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, <rkodsara@amd.com>,
<bharata@amd.com>, <ankur.a.arora@oracle.com>, <shivankg@amd.com>,
<hsalunke@amd.com>
Subject: Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()
Date: Thu, 9 Apr 2026 14:58:29 +0530 [thread overview]
Message-ID: <5c01d4ba-4453-47ac-9904-6ad6dbd69c2c@amd.com> (raw)
In-Reply-To: <4dd26573-85cc-446a-b2b7-2aeab8aa2417@kernel.org>
On 09-04-2026 14:30, David Hildenbrand (Arm) wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On 4/9/26 10:55, Salunke, Hrushikesh wrote:
>> On 08-04-2026 21:02, Andrew Morton wrote:
>>
>>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>>
>>>
>>> On Wed, 8 Apr 2026 16:14:03 +0530 "Salunke, Hrushikesh" <hsalunke@amd.com> wrote:
>>>
>>>> kernel_init_pages() runs inside the allocator (post_alloc_hook and
>>>> __free_pages_prepare), so it inherits whatever context the caller is in.
>>>> Testing with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_PROVE_LOCKING=y, I
>>>> hit this during exit_group() -> exit_mmap() -> __zap_vma_range, where a
>>>> page allocation happens while the PTE lock and RCU read lock are held,
>>>> making the cond_resched() in the clearing loop illegal:
>>>>
>>>> [ 1997.353228] BUG: sleeping function called from invalid context at mm/page_alloc.c:1235
>>>> [ 1997.353433] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 19725, name: bash
>>>> [ 1997.353572] preempt_count: 1, expected: 0
>>>> [ 1997.353706] RCU nest depth: 1, expected: 0
>>>> [ 1997.353837] 3 locks held by bash/19725:
>>>> [ 1997.353839] #0: ff38cd415971e540 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap+0x6e/0x430
>>>> [ 1997.353850] #1: ffffffffb03d6f60 (rcu_read_lock){....}-{1:3}, at: __pte_offset_map+0x2c/0x220
>>>> [ 1997.353855] #2: ff38cd410deb4618 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: pte_offset_map_lock+0x92/0x170
>>>> [ 1997.353868] Call Trace:
>>>> [ 1997.353870] <TASK>
>>>> [ 1997.353873] dump_stack_lvl+0x91/0xb0
>>>> [ 1997.353877] __might_resched+0x15f/0x290
>>>> [ 1997.353882] kernel_init_pages+0x4b/0xa0
>>>> [ 1997.353886] get_page_from_freelist+0x406/0x1e60
>>>> [ 1997.353895] __alloc_frozen_pages_noprof+0x1d8/0x1730
>>>> [ 1997.353912] alloc_pages_mpol+0xa4/0x190
>>>> [ 1997.353917] alloc_pages_noprof+0x59/0xd0
>>>> [ 1997.353919] get_free_pages_noprof+0x11/0x40
>>>> [ 1997.353921] __tlb_remove_folio_pages_size.isra.0+0x7f/0xe0
>>>> [ 1997.353923] __zap_vma_range+0x1bbd/0x1f40
>>>> [ 1997.353931] unmap_vmas+0xd9/0x1d0
>>>> [ 1997.353934] exit_mmap+0x10a/0x430
>>>> [ 1997.353943] __mmput+0x3d/0x130
>>>> [ 1997.353947] do_exit+0x2a7/0xae0
>>> tlb_next_batch() is (fortunately) using GFP_NOWAIT. Perhaps you can
>>> alter your patch to not call the cond_resched() if caller is attempting
>>> an atomic allocation.
>>
>> Thanks Vlastimil, David, Andrew, and Raghu for the reviews.
>>
>> After looking into this more, I think adding cond_resched() here was
>> overkill. I agree that dropping cond_resched() and
>> PROCESS_PAGES_NON_PREEMPT_BATCH entirely and just calling clear_pages()
>> is the right approach. There's no case where cond_resched() in
>> kernel_init_pages() is both necessary and safe:
>>
>> - It's unsafe in atomic context, as the BUG shows (tlb_next_batch()
>> allocates under PTE lock + RCU read lock via GFP_NOWAIT).
>> - It's unnecessary for common allocations (order-0, mTHP, 2MB) which
>> clear in well under 1ms.
>> - For 1 GiB hugepages, kernel_init_pages() only runs during the
>> initial admin-triggered allocation. When processes later fault on
>> those pages, clearing goes through folio_zero_user() ->
>> clear_contig_highpages(), not kernel_init_pages().
>>
>> So rather than guarding cond_resched() with GFP flags (as Andrew
>> suggested), I'll remove it entirely in v2 to keep things simple and
>> same scheduling characteristics as the original code, just with the
>> batch clearing performance benefit.
>>
>> Regarding the 512 MiB arm64 case that David mentioned the stall from
>> clearing that without cond_resched() under PREEMPT_NONE is acceptable,
>> or should it be handled differently?
Thanks David.
> I mean, it would already happen today, because there is no
> cond_resched(). So nothing to worry about I guess.
Right makes sense.
>
>> I can introduce clear_highpages_kasan_tagged() / clear_highpages()
>> helpers, or keep v2 minimal with the logic inline in
>> kernel_init_pages(). Any preference?
> I'd prefer not sprinkling IS_ENABLED(CONFIG_HIGHMEM) around and simply
> calling a clear_highpages_kasan_tagged() from kernel_init_pages().
>
Sounds good. I'll add a clear_highpages_kasan_tagged() helper in
highmem.h and have kernel_init_pages() call it directly. Will send
v2 shortly.
Regards,
Hrushikesh
next prev parent reply other threads:[~2026-04-09 9:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-08 9:24 Hrushikesh Salunke
2026-04-08 9:47 ` Vlastimil Babka (SUSE)
2026-04-08 10:44 ` Salunke, Hrushikesh
2026-04-08 10:53 ` David Hildenbrand (Arm)
2026-04-08 11:16 ` Raghavendra K T
2026-04-08 16:24 ` Raghavendra K T
2026-04-08 15:32 ` Andrew Morton
2026-04-09 8:55 ` Salunke, Hrushikesh
2026-04-09 9:00 ` David Hildenbrand (Arm)
2026-04-09 9:28 ` Salunke, Hrushikesh [this message]
2026-04-08 11:32 ` [syzbot ci] " syzbot ci
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5c01d4ba-4453-47ac-9904-6ad6dbd69c2c@amd.com \
--to=hsalunke@amd.com \
--cc=akpm@linux-foundation.org \
--cc=ankur.a.arora@oracle.com \
--cc=bharata@amd.com \
--cc=david@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=rkodsara@amd.com \
--cc=shivankg@amd.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox