Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()

From: "Salunke, Hrushikesh" <hsalunke@amd.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>, <surenb@google.com>,
	<mhocko@suse.com>, <jackmanb@google.com>, <hannes@cmpxchg.org>,
	<ziy@nvidia.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <rkodsara@amd.com>,
	<bharata@amd.com>, <ankur.a.arora@oracle.com>, <shivankg@amd.com>,
	<hsalunke@amd.com>
Subject: Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()
Date: Thu, 9 Apr 2026 14:58:29 +0530
Message-ID: <5c01d4ba-4453-47ac-9904-6ad6dbd69c2c@amd.com>
In-Reply-To: <4dd26573-85cc-446a-b2b7-2aeab8aa2417@kernel.org>


On 09-04-2026 14:30, David Hildenbrand (Arm) wrote:

> On 4/9/26 10:55, Salunke, Hrushikesh wrote:
>> On 08-04-2026 21:02, Andrew Morton wrote:
>>
>>> On Wed, 8 Apr 2026 16:14:03 +0530 "Salunke, Hrushikesh" <hsalunke@amd.com> wrote:
>>>
>>>> kernel_init_pages() runs inside the allocator (post_alloc_hook and
>>>> __free_pages_prepare), so it inherits whatever context the caller is in.
>>>> Testing with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_PROVE_LOCKING=y, I
>>>> hit this during exit_group() -> exit_mmap() -> __zap_vma_range, where a
>>>> page allocation happens while the PTE lock and RCU read lock are held,
>>>> making the cond_resched() in the clearing loop illegal:
>>>>
>>>> [ 1997.353228] BUG: sleeping function called from invalid context at mm/page_alloc.c:1235
>>>> [ 1997.353433] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 19725, name: bash
>>>> [ 1997.353572] preempt_count: 1, expected: 0
>>>> [ 1997.353706] RCU nest depth: 1, expected: 0
>>>> [ 1997.353837] 3 locks held by bash/19725:
>>>> [ 1997.353839]  #0: ff38cd415971e540 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap+0x6e/0x430
>>>> [ 1997.353850]  #1: ffffffffb03d6f60 (rcu_read_lock){....}-{1:3}, at: __pte_offset_map+0x2c/0x220
>>>> [ 1997.353855]  #2: ff38cd410deb4618 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: pte_offset_map_lock+0x92/0x170
>>>> [ 1997.353868] Call Trace:
>>>> [ 1997.353870]  <TASK>
>>>> [ 1997.353873]  dump_stack_lvl+0x91/0xb0
>>>> [ 1997.353877]  __might_resched+0x15f/0x290
>>>> [ 1997.353882]  kernel_init_pages+0x4b/0xa0
>>>> [ 1997.353886]  get_page_from_freelist+0x406/0x1e60
>>>> [ 1997.353895]  __alloc_frozen_pages_noprof+0x1d8/0x1730
>>>> [ 1997.353912]  alloc_pages_mpol+0xa4/0x190
>>>> [ 1997.353917]  alloc_pages_noprof+0x59/0xd0
>>>> [ 1997.353919]  get_free_pages_noprof+0x11/0x40
>>>> [ 1997.353921]  __tlb_remove_folio_pages_size.isra.0+0x7f/0xe0
>>>> [ 1997.353923]  __zap_vma_range+0x1bbd/0x1f40
>>>> [ 1997.353931]  unmap_vmas+0xd9/0x1d0
>>>> [ 1997.353934]  exit_mmap+0x10a/0x430
>>>> [ 1997.353943]  __mmput+0x3d/0x130
>>>> [ 1997.353947]  do_exit+0x2a7/0xae0
>>>
>>> tlb_next_batch() is (fortunately) using GFP_NOWAIT.  Perhaps you can
>>> alter your patch to not call the cond_resched() if caller is attempting
>>> an atomic allocation.
>>
>> Thanks Vlastimil, David, Andrew, and Raghu for the reviews.
>>
>> After looking into this more, I think adding cond_resched() here was
>> overkill. I agree that dropping cond_resched() and
>> PROCESS_PAGES_NON_PREEMPT_BATCH entirely and just calling clear_pages()
>> is the right approach. There's no case where cond_resched() in
>> kernel_init_pages() is both necessary and safe:
>>
>> - It's unsafe in atomic context, as the BUG shows (tlb_next_batch()
>>   allocates under PTE lock + RCU read lock via GFP_NOWAIT).
>> - It's unnecessary for common allocations (order-0, mTHP, 2 MiB) which
>>   clear in well under 1ms.
>> - For 1 GiB hugepages, kernel_init_pages() only runs during the
>>   initial admin-triggered allocation. When processes later fault on
>>   those pages, clearing goes through folio_zero_user() ->
>>   clear_contig_highpages(), not kernel_init_pages().
>>
>> So rather than guarding cond_resched() with GFP flags (as Andrew
>> suggested), I'll remove it entirely in v2. That keeps things simple and
>> preserves the scheduling characteristics of the original code, just
>> with the batch clearing performance benefit.
>>
>> Regarding the 512 MiB arm64 case that David mentioned: is the stall
>> from clearing that much memory without cond_resched() under
>> PREEMPT_NONE acceptable, or should it be handled differently?

Thanks David.

> I mean, it would already happen today, because there is no
> cond_resched(). So nothing to worry about I guess.

Right, makes sense.

>
>> I can introduce clear_highpages_kasan_tagged() / clear_highpages()
>> helpers, or keep v2 minimal with the logic inline in
>> kernel_init_pages(). Any preference?
> I'd prefer not sprinkling IS_ENABLED(CONFIG_HIGHMEM) around and simply
> calling a clear_highpages_kasan_tagged() from kernel_init_pages().
>
Sounds good. I'll add a clear_highpages_kasan_tagged() helper in
highmem.h and have kernel_init_pages() call it directly. Will send
v2 shortly.
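
For illustration only, here is a userspace model of the call structure
described above. It is a sketch, not the v2 patch: every model_* name and
MODEL_PAGE_SIZE are made up for this example, memset() stands in for
clear_pages(), and the HIGHMEM and KASAN tag handling that a real
clear_highpages_kasan_tagged() would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Stand-in for clear_pages(): zero npages contiguous pages in one call. */
static void model_clear_pages(void *base, size_t npages)
{
	memset(base, 0, (size_t)npages * MODEL_PAGE_SIZE);
}

/*
 * Hypothetical shape of the helper: the caller makes a single call and
 * never checks any config option itself. A real helper would pick the
 * per-page or batched path internally; this model only has one path.
 */
static void model_clear_highpages_kasan_tagged(void *base, size_t npages)
{
	assert(base != NULL);
	model_clear_pages(base, npages);
}

/* Model of the caller: no per-page loop and no rescheduling point. */
static void model_kernel_init_pages(void *base, size_t npages)
{
	model_clear_highpages_kasan_tagged(base, npages);
}

/* Returns 1 if every byte in the range is zero, 0 otherwise. */
static int model_pages_are_zero(const void *base, size_t npages)
{
	const unsigned char *p = base;
	size_t i;

	for (i = 0; i < (size_t)npages * MODEL_PAGE_SIZE; i++)
		if (p[i])
			return 0;
	return 1;
}
```

Since the model has no scheduling point anywhere in the clearing path,
it behaves the same whether or not the caller could sleep, which is the
property the GFP_NOWAIT caller in the trace needs.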

Regards,
Hrushikesh
