Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Salunke, Hrushikesh" <hsalunke@amd.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	surenb@google.com, mhocko@suse.com, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, rkodsara@amd.com, bharata@amd.com,
	ankur.a.arora@oracle.com, shivankg@amd.com
Subject: Re: [PATCH] mm/page_alloc: use batch page clearing in kernel_init_pages()
Date: Thu, 9 Apr 2026 11:00:33 +0200	[thread overview]
Message-ID: <4dd26573-85cc-446a-b2b7-2aeab8aa2417@kernel.org> (raw)
In-Reply-To: <fcc68286-d2ae-4e51-b4b2-886af115ad7c@amd.com>

On 4/9/26 10:55, Salunke, Hrushikesh wrote:
> 
> On 08-04-2026 21:02, Andrew Morton wrote:
> 
>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>
>>
>> On Wed, 8 Apr 2026 16:14:03 +0530 "Salunke, Hrushikesh" <hsalunke@amd.com> wrote:
>>
>>> kernel_init_pages() runs inside the allocator (post_alloc_hook and
>>> __free_pages_prepare), so it inherits whatever context the caller is in.
>>> Testing with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_PROVE_LOCKING=y, I
>>> hit this during exit_group() -> exit_mmap() -> __zap_vma_range, where a
>>> page allocation happens while the PTE lock and RCU read lock are held,
>>> making the cond_resched() in the clearing loop illegal:
>>>
>>> [ 1997.353228] BUG: sleeping function called from invalid context at mm/page_alloc.c:1235
>>> [ 1997.353433] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 19725, name: bash
>>> [ 1997.353572] preempt_count: 1, expected: 0
>>> [ 1997.353706] RCU nest depth: 1, expected: 0
>>> [ 1997.353837] 3 locks held by bash/19725:
>>> [ 1997.353839]  #0: ff38cd415971e540 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap+0x6e/0x430
>>> [ 1997.353850]  #1: ffffffffb03d6f60 (rcu_read_lock){....}-{1:3}, at: __pte_offset_map+0x2c/0x220
>>> [ 1997.353855]  #2: ff38cd410deb4618 (ptlock_ptr(ptdesc)#2){+.+.}-{3:3}, at: pte_offset_map_lock+0x92/0x170
>>> [ 1997.353868] Call Trace:
>>> [ 1997.353870]  <TASK>
>>> [ 1997.353873]  dump_stack_lvl+0x91/0xb0
>>> [ 1997.353877]  __might_resched+0x15f/0x290
>>> [ 1997.353882]  kernel_init_pages+0x4b/0xa0
>>> [ 1997.353886]  get_page_from_freelist+0x406/0x1e60
>>> [ 1997.353895]  __alloc_frozen_pages_noprof+0x1d8/0x1730
>>> [ 1997.353912]  alloc_pages_mpol+0xa4/0x190
>>> [ 1997.353917]  alloc_pages_noprof+0x59/0xd0
>>> [ 1997.353919]  get_free_pages_noprof+0x11/0x40
>>> [ 1997.353921]  __tlb_remove_folio_pages_size.isra.0+0x7f/0xe0
>>> [ 1997.353923]  __zap_vma_range+0x1bbd/0x1f40
>>> [ 1997.353931]  unmap_vmas+0xd9/0x1d0
>>> [ 1997.353934]  exit_mmap+0x10a/0x430
>>> [ 1997.353943]  __mmput+0x3d/0x130
>>> [ 1997.353947]  do_exit+0x2a7/0xae0
>> tlb_next_batch() is (fortunately) using GFP_NOWAIT.  Perhaps you can
>> alter your patch to not call the cond_resched() if caller is attempting
>> an atomic allocation.
> 
> 
> Thanks Vlastimil, David, Andrew, and Raghu for the reviews.
> 
> After looking into this more, I think adding cond_resched() here was
> overkill. I agree that dropping cond_resched() and 
> PROCESS_PAGES_NON_PREEMPT_BATCH entirely and just calling clear_pages()
> is the right approach. There's no case where cond_resched() in 
> kernel_init_pages() is both necessary and safe:
> 
> - It's unsafe in atomic context, as the BUG shows (tlb_next_batch()
>   allocates under PTE lock + RCU read lock via GFP_NOWAIT).
> - It's unnecessary for common allocations (order-0, mTHP, 2MB) which
>   clear in well under 1ms.
> - For 1 GiB hugepages, kernel_init_pages() only runs during the
>   initial admin-triggered allocation. When processes later fault on
>   those pages, clearing goes through folio_zero_user() ->
>   clear_contig_highpages(), not kernel_init_pages().
> 
> So rather than guarding cond_resched() with GFP flags (as Andrew
> suggested), I'll remove it entirely in v2 to keep things simple and
> same scheduling characteristics as the original code, just with the
> batch clearing performance benefit.
> 
> Regarding the 512 MiB arm64 case that David mentioned the stall from
> clearing that without cond_resched() under PREEMPT_NONE is acceptable,
> or should it be handled differently?

I mean, it would already happen today, because there is no
cond_resched(). So nothing to worry about I guess.

> 
> I can introduce clear_highpages_kasan_tagged() / clear_highpages()
> helpers, or keep v2 minimal with the logic inline in
> kernel_init_pages(). Any preference?

I'd prefer not sprinkling IS_ENABLED(CONFIG_HIGHMEM) around and simply
calling a clear_highpages_kasan_tagged() from kernel_init_pages().

-- 
Cheers,

David

next prev parent reply	other threads:[~2026-04-09  9:00 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-08  9:24 Hrushikesh Salunke
2026-04-08  9:47 ` Vlastimil Babka (SUSE)
2026-04-08 10:44   ` Salunke, Hrushikesh
2026-04-08 10:53     ` David Hildenbrand (Arm)
2026-04-08 11:16     ` Raghavendra K T
2026-04-08 16:24       ` Raghavendra K T
2026-04-08 15:32     ` Andrew Morton
2026-04-09  8:55       ` Salunke, Hrushikesh
2026-04-09  9:00         ` David Hildenbrand (Arm) [this message]
2026-04-09  9:28           ` Salunke, Hrushikesh
2026-04-08 11:32 ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4dd26573-85cc-446a-b2b7-2aeab8aa2417@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ankur.a.arora@oracle.com \
    --cc=bharata@amd.com \
    --cc=hannes@cmpxchg.org \
    --cc=hsalunke@amd.com \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=rkodsara@amd.com \
    --cc=shivankg@amd.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox