From: Michal Hocko <mhocko@suse.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Vishal Moola <vishal.moola@gmail.com>,
Baoquan He <bhe@redhat.com>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] vmalloc: support __GFP_RETRY_MAYFAIL and __GFP_NORETRY
Date: Mon, 2 Mar 2026 19:51:25 +0100
Message-ID: <aaXcLRdhq0IyoNIG@tiehlicka>
In-Reply-To: <a8010888-ce49-9232-f132-338de8e18cee@redhat.com>
On Mon 02-03-26 18:38:53, Mikulas Patocka wrote:
>
>
> On Mon, 2 Mar 2026, Uladzislau Rezki (Sony) wrote:
>
> > From: Michal Hocko <mhocko@suse.com>
> >
> > __GFP_RETRY_MAYFAIL and __GFP_NORETRY haven't been supported so far
> > because their semantic (i.e. to not trigger OOM killer) is not possible
> > with the existing vmalloc page table allocation which is allowing for
> > the OOM killer.
> >
> > Example: __vmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> >
> > <snip>
> > vmalloc_test/55 invoked oom-killer:
> > gfp_mask=0x40dc0(
> > GFP_KERNEL|__GFP_ZERO|__GFP_COMP), order=0, oom_score_adj=0
> > active_anon:0 inactive_anon:0 isolated_anon:0
> > active_file:0 inactive_file:0 isolated_file:0
> > unevictable:0 dirty:0 writeback:0
> > slab_reclaimable:700 slab_unreclaimable:33708
> > mapped:0 shmem:0 pagetables:5174
> > sec_pagetables:0 bounce:0
> > kernel_misc_reclaimable:0
> > free:850 free_pcp:319 free_cma:0
> > CPU: 4 UID: 0 PID: 639 Comm: vmalloc_test/55 ...
> > Hardware name: QEMU Standard PC (i440FX + PIIX, ...
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x5d/0x80
> > dump_header+0x43/0x1b3
> > out_of_memory.cold+0x8/0x78
> > __alloc_pages_slowpath.constprop.0+0xef5/0x1130
> > __alloc_frozen_pages_noprof+0x312/0x330
> > alloc_pages_mpol+0x7d/0x160
> > alloc_pages_noprof+0x50/0xa0
> > __pte_alloc_kernel+0x1e/0x1f0
> > ...
> > <snip>
> >
> > There are usecases for these modifiers when a large allocation request
> > should rather fail than trigger OOM killer which wouldn't be able to
> > handle the situation anyway [1].
> >
> > While we cannot change existing page table allocation code easily we can
> > piggy back on scoped NOWAIT allocation for them that we already have in
> > place. The rationale is that the bulk of the consumed memory is sitting
> > in pages backing the vmalloc allocation. Page tables are only
> > participating a tiny fraction. Moreover page tables for virtually allocated
> > areas are never reclaimed so the longer the system runs to less likely
> > they are. It makes sense to allow an approximation of __GFP_RETRY_MAYFAIL
> > and __GFP_NORETRY even if the page table allocation part is much weaker.
> > This doesn't break the failure mode while it allows for the no OOM
> > semantic.
> >
> > [1] https://lore.kernel.org/all/32bd9bed-a939-69c4-696d-f7f9a5fe31d8@redhat.com/T/#u
> >
> > Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> > mm/vmalloc.c | 17 ++++++++++++-----
> > 1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index a06f4b3ea367..975592b0ec89 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3798,6 +3798,8 @@ static void defer_vm_area_cleanup(struct vm_struct *area)
> >   * non-blocking (no __GFP_DIRECT_RECLAIM) - memalloc_noreclaim_save()
> >   * GFP_NOFS - memalloc_nofs_save()
> >   * GFP_NOIO - memalloc_noio_save()
> > + * __GFP_RETRY_MAYFAIL, __GFP_NORETRY - memalloc_noreclaim_save()
> > + * to prevent OOMs
> >   *
> >   * Returns a flag cookie to pair with restore.
> >   */
> > @@ -3806,7 +3808,8 @@ memalloc_apply_gfp_scope(gfp_t gfp_mask)
> >  {
> >  	unsigned int flags = 0;
> >
> > -	if (!gfpflags_allow_blocking(gfp_mask))
> > +	if (!gfpflags_allow_blocking(gfp_mask) ||
> > +	    (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_NORETRY)))
> >  		flags = memalloc_noreclaim_save();
>
> I wouldn't do this because:
>
> 1. it makes the __GFP_RETRY_MAYFAIL allocations unreliable.

__GFP_RETRY_MAYFAIL doesn't provide any reliability. It just promises
to not OOM while trying hard. I believe this implementation doesn't
break that promise.

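For readers following along, the condition in the quoted hunk can be
modelled in userspace roughly as below. This is only a sketch: the
MODEL_* bit values and the model_* helper names are made up for
illustration and are not the kernel's definitions, and only the first
branch of memalloc_apply_gfp_scope() is modelled (the GFP_NOFS and
GFP_NOIO cases are left out).

```c
#include <stdbool.h>

/* Hypothetical bit values chosen for this model only; the kernel's differ. */
typedef unsigned int gfp_t;
#define MODEL_DIRECT_RECLAIM 0x01u
#define MODEL_RETRY_MAYFAIL  0x02u
#define MODEL_NORETRY        0x04u

/* Stand-in for gfpflags_allow_blocking(): true if direct reclaim is allowed. */
static bool model_allow_blocking(gfp_t gfp)
{
	return (gfp & MODEL_DIRECT_RECLAIM) != 0;
}

/*
 * Mirrors the condition added by the quoted patch: returns true when the
 * page table allocation would run under the noreclaim scope.
 */
static bool model_uses_noreclaim_scope(gfp_t gfp)
{
	return !model_allow_blocking(gfp) ||
	       (gfp & (MODEL_RETRY_MAYFAIL | MODEL_NORETRY)) != 0;
}
```

With these stand-ins, a blocking request carrying MODEL_RETRY_MAYFAIL or
MODEL_NORETRY takes the same scope as a non-blocking one, while a plain
blocking request does not.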
> 2. The comment at memalloc_noreclaim_save says that it may deplete memory
> reserves: "This should only be used when the caller guarantees the
> allocation will allow more memory to be freed very shortly, i.e. it needs
> to allocate some memory in the process of freeing memory, and cannot
> reclaim due to potential recursion."

Yes, this allocation clearly doesn't guarantee to free more memory. That
comment is rather dated. Anyway, the crux is to make sure that the
allocation is not unbounded. The idea behind this decision is that the
page tables are only a tiny fraction of the resulting memory allocated.
Moreover, this virtually allocated space is recycled, so over time there
should be fewer and fewer page tables allocated as well.

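To put a rough number on "tiny fraction", here is a back-of-the-envelope
sketch. It assumes 4 KiB pages and 8-byte page table entries (as on
x86-64), which is an assumption of this example and not something stated
in the thread. It counts last-level page table pages only, ignores the
upper levels, and the model_* names are hypothetical.

```c
/* Assumptions for this sketch only: 4 KiB pages, 8-byte PTEs. */
#define MODEL_PAGE_SIZE     4096UL
#define MODEL_PTE_SIZE      8UL
#define MODEL_PTES_PER_PAGE (MODEL_PAGE_SIZE / MODEL_PTE_SIZE) /* 512 */

/* Number of data pages backing an allocation of `size` bytes. */
static unsigned long model_data_pages(unsigned long size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/*
 * Upper bound on the last-level page table pages covering a contiguous
 * virtual range of that size. The +1 accounts for a range that does not
 * start on a page table boundary.
 */
static unsigned long model_pte_pages(unsigned long size)
{
	unsigned long pages = model_data_pages(size);

	if (pages == 0)
		return 0;
	return (pages + MODEL_PTES_PER_PAGE - 1) / MODEL_PTES_PER_PAGE + 1;
}
```

Under those assumptions a 1 GiB request needs 262144 data pages against
at most 513 last-level page table pages, i.e. roughly 0.2%. The sketch
does not model reuse of the virtual range, where already populated page
tables would need no new allocation at all.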
> I think that the cleanest solution to this problem would be to get rid of
> PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO and instead introduce two per-thread
> variables "gfp_t set_flags" and "gfp_t clear_flags" and set and clear gfp
> flags according to them in the allocator: "gfp = (gfp |
> current->set_flags) & ~current->clear_flags";

We've been through discussions like this one way too many times and the
conclusion is that, no, this will not work. The gfp space we have and
need to support without rewriting a large part of the kernel is simply
incompatible with a saner interface. Yeah, I hate that as well, but
here we are. We need to be creative to keep things sensible and not
introduce even more weirdness to the interface.

--
Michal Hocko
SUSE Labs