Re: [PATCH] mm/vmalloc: request large order pages from buddy allocator

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Uladzislau Rezki <urezki@gmail.com>
Subject: Re: [PATCH] mm/vmalloc: request large order pages from buddy allocator
Date: Wed, 22 Oct 2025 15:33:42 +0100	[thread overview]
Message-ID: <aPjrRkjiIt6HmXmT@casper.infradead.org> (raw)
In-Reply-To: <20251021142436.323fec204aa6e9d4b674a4aa@linux-foundation.org>

On Tue, Oct 21, 2025 at 02:24:36PM -0700, Andrew Morton wrote:
> On Tue, 21 Oct 2025 12:44:56 -0700 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote:
> 
> > Sometimes, vm_area_alloc_pages() will want many pages from the buddy
> > allocator. Rather than making requests to the buddy allocator for at
> > most 100 pages at a time, we can eagerly request large order pages a
> > smaller number of times.
> 
> Does this have potential to inadvertently reduce the availability of
> hugepages?

Quite the opposite.  Let's say we're doing a 40KiB allocation.  If we
just take the first 10 pages off the PCP list, those could be from
ten different 2MB chunks, preventing ten different hugepages from
forming until the allocation succeeds.  If instead we do an order-3
allocation and an order-1 allocation, those can be from at most two
different 2MB chunks and prevent at most two hugepages from forming.

> > 1000 2mb allocations:
> > 	[Baseline]			[This patch]
> > 	real    46.310s			real    0m34.582
> > 	user    0.001s			user    0.006s
> > 	sys     46.058s			sys     0m34.365s
> > 
> > 10000 200kb allocations:
> > 	[Baseline]			[This patch]
> > 	real    56.104s			real    0m43.696
> > 	user    0.001s			user    0.003s
> > 	sys     55.375s			sys     0m42.995s
> 
> Nice, but how significant is this change likely to be for a real workload?

Ulad has numbers for the last iteration of this patch showing an
improvement for a 16KiB allocation, which is an improvement for fork()
now we all have VMAP_STACK.

> > +	gfp_t large_gfp = (gfp &
> > +		~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL | __GFP_COMP))
> > +		| __GFP_NOWARN;
> 
> Gee, why is this so complicated?

Because GFP flags suck as an interface?  Look at kmalloc_gfp_adjust().

> > +	unsigned int large_order = ilog2(nr_remaining);
> 
> Should nr_remaining be rounded up to next-power-of-two?

No, we don't want to overallocate, we want to precisely allocate.
To use our 40KiB example from earlier, we want to satisfy the allocation
by allocating a 32KiB chunk and an 8KiB chunk, not by allocating 64KiB
and only using part of it.

(I suppose there's an argument for using alloc_pages_exact() here, but
I think it's a fairly weak one)

next prev parent reply	other threads:[~2025-10-22 14:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-21 19:44 Vishal Moola (Oracle)
2025-10-21 21:24 ` Andrew Morton
2025-10-22 14:33   ` Matthew Wilcox [this message]
2025-10-22 17:50 ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPjrRkjiIt6HmXmT@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=urezki@gmail.com \
    --cc=vishal.moola@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox