linux-mm.kvack.org archive mirror
From: Zi Yan <ziy@nvidia.com>
To: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	siddhartha@kenip.in, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, mgorman@suse.de
Subject: Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads
Date: Tue, 01 Jul 2025 14:49:01 -0400
Message-ID: <5D015E99-474A-4D98-8C43-488A46BEB2F5@nvidia.com>
In-Reply-To: <6eaaa2e4-9067-47bc-8dd4-d8ef56c26b3b@arm.com>

On 1 Jul 2025, at 12:20, Dev Jain wrote:

> On 01/07/25 6:09 pm, Lorenzo Stoakes wrote:
>> On Tue, Jul 01, 2025 at 05:45:51PM +0530, siddhartha@kenip.in wrote:
>>> 🧩 1. Does the patch cause VMAs to be merged eventually?
>>> You're correct: VMA merging only happens at mmap() time (via
>>> __mmap_region()). What the patch affects is the behavior of
>>> thp_get_unmapped_area_vmflags() before the mmap is placed.
>> [...]
>>
>>> 📐 2. Why aren’t the VMAs mergeable before the patch?
>>> Great question. Even if the VMA flags are identical, gaps introduced by
>>> forced alignment from get_unmapped_area() break the precondition for
>>> merging:
>> [...]
>>
>>> 💡 4. Why this patch complements Rik’s rather than contradicts it:
>> I'm really perplexed as to why you felt the need to (seemingly via LLM)
>> reply with the explanation I've already provided here?...
>>
>> There are errors in things you say here too.
>>
>> With respect, please don't do this.
>>
>> (I'm the co-maintainer of pretty much all the relevant code here and wrote
>> the VMA merge logic you're referring to.)
>>
>>> 🤖 3. How does this impact AI workloads like Hugging Face Transformers?
>>> Tokenization and dynamic batching create non-deterministic memory allocation
>>> patterns:
>>>
>>> Models like BERT and T5 dynamically allocate intermediate buffers per
>>> token-length, batch size, and attention window.
>>>
>>> Hugging Face + ONNX Runtime uses multiple small-ish anonymous mmap()s, often
>>> 512KB–1.8MB.
>>>
>>> These allocations come in bursts — but due to forced alignment, the kernel
>>> was placing them with artificial gaps, defeating THP eligibility entirely.
>>>
>>> By not force-aligning non-PMD-sized mappings, we avoid injecting gaps. The
>>> result is that:
>>>
>>> a. VMAs remain adjacent → mergeable
>>>
>>> b. Physical memory is contiguous → eligible for khugepaged collapse
>>>
>>> c. THP utilization increases → fewer TLB misses → lower latency → higher
>>> throughput
>>>
>> This is very useful information and it's appreciated! Let's not drown this
>> out with restatements of stuff already covered.
>>
>>> ⚙️ 5. mTHP note
>>> Although this patch doesn’t target mTHP directly, I believe a similar logic
>>> tweak could apply there too — especially with shmem-backed workloads (common
>>> in model servers using shared tensor memory). I’d be happy to help test any
>>> changes proposed there to derive the consequent results.
>> Dev - could we hold off on any effort to do something like this until I've
>> had a chance to refactor THP somewhat? This is already a mess and I'd like
>> to avoid us piling on more complexity.
>>
>> We can revisit this at a later stage.
>
> Yes of course. I had run a small benchmark on a quick dumb patch I wrote and I
> don't see any measurable perf improvement, probably because the highest THP order
> getting chosen is always PMD size.

I think mTHP is much more complicated, since mTHP has many sizes.
Trying to adjust VMA alignments to get mTHP might not work well, since
you never know what sizes new VMAs are going to have.

IMHO, it might be better to align VMAs to the PMD size, or to the
largest reasonable mTHP size where PMD is too big (for example, on
ARM64 with a 64KB base page, a PMD THP is 512MB, so a 2MB mTHP
alignment sounds more reasonable there), and to enable VMA merging as
much as possible for future huge page collapse. mTHP can then be used
to fill the non-faulted holes in VMAs if necessary.

>
> Out of curiosity, where do you plan to do the refactoring?


Best Regards,
Yan, Zi



Thread overview: 28+ messages
2025-06-27 10:39 siddhartha
2025-06-27 10:45 ` siddhartha
2025-06-27 15:30 ` Lorenzo Stoakes
2025-06-28  3:49   ` Dev Jain
2025-06-30  0:43     ` siddhartha
2025-06-30  5:25       ` Dev Jain
2025-06-30  5:28         ` Dev Jain
2025-06-30 10:54         ` Lorenzo Stoakes
2025-06-30 11:48           ` siddhartha
2025-07-01  5:23           ` Dev Jain
2025-07-01  5:28             ` Lorenzo Stoakes
2025-07-01  5:45               ` Dev Jain
2025-07-01  5:53                 ` Lorenzo Stoakes
2025-07-01  6:30                   ` Dev Jain
2025-07-01  6:50                     ` Lorenzo Stoakes
2025-07-01  6:58                       ` Dev Jain
2025-07-01 12:15                         ` siddhartha
2025-07-01 12:39                           ` Lorenzo Stoakes
2025-07-01 13:23                             ` siddhartha
2025-07-01 13:28                               ` Lorenzo Stoakes
2025-07-01 14:20                                 ` siddhartha
2025-07-01 16:20                             ` Dev Jain
2025-07-01 18:49                               ` Zi Yan [this message]
2025-07-07  8:56                                 ` Vlastimil Babka
2025-07-28  5:41                                   ` siddhartha
2025-07-28 11:00                                     ` Vlastimil Babka
2025-07-01 15:40                           ` Yang Shi
2025-08-11 22:14 siddhartha
