linux-mm.kvack.org archive mirror
From: Yafang Shao <laoar.shao@gmail.com>
To: Nico Pache <npache@redhat.com>
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
	 baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	 Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com,
	 hannes@cmpxchg.org, usamaarif642@gmail.com,
	 gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org,  daniel@iogearbox.net, andrii@kernel.org,
	bpf@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
Date: Tue, 20 May 2025 15:25:07 +0800	[thread overview]
Message-ID: <CALOAHbDbcdBZb_4mCpr4S81t8EBtDeSQ2OVSOH6qLNC-iYMa4A@mail.gmail.com> (raw)
In-Reply-To: <CAA1CXcD=P8tBASK1X=+2=+_RANi062X8QMsi632MjPh=dkuD9Q@mail.gmail.com>

On Tue, May 20, 2025 at 2:52 PM Nico Pache <npache@redhat.com> wrote:
>
> On Tue, May 20, 2025 at 12:06 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Background
> > ----------
> >
> > At my current employer, PDD, we have consistently configured THP to "never"
> > on our production servers due to past incidents caused by its behavior:
> >
> > - Increased memory consumption
> >   THP significantly raises overall memory usage.
> >
> > - Latency spikes
> >   Random latency spikes occur due to more frequent memory compaction
> >   activity triggered by THP.
> >
> > These issues have made sysadmins hesitant to switch to "madvise" or
> > "always" modes.
> >
> > New Motivation
> > --------------
> >
> > We have now identified that certain AI workloads achieve substantial
> > performance gains with THP enabled. However, we’ve also verified that some
> > workloads see little to no benefit—or are even negatively impacted—by THP.
> >
> > In our Kubernetes environment, we deploy mixed workloads on a single server
> > to maximize resource utilization. Our goal is to selectively enable THP for
> > services that benefit from it while keeping it disabled for others. This
> > approach allows us to incrementally enable THP for additional services and
> > assess how to make it more viable in production.
> >
> > Proposed Solution
> > -----------------
> >
> > For this use case, Johannes suggested introducing a dedicated mode [0]. In
> > this new mode, we could implement BPF-based THP adjustment for fine-grained
> > control over tasks or cgroups. If no BPF program is attached, THP remains
> > in "never" mode. This solution elegantly meets our needs while avoiding the
> > complexity of managing BPF alongside other THP modes.
> >
> > A selftest example demonstrates how to enable THP for the current task
> > while keeping it disabled for others.
> >
> > Alternative Proposals
> > ---------------------
> >
> > - Gutierrez’s cgroup-based approach [1]
> >   - Proposed adding a new cgroup file to control THP policy.
> >   - However, as Johannes noted, cgroups are designed for hierarchical
> >     resource allocation, not arbitrary policy settings [2].
> >
> > - Usama’s per-task THP proposal based on prctl() [3]:
> >   - Enabling THP per task via prctl().
> >   - As David pointed out, neither madvise() nor prctl() works in "never"
> >     mode [4], making this solution insufficient for our needs.
> Hi Yafang Shao,
>
> I believe you would have to invert your logic and disable THP for the
> processes you don't want to use it, while setting THP="madvise"|"always".
> I have yet to look over Usama's solution in detail, but I believe this
> is possible based on his cover letter.
>
> I also have an alternative solution proposed here!
> https://lore.kernel.org/lkml/20250515033857.132535-1-npache@redhat.com/
>
> It's different in the sense that it doesn't give you granular control
> per process or cgroup, or BPF programmability, but it may still suit
> your needs by taming THP waste and removing the latency spikes caused
> by page-fault-time THP compaction/allocation.

Thank you for developing this feature. I'll review it carefully.

The challenge we face is that our system administration team doesn't
permit enabling THP globally in production by setting it to "madvise"
or "always". As a result, we can only experiment with your feature on
our test servers at this stage.
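
To make the inversion you suggest concrete: it means running the
system-wide policy at "madvise" or "always" and opting the THP-averse
processes out, for example with the existing PR_SET_THP_DISABLE prctl.
A minimal userspace sketch, illustrative only and not tied to any
particular series:

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif

int main(void)
{
	/*
	 * Opt this process out of THP while the global policy stays
	 * "madvise" or "always"; other processes are unaffected.
	 */
	if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0)) {
		perror("prctl(PR_SET_THP_DISABLE)");
		return 1;
	}

	/* ... run the THP-averse workload here ... */
	return 0;
}

As David noted [4], there is no equivalent way to opt a process in
while the global mode stays "never", which is exactly the gap the
proposed BPF mode is meant to close.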

Therefore, our immediate priority isn't THP optimization, but rather
finding a way to safely enable THP in production first. The kernel
team needs a solution that addresses this fundamental deployment
hurdle before we can consider performance improvements.
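
For anyone skimming the series: conceptually, the selftest in patch
5/5 attaches a BPF program that allows THP only for the current task,
keyed on its comm, while every other task effectively stays in
"never". A rough sketch of the idea; the struct_ops name and callback
below (bpf_thp_ops / task_thp_allowed) and the "thp_test" comm are
placeholders, not the actual interface from the patches:

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("struct_ops/task_thp_allowed")
int BPF_PROG(task_thp_allowed)
{
	char comm[16];

	/* bpf_get_current_comm() is what patch 4/5 makes available here. */
	bpf_get_current_comm(comm, sizeof(comm));

	/* Allow THP only for the matching task; everyone else stays off. */
	return bpf_strncmp(comm, sizeof(comm), "thp_test") == 0;
}

SEC(".struct_ops.link")
struct bpf_thp_ops thp_ops = {	/* placeholder struct_ops name */
	.task_thp_allowed = (void *)task_thp_allowed,
};

Userspace would then attach this as a struct_ops link (e.g. with
libbpf's bpf_map__attach_struct_ops()); with no program attached, the
new mode behaves exactly like "never".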

-- 
Regards
Yafang



Thread overview: 52+ messages
2025-05-20  6:04 Yafang Shao
2025-05-20  6:04 ` [RFC PATCH v2 1/5] mm: thp: Add a new mode "bpf" Yafang Shao
2025-05-20  6:05 ` [RFC PATCH v2 2/5] mm: thp: Add hook for BPF based THP adjustment Yafang Shao
2025-05-20  6:05 ` [RFC PATCH v2 3/5] mm: thp: add struct ops " Yafang Shao
2025-05-20  6:05 ` [RFC PATCH v2 4/5] bpf: Add get_current_comm to bpf_base_func_proto Yafang Shao
2025-05-20 23:32   ` Andrii Nakryiko
2025-05-20  6:05 ` [RFC PATCH v2 5/5] selftests/bpf: Add selftest for THP adjustment Yafang Shao
2025-05-20  6:52 ` [RFC PATCH v2 0/5] mm, bpf: BPF based " Nico Pache
2025-05-20  7:25   ` Yafang Shao [this message]
2025-05-20 13:10     ` Matthew Wilcox
2025-05-20 14:08       ` Yafang Shao
2025-05-20 14:22         ` Lorenzo Stoakes
2025-05-20 14:32           ` Usama Arif
2025-05-20 14:35             ` Lorenzo Stoakes
2025-05-20 14:42               ` Matthew Wilcox
2025-05-20 14:56                 ` David Hildenbrand
2025-05-21  4:28                 ` Yafang Shao
2025-05-20 14:46               ` Usama Arif
2025-05-20 15:00             ` David Hildenbrand
2025-05-20  9:43 ` David Hildenbrand
2025-05-20  9:49   ` Lorenzo Stoakes
2025-05-20 12:06     ` Yafang Shao
2025-05-20 13:45       ` Lorenzo Stoakes
2025-05-20 15:54         ` David Hildenbrand
2025-05-21  4:02           ` Yafang Shao
2025-05-21  3:52         ` Yafang Shao
2025-05-20 11:59   ` Yafang Shao
2025-05-25  3:01 ` Yafang Shao
2025-05-26  7:41   ` Gutierrez Asier
2025-05-26  9:37     ` Yafang Shao
2025-05-26  8:14   ` David Hildenbrand
2025-05-26  9:37     ` Yafang Shao
2025-05-26 10:49       ` David Hildenbrand
2025-05-26 14:53         ` Liam R. Howlett
2025-05-26 15:54           ` Liam R. Howlett
2025-05-26 16:51             ` David Hildenbrand
2025-05-26 17:07               ` Liam R. Howlett
2025-05-26 17:12                 ` David Hildenbrand
2025-05-26 20:30               ` Gutierrez Asier
2025-05-26 20:37                 ` David Hildenbrand
2025-05-27  5:46         ` Yafang Shao
2025-05-27  7:57           ` David Hildenbrand
2025-05-27  8:13             ` Yafang Shao
2025-05-27  8:30               ` David Hildenbrand
2025-05-27  8:40                 ` Yafang Shao
2025-05-27  9:27                   ` David Hildenbrand
2025-05-27  9:43                     ` Yafang Shao
2025-05-27 12:19                       ` David Hildenbrand
2025-05-28  2:04                         ` Yafang Shao
2025-05-28 20:32                           ` David Hildenbrand
2025-05-26 14:32   ` Zi Yan
2025-05-27  5:53     ` Yafang Shao
