From: Yafang Shao <laoar.shao@gmail.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Yan <ziy@nvidia.com>,
Liam Howlett <Liam.Howlett@oracle.com>,
npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
Johannes Weiner <hannes@cmpxchg.org>,
usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com,
Matthew Wilcox <willy@infradead.org>,
Amery Hung <ameryhung@gmail.com>,
David Rientjes <rientjes@google.com>,
Jonathan Corbet <corbet@lwn.net>, Barry Song <21cnbao@gmail.com>,
Shakeel Butt <shakeel.butt@linux.dev>, Tejun Heo <tj@kernel.org>,
lance.yang@linux.dev, Randy Dunlap <rdunlap@infradead.org>,
Chris Mason <clm@meta.com>, bpf <bpf@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH v12 mm-new 06/10] mm: bpf-thp: add support for global mode
Date: Fri, 28 Nov 2025 10:53:53 +0800 [thread overview]
Message-ID: <CALOAHbCR3Y=GCpX8S9CctONO=Emh4RvYAibHU=ZQyLP1s0MOVQ@mail.gmail.com> (raw)
In-Reply-To: <9f73a5bd-32a0-4d5f-8a3f-7bff8232e408@kernel.org>
On Thu, Nov 27, 2025 at 7:48 PM David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> >> To move forward, I'm happy to set the global mode aside for now and
> >> potentially drop it in the next version. I'd really like to hear your
> >> perspective on the per-process mode. Does this implementation meet
> >> your needs?
>
> I haven't had the capacity to follow the evolution of this patch set
> unfortunately, just to comment on some points from my perspective.
>
> First, I agree that the global mode is not what we want, not even as a
> fallback.
>
> >
> > Attaching st_ops to task_struct or to mm_struct is a can of worms.
> > With cgroup-bpf we went through painful bugs with lifetime
> > of cgroup vs bpf, dying cgroups, wq deadlock, etc. All these
> > problems are behind us. With st_ops in mm_struct it will be more
> > painful. I'd rather not go that route.
>
> That's valuable information, thanks. I would have hoped that per-MM
> policies would be easier.
The per-MM approach has a performance advantage over per-MEMCG
policies, because it reaches the policy hook directly via

  vma->vm_mm->bpf_mm->policy_hook()

whereas the per-MEMCG method requires a more expensive lookup:

  memcg = get_mem_cgroup_from_mm(vma->vm_mm);
  memcg->bpf_memcg->policy_hook();

That extra lookup could be a concern on a critical path. However, the
cost of the per-MEMCG mode can be mitigated. For instance, when a task
is attached to a new memcg, we can cache the hook pointer:

  task->mm->bpf_mm->policy_hook = memcg->bpf_memcg->policy_hook;

So ultimately we might still introduce an mm_struct::bpf_mm field to
provide an efficient interface.
>
> Are there some pointers to explore regarding the "can of worms" you
> mention when it comes to per-MM policies?
>
> >
> > And revist cgroup instead, since you were way too quick
> > to accept the pushback because all you wanted is global mode.
> >
> > The main reason for pushback was:
> > "
> > Cgroup was designed for resource management not for grouping processes and
> > tune those processes
> > "
> >
> > which was true when cgroup-v2 was designed, but that ship sailed
> > years ago when we introduced cgroup-bpf.
>
> Also valuable information.
>
> Personally I don't have a preference regarding per-mm or per-cgroup.
> Whatever we can get working reliably.
I am open to either approach, as long as it's acceptable to the maintainers.
> Sounds like cgroup-bpf has sorted
> out most of the mess.
No, the attach-based cgroup-bpf has proven to be a "can of worms" in
practice. (I welcome corrections from the BPF maintainers if my
assessment is inaccurate.) Meanwhile, the struct-ops-based cgroup-bpf
is still under discussion.
>
> memcg/cgroup maintainers might disagree, but it's probably worth having
> that discussion once again.
>
> > None of the progs are doing resource management and lots of infrastructure,
> > container management, and open source projects use cgroup-bpf
> > as a grouping of processes. bpf progs attached to cgroup/hook tuple
> > only care about processes within that cgroup. No resource management.
> > See __cgroup_bpf_check_dev_permission or __cgroup_bpf_run_filter_sysctl
> > and others.
> > The path is current->cgroup->bpf_progs and progs do exactly
> > what cgroup wasn't designed to do. They tune a set of processes.
> >
> > You should do the same.
> >
> > Also I really don't see a compelling use case for bpf in THP.
>
> There is a lot more potential there to write fine-tuned policies that
> take VMA information into account.
>
> The tests likely reflect what Yafang seems to focus on: IIUC primarily
> enabling+disabling traditional THPs (e.g., 2M) on a per-process basis.
Right.
>
> Some of what Yafang might want to achieve could maybe at this point
> be achieved through the prctl(PR_SET_THP_DISABLE) support, including
> extensions we recently added [1].
>
> Systemd support still seems to be in the works [2] for some of that.
>
>
> [1] https://lwn.net/Articles/1032014/
> [2] https://github.com/systemd/systemd/pull/39085
Thank you for sharing this.
However, BPF-THP is already deployed across our server fleet, and both
our users and my boss are satisfied with it. As such, we are not
considering a switch. The current deployment also gives us a valuable
opportunity to experiment with additional policies in production.
In summary, I am fine with either the per-MM or the per-MEMCG method.
Furthermore, I don't believe this is an either-or decision; the two
can be implemented to work together.
--
Regards
Yafang
Thread overview: 29+ messages
2025-10-26 10:01 [PATCH v12 mm-new 00/10] mm, bpf: BPF-MM, BPF-THP Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 01/10] mm: thp: remove vm_flags parameter from khugepaged_enter_vma() Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 02/10] mm: thp: remove vm_flags parameter from thp_vma_allowable_order() Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 03/10] mm: thp: add support for BPF based THP order selection Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 04/10] mm: thp: decouple THP allocation between swap and page fault paths Yafang Shao
2025-10-27 4:07 ` Barry Song
2025-10-26 10:01 ` [PATCH v12 mm-new 05/10] mm: thp: enable THP allocation exclusively through khugepaged Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 06/10] mm: bpf-thp: add support for global mode Yafang Shao
2025-10-29 1:32 ` Alexei Starovoitov
2025-10-29 2:13 ` Yafang Shao
2025-10-30 0:57 ` Alexei Starovoitov
2025-10-30 2:40 ` Yafang Shao
2025-11-27 11:48 ` David Hildenbrand (Red Hat)
2025-11-28 2:53 ` Yafang Shao [this message]
2025-11-28 7:57 ` Lorenzo Stoakes
2025-11-28 8:18 ` Yafang Shao
2025-11-28 8:31 ` Lorenzo Stoakes
2025-11-28 11:56 ` Yafang Shao
2025-11-28 12:18 ` Lorenzo Stoakes
2025-11-28 12:51 ` Yafang Shao
2025-11-28 8:39 ` David Hildenbrand (Red Hat)
2025-11-28 8:55 ` Lorenzo Stoakes
2025-11-30 13:06 ` Yafang Shao
2025-11-26 15:13 ` Rik van Riel
2025-11-27 2:35 ` Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 07/10] Documentation: add BPF THP Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 08/10] selftests/bpf: add a simple BPF based THP policy Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 09/10] selftests/bpf: add test case to update " Yafang Shao
2025-10-26 10:01 ` [PATCH v12 mm-new 10/10] selftests/bpf: add test case for BPF-THP inheritance across fork Yafang Shao