From: Yafang Shao <laoar.shao@gmail.com>
Date: Mon, 26 May 2025 17:37:47 +0800
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
To: Gutierrez Asier
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
 dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
 willy@infradead.org, ast@kernel.org, daniel@iogearbox.net,
 andrii@kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org
References: <20250520060504.20251-1-laoar.shao@gmail.com>

On Mon, May 26, 2025 at 3:41 PM Gutierrez Asier wrote:
>
>
>
> On 5/25/2025 6:01 AM, Yafang Shao wrote:
> > On Tue, May 20, 2025 at 2:05 PM Yafang Shao wrote:
> >>
> >> Background
> >> ----------
> >>
> >> At my current employer, PDD, we have consistently configured THP to "never"
> >> on our production servers due to past incidents caused by its behavior:
> >>
> >> - Increased memory consumption
> >>   THP significantly raises overall memory usage.
> >>
> >> - Latency spikes
> >>   Random latency spikes occur due to more frequent memory compaction
> >>   activity triggered by THP.
> >>
> >> These issues have made sysadmins hesitant to switch to "madvise" or
> >> "always" modes.
> >>
> >> New Motivation
> >> --------------
> >>
> >> We have now identified that certain AI workloads achieve substantial
> >> performance gains with THP enabled. However, we've also verified that some
> >> workloads see little to no benefit—or are even negatively impacted—by THP.
> >>
> >> In our Kubernetes environment, we deploy mixed workloads on a single server
> >> to maximize resource utilization. Our goal is to selectively enable THP for
> >> services that benefit from it while keeping it disabled for others. This
> >> approach allows us to incrementally enable THP for additional services and
> >> assess how to make it more viable in production.
> >>
> >> Proposed Solution
> >> -----------------
> >>
> >> For this use case, Johannes suggested introducing a dedicated mode [0]. In
> >> this new mode, we could implement BPF-based THP adjustment for fine-grained
> >> control over tasks or cgroups. If no BPF program is attached, THP remains
> >> in "never" mode. This solution elegantly meets our needs while avoiding the
> >> complexity of managing BPF alongside other THP modes.
> >>
> >> A selftest example demonstrates how to enable THP for the current task.
> >>
> >> Alternative Proposals
> >> ---------------------
> >>
> >> - Gutierrez's cgroup-based approach [1]
> >>   - Proposed adding a new cgroup file to control THP policy.
> >>   - However, as Johannes noted, cgroups are designed for hierarchical
> >>     resource allocation, not arbitrary policy settings [2].
> >>
> >> - Usama's per-task THP proposal based on prctl() [3]:
> >>   - Enabling THP per task via prctl().
> >>   - As David pointed out, neither madvise() nor prctl() works in "never"
> >>     mode [4], making this solution insufficient for our needs.
> >>
> >> Conclusion
> >> ----------
> >>
> >> Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> >> most effective solution for our requirements. This approach represents a
> >> small but meaningful step toward making THP truly usable—and manageable—in
> >> production environments.
> >>
> >> This is currently a PoC implementation. Feedback of any kind is welcome.
> >>
> >> Link: https://lore.kernel.org/linux-mm/20250509164654.GA608090@cmpxchg.org/ [0]
> >> Link: https://lore.kernel.org/linux-mm/20241030083311.965933-1-gutierrez.asier@huawei-partners.com/ [1]
> >> Link: https://lore.kernel.org/linux-mm/20250430175954.GD2020@cmpxchg.org/ [2]
> >> Link: https://lore.kernel.org/linux-mm/20250519223307.3601786-1-usamaarif642@gmail.com/ [3]
> >> Link: https://lore.kernel.org/linux-mm/41e60fa0-2943-4b3f-ba92-9f02838c881b@redhat.com/ [4]
> >>
> >> RFC v1->v2:
> >> The main changes are as follows,
> >> - Use struct_ops instead of fmod_ret (Alexei)
> >> - Introduce a new THP mode (Johannes)
> >> - Introduce new helpers for BPF hook (Zi)
> >> - Refine the commit log
> >>
> >> RFC v1: https://lwn.net/Articles/1019290/
> >>
> >> Yafang Shao (5):
> >>   mm: thp: Add a new mode "bpf"
> >>   mm: thp: Add hook for BPF based THP adjustment
> >>   mm: thp: add struct ops for BPF based THP adjustment
> >>   bpf: Add get_current_comm to bpf_base_func_proto
> >>   selftests/bpf: Add selftest for THP adjustment
> >>
> >>  include/linux/huge_mm.h                         |  15 +-
> >>  kernel/bpf/cgroup.c                             |   2 -
> >>  kernel/bpf/helpers.c                            |   2 +
> >>  mm/Makefile                                     |   3 +
> >>  mm/bpf_thp.c                                    | 120 ++++++++++++
> >>  mm/huge_memory.c                                |  65 ++++++-
> >>  mm/khugepaged.c                                 |   3 +
> >>  tools/testing/selftests/bpf/config              |   1 +
> >>  .../selftests/bpf/prog_tests/thp_adjust.c       | 175 ++++++++++++++++++
> >>  .../selftests/bpf/progs/test_thp_adjust.c       |  39 ++++
> >>  10 files changed, 414 insertions(+), 11 deletions(-)
> >>  create mode 100644 mm/bpf_thp.c
> >>  create mode 100644 tools/testing/selftests/bpf/prog_tests/thp_adjust.c
> >>  create mode 100644 tools/testing/selftests/bpf/progs/test_thp_adjust.c
> >>
> >> --
> >> 2.43.5
> >>
> >
> > Hi all,
> >
> > Let's summarize the current state of the discussion and identify how
> > to move forward.
> >
> > - Global-Only Control is Not Viable
> >   We all seem to agree that a global-only control for THP is unwise. In
> >   practice, some workloads benefit from THP while others do not, so a
> >   one-size-fits-all approach doesn't work.
> >
> > - Should We Use "Always" or "Madvise"?
> >   I suspect no one would choose 'always' in its current state. ;)
> >   Both Lorenzo and David propose relying on the madvise mode. However,
> >   since madvise is an unprivileged userspace mechanism, any user can
> >   freely adjust their THP policy. This makes fine-grained control
> >   impossible without breaking userspace compatibility—an undesirable
> >   tradeoff.
> >   Given these limitations, the community should consider introducing a
> >   new "admin" mode for privileged THP policy management.
> >
> > - Can the Kernel Automatically Manage THP Without User Input?
> >   In practice, users define their own success metrics—such as latency
> >   (RT), queries per second (QPS), or throughput—to evaluate a feature's
> >   usefulness. If a feature fails to improve these metrics, it provides
> >   no practical value.
> >   Currently, the kernel lacks visibility into user-defined metrics,
> >   making fully automated optimization impossible (at least without user
> >   input). More importantly, automatic management offers no benefit if it
> >   doesn't align with user needs.
>
> I don't think that using things like RPS or QPS is the right way.
> These metrics can be affected by many factors like network issues,
> garbage collectors in the user space (JVM, golang, etc.) and many other
> things out of our control. Even noisy neighbors can slow down a service.

This is an example of how to measure whether apps can benefit from a new
feature. Please review the A/B test details here:
https://en.wikipedia.org/wiki/A/B_testing

--
Regards
Yafang