From: Yafang Shao
Date: Tue, 20 May 2025 19:59:17 +0800
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
To: David Hildenbrand
Cc: akpm@linux-foundation.org, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
 dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
 gutierrez.asier@huawei-partners.com, willy@infradead.org,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 bpf@vger.kernel.org, linux-mm@kvack.org
References: <20250520060504.20251-1-laoar.shao@gmail.com>
 <746e8123-2332-41c8-851b-787cb8c144a1@redhat.com>
In-Reply-To: <746e8123-2332-41c8-851b-787cb8c144a1@redhat.com>

On Tue, May 20, 2025 at 5:44 PM David Hildenbrand wrote:
>
> > Conclusion
> > ----------
> >
> > Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> > most effective solution for our requirements. This approach represents a
> > small but meaningful step toward making THP truly usable, and manageable, in
> > production environments.
>
> A new "bpf" mode sounds way too special.

Alternatively, we could simply hook 'madvise' to define a BPF-based policy.

>
> We currently have:
>
> never -> never
> madvise -> MADV_HUGEPAGE, except PR_SET_THP_DISABLE
> always -> always, except PR_SET_THP_DISABLE and MADV_NOHUGEPAGE

If BPF had been invented before THP, we likely would have only three
modes, without PR_SET_THP_DISABLE, MADV_NOHUGEPAGE, or MADV_HUGEPAGE ;-)

never -> never
user -> user-defined per-task or per-vma THP mode selector, based on BPF
        We can select "never" or "always" for a specific task or vma.
        The API is as follows:
            bpf->per_task_mode_selector(task);
            bpf->per_vma_mode_selector(vma);
always -> always

However, it's not too late to introduce a new BPF-based mode for THP,
especially since future adjustments to THP policies are still expected.
Regardless of the specific policy, two fundamental principles apply:

1. Selective Benefit: Some tasks benefit from THP, while others do not.
2. Conditional Safety: THP allocation is safe under certain conditions
   but not others.

Given these constraints, we could abstract stable APIs that allow
users to define custom THP policies tailored to their needs.

>
> Whatever new mode we add, it should honor PR_SET_THP_DISABLE +
> MADV_NOHUGEPAGE.

Yes, BPF only selects different THP modes for different tasks;
nothing else will be changed.

>
> So, if we want another way to enable things, it would live between
> "never" and "madvise".

Yes, BPF only selects the appropriate THP mode for each task; nothing
else is modified.

>
> I'm wondering how we could make that generic: likely we want this new
> mechanism to *not* be triggerable by the process itself (madvise).
>
> I am not convinced bpf is the answer here ...

I believe the key insight is that we should define a generic, stable
API for BPF-based THP mode selection.

--
Regards
Yafang
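
For illustration only, the per-task mode selection discussed above could
be modeled roughly as follows. This is a userspace sketch in plain C, not
code from the patch series: every name in it (thp_mode, THP_BPF,
task_mode_selector_t, example_selector, thp_allowed, and the struct
fields) is hypothetical, and an ordinary function pointer stands in for
an attached BPF program. It assumes that the selector is consulted only
in the "bpf" mode and that PR_SET_THP_DISABLE and MADV_NOHUGEPAGE are
checked before any policy runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names exist in the kernel. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS, THP_BPF };

struct task {
    bool thp_disabled;      /* models PR_SET_THP_DISABLE */
    bool latency_sensitive; /* example attribute a policy might inspect */
};

struct vma {
    bool hugepage;   /* models MADV_HUGEPAGE */
    bool nohugepage; /* models MADV_NOHUGEPAGE */
};

/* Stand-in for an attached BPF program: returns a mode for one task. */
typedef enum thp_mode (*task_mode_selector_t)(const struct task *);

/* Example policy: latency-sensitive tasks get "never", others "always". */
static enum thp_mode example_selector(const struct task *t)
{
    return t->latency_sensitive ? THP_NEVER : THP_ALWAYS;
}

/* Decide whether a huge page may be used for this task/VMA pair. */
static bool thp_allowed(enum thp_mode global, task_mode_selector_t sel,
                        const struct task *t, const struct vma *v)
{
    /* Existing opt-outs are honored before any policy is consulted. */
    if (t->thp_disabled || v->nohugepage)
        return false;

    /* Only the "bpf" mode defers to the selector; no selector means never. */
    enum thp_mode mode = global;
    if (global == THP_BPF)
        mode = sel ? sel(t) : THP_NEVER;

    switch (mode) {
    case THP_ALWAYS:
        return true;
    case THP_MADVISE:
        return v->hugepage;
    default:
        /* THP_NEVER, or a selector that returned an invalid mode. */
        return false;
    }
}
```

In this model the selector can only pick among the existing modes for a
task. It cannot override an opt-out, which matches the point above that
nothing besides the mode selection changes.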