From: Yafang Shao
Date: Wed, 21 May 2025 11:52:51 +0800
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
To: Lorenzo Stoakes
Cc: David Hildenbrand, akpm@linux-foundation.org, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com,
 ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org,
 usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com,
 willy@infradead.org, ast@kernel.org, daniel@iogearbox.net,
 andrii@kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org
References: <20250520060504.20251-1-laoar.shao@gmail.com>
 <746e8123-2332-41c8-851b-787cb8c144a1@redhat.com>
 <849decad-ab38-4a1a-8532-f518a108d8c6@lucifer.local>
In-Reply-To: <849decad-ab38-4a1a-8532-f518a108d8c6@lucifer.local>

On Tue, May 20, 2025 at 9:45 PM Lorenzo Stoakes wrote:
>
> On Tue, May 20, 2025 at 08:06:21PM +0800, Yafang Shao wrote:
> > On Tue, May 20, 2025 at 5:49 PM Lorenzo Stoakes wrote:
> > >
> > > On Tue, May 20, 2025 at 11:43:11AM +0200, David Hildenbrand wrote:
> > > > > Conclusion
> > > > > ----------
> > > > >
> > > > > Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> > > > > most effective solution for our requirements. This approach represents a
> > > > > small but meaningful step toward making THP truly usable - and manageable - in
> > > > > production environments.
> > > > A new "bpf" mode sounds way too special.
> > > >
> > > > We currently have:
> > > >
> > > > never -> never
> > > > madvise -> MADV_HUGEPAGE, except PR_SET_THP_DISABLE
> > > > always -> always, except PR_SET_THP_DISABLE and MADV_NOHUGEPAGE
> > > >
> > > > Whatever new mode we add, it should honor PR_SET_THP_DISABLE +
> > > > MADV_NOHUGEPAGE.
> > > >
> > > > So, if we want another way to enable things, it would live between "never"
> > > > and "madvise".
> > > >
> > > > I'm wondering how we could make that generic: likely we want this new
> > > > mechanism to *not* be triggerable by the process itself (madvise).
> > > >
> > > > I am not convinced bpf is the answer here ...
> > >
> > > Agreed.
> > >
> > > I am also very concerned with us inserting BPF bits here - are we not then
> > > ensuring that we cannot in any way move towards a future where we
> > > 'automagically' determine what to do?
> > >
> > > I don't know what is claimed about BPF, but it strikes me that we're
> > > establishing a permanent uABI (uAPI?) if we do that and essentially
> > > promising that THP will continue to operate in a fashion similar to how it
> > > does now.
> > >
> > > While BPF is a wonderful technology, I think we have to be very very careful
> > > about inserting it in places that consist of -implementation details- that
> > > we in mm already are planning to move away from.
> > >
> > > It's one thing adding BPF in the oomk (simple interface, unlikely to
> > > change, doesn't really constrain us) or the scheduler (again the hooks are
> > > by nature reasonably stable), it's quite another sticking it in the heart
> > > of a part of mm that is undergoing _constant_ change, partly as evidenced
> > > by the sheer number of series related to THP that are currently on-list.
> > >
> > > So while BPF may be the best solution for your needs _right now_, we need
> > > be concerned with how things affect the kernel in the future.
> > >
> > > I think we really do have to tread very carefully here.
> >
> > I totally agree with you that the key point here is how to define the
> > API. As I replied to David, I believe we have two fundamental
> > principles to adjust the THP policies:
> > 1. Selective Benefit: Some tasks benefit from THP, while others do not.
> > 2. Conditional Safety: THP allocation is safe under certain conditions
> > but not others.
> >
> > Therefore, I believe we can define these APIs based on the established
> > principles - everything else constitutes implementation details, even
> > if core MM internals need to change.
>
> But if we're looking to make the concept of THP go away, we really need to
> go further than this.
>
> The second we have 'bpf program that figures out whether THP should be
> used' we are permanently tied to the idea of THP on/off being a thing.
>
> I mean any future stuff that makes THP more automagic will probably involve
> having new modes for the legacy THP
> /sys/kernel/mm/transparent_hugepage/enabled and
> /sys/kernel/mm/transparent_hugepage/hugepages-xxkB/enabled
>
> But if people are super reliant on this stuff it's potentially really
> limiting.
>
> I think you said in another post here that you were toying with the notion
> of exposing somehow the madvise() interface and having that be the 'stable
> API' of sorts?

Yes, I have a BPF program that hooks into madvise() to selectively
enforce THP policies - allowing it for certain tasks while blocking it
for others. However, this violates the semantic guarantee of
madvise(). For instance, if a user sees THP configured in madvise
mode, they'd expect madvise() to reliably enable it. But with this BPF
logic, such calls might silently fail, creating inconsistency.

This is why we propose introducing a dedicated BPF-controlled mode, or
alternatively extending the semantics of the existing "never" mode.

-- 
Regards
Yafang
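
To make the semantic mismatch concrete, here is a small Python model of the
policy table David summarized in the quoted text, with a per-task veto
layered on top. It is only a sketch and not kernel code: the function names,
the `advice` strings and the `hook_allows` flag are all invented for
illustration, and the veto merely stands in for whatever a BPF program
might decide.

```python
from typing import Optional

# Toy model of the policy table quoted in this thread; not kernel code.
# All names here are hypothetical and exist only for illustration.


def thp_allowed(mode: str, advice: Optional[str] = None,
                prctl_thp_disabled: bool = False) -> bool:
    """Return True if the modeled policy permits THP for a region.

    mode   -- "never", "madvise" or "always" (the global setting)
    advice -- None, "hugepage" (MADV_HUGEPAGE) or "nohugepage" (MADV_NOHUGEPAGE)
    """
    if mode not in ("never", "madvise", "always"):
        raise ValueError(f"unknown mode: {mode!r}")
    # "never" wins outright; PR_SET_THP_DISABLE overrides the other modes.
    if mode == "never" or prctl_thp_disabled:
        return False
    if mode == "madvise":
        return advice == "hugepage"
    # mode == "always": allowed unless the region opted out.
    return advice != "nohugepage"


def thp_allowed_with_hook(mode: str, advice: Optional[str] = None,
                          prctl_thp_disabled: bool = False,
                          hook_allows: bool = True) -> bool:
    """Same policy, with an assumed per-task veto layered on top."""
    return thp_allowed(mode, advice, prctl_thp_disabled) and hook_allows


# In "madvise" mode a task that asked for huge pages expects to get them...
expected = thp_allowed("madvise", advice="hugepage")
# ...but a hook that blocks this task changes the outcome without any error.
actual = thp_allowed_with_hook("madvise", advice="hugepage", hook_allows=False)
print(expected, actual)
```

The difference between `expected` and `actual` is the inconsistency
described above: the global setting still reads "madvise", yet nothing
tells the task that its request was overridden.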