From: Yafang Shao
Date: Tue, 20 May 2025 20:06:21 +0800
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
To: Lorenzo Stoakes
Cc: David Hildenbrand, akpm@linux-foundation.org, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com,
 ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org,
 usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com,
 willy@infradead.org, ast@kernel.org, daniel@iogearbox.net,
 andrii@kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org
References: <20250520060504.20251-1-laoar.shao@gmail.com>
 <746e8123-2332-41c8-851b-787cb8c144a1@redhat.com>

On Tue, May 20, 2025 at 5:49 PM Lorenzo Stoakes wrote:
>
> On Tue, May 20, 2025 at 11:43:11AM +0200, David Hildenbrand wrote:
> > > Conclusion
> > > ----------
> > >
> > > Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> > > most effective solution for our requirements. This approach represents a
> > > small but meaningful step toward making THP truly usable - and manageable - in
> > > production environments.
> > A new "bpf" mode sounds way too special.
> >
> > We currently have:
> >
> > never -> never
> > madvise -> MADV_HUGEPAGE, except PR_SET_THP_DISABLE
> > always -> always, except PR_SET_THP_DISABLE and MADV_NOHUGEPAGE
> >
> > Whatever new mode we add, it should honor PR_SET_THP_DISABLE +
> > MADV_NOHUGEPAGE.
> >
> > So, if we want another way to enable things, it would live between "never"
> > and "madvise".
> >
> > I'm wondering how we could make that generic: likely we want this new
> > mechanism to *not* be triggerable by the process itself (madvise).
> >
> > I am not convinced bpf is the answer here ...
>
> Agreed.
>
> I am also very concerned with us inserting BPF bits here - are we not then
> ensuring that we cannot in any way move towards a future where we
> 'automagically' determine what to do?
>
> I don't know what is claimed about BPF, but it strikes me that we're
> establishing a permanent uABI (uAPI?) if we do that and essentially
> promising that THP will continue to operate in a fashion similar to how it
> does now.
>
> While BPF is a wonderful technology, I think we have to be very very careful
> about inserting it in places that consist of -implementation details- that
> we in mm already are planning to move away from.
>
> It's one thing adding BPF in the oomk (simple interface, unlikely to
> change, doesn't really constrain us) or the scheduler (again the hooks are
> by nature reasonably stable), it's quite another sticking it in the heart
> of a part of mm that is undergoing _constant_ change, partly as evidenced
> by the sheer number of series related to THP that are currently on-list.
>
> So while BPF may be the best solution for your needs _right now_, we need
> to be concerned with how things affect the kernel in the future.
>
> I think we really do have to tread very carefully here.

I totally agree with you that the key point here is how to define the API.

As I replied to David, I believe two fundamental principles should guide
how we adjust THP policies:

1. Selective Benefit: Some tasks benefit from THP, while others do not.
2. Conditional Safety: THP allocation is safe under certain conditions
   but not others.

Therefore, I believe we can define these APIs based on these two
principles. Everything else constitutes implementation details, even if
core MM internals need to change.

-- 
Regards
Yafang