From: Yafang Shao
Date: Sun, 20 Jul 2025 11:07:32 +0800
Subject: Re: [RFC PATCH v3 4/5] mm: thp: add bpf thp struct ops
To: Amery Hung
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com, gutierrez.asier@huawei-partners.com, willy@infradead.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org
References: <20250608073516.22415-1-laoar.shao@gmail.com> <20250608073516.22415-5-laoar.shao@gmail.com>

On Fri, Jul 18, 2025 at 2:21 AM Amery Hung wrote:
>
> On 6/8/25 12:35 AM, Yafang Shao wrote:
> > A new bpf_thp struct ops is introduced to provide finer-grained control
> > over THP allocation policy. The struct ops includes two APIs for
> > determining the THP allocator and reclaimer behavior:
> >
> > - THP allocator
> >
> >   int (*allocator)(unsigned long vm_flags, unsigned long tva_flags);
> >
> >   The BPF program returns either THP_ALLOC_CURRENT or THP_ALLOC_KHUGEPAGED,
> >   indicating whether THP allocation should be performed synchronously
> >   (current task) or asynchronously (khugepaged).
> >
> >   The decision is based on the current task context, VMA flags, and TVA
> >   flags.
> >
> > - THP reclaimer
> >
> >   int (*reclaimer)(bool vma_madvised);
> >
> >   The BPF program returns either RECLAIMER_CURRENT or RECLAIMER_KSWAPD,
> >   determining whether memory reclamation is handled by the current task or
> >   kswapd.
> >
> >   The decision depends on the current task and VMA flags.
> >
> > Signed-off-by: Yafang Shao
> > ---
> >  include/linux/huge_mm.h |  13 +--
> >  mm/Makefile             |   3 +
> >  mm/bpf_thp.c            | 184 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 190 insertions(+), 10 deletions(-)
> >  create mode 100644 mm/bpf_thp.c
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 6a40ebf25f5c..0d02c9b56a85 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -54,6 +54,7 @@ enum transparent_hugepage_flag {
> >  	TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
> >  	TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG,
> >  	TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
> > +	TRANSPARENT_HUGEPAGE_BPF_ATTACHED,	/* BPF prog is attached */
> >  };
> >
> >  struct kobject;
> > @@ -192,16 +193,8 @@ static inline bool hugepage_global_always(void)
> >
> >  #define THP_ALLOC_KHUGEPAGED (1 << 1)
> >  #define THP_ALLOC_CURRENT (1 << 2)
> > -static inline int bpf_thp_allocator(unsigned long vm_flags,
> > -				    unsigned long tva_flags)
> > -{
> > -	return THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT;
> > -}
> > -
> > -static inline gfp_t bpf_thp_gfp_mask(bool vma_madvised)
> > -{
> > -	return 0;
> > -}
> > +int bpf_thp_allocator(unsigned long vm_flags, unsigned long tva_flags);
> > +gfp_t bpf_thp_gfp_mask(bool vma_madvised);
> >
> >  static inline int highest_order(unsigned long orders)
> >  {
> > diff --git a/mm/Makefile b/mm/Makefile
> > index 1a7a11d4933d..e5f41cf3fd61 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -99,6 +99,9 @@ obj-$(CONFIG_MIGRATION) += migrate.o
> >  obj-$(CONFIG_NUMA) += memory-tiers.o
> >  obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
> >  obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
> > +ifdef CONFIG_BPF_SYSCALL
> > +obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += bpf_thp.o
> > +endif
> >  obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
> >  obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
> >  obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
> > diff --git a/mm/bpf_thp.c b/mm/bpf_thp.c
> > new file mode 100644
> > index 000000000000..894d6cb93107
> > --- /dev/null
> > +++ b/mm/bpf_thp.c
> > @@ -0,0 +1,184 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#define RECLAIMER_CURRENT (1 << 1)
> > +#define RECLAIMER_KSWAPD (1 << 2)
> > +#define RECLAIMER_BOTH (RECLAIMER_CURRENT | RECLAIMER_KSWAPD)
> > +
> > +struct bpf_thp_ops {
> > +	/**
> > +	 * @allocator: Specifies whether the THP allocation is performed
> > +	 *   by the current task or by khugepaged.
> > +	 * @vm_flags: Flags for the VMA in the current allocation context
> > +	 * @tva_flags: Flags for the TVA in the current allocation context
> > +	 *
> > +	 * Return:
> > +	 * - THP_ALLOC_CURRENT: THP was allocated synchronously by the calling
> > +	 *   task's context.
> > +	 * - THP_ALLOC_KHUGEPAGED: THP was allocated asynchronously by the
> > +	 *   khugepaged kernel thread.
> > +	 * - 0: THP allocation is disallowed in the current context.
> > +	 */
> > +	int (*allocator)(unsigned long vm_flags, unsigned long tva_flags);
> > +	/**
> > +	 * @reclaimer: Specifies the entity performing page reclaim:
> > +	 *             - current task context
> > +	 *             - kswapd
> > +	 *             - none (no reclaim)
> > +	 * @vma_madvised: MADV flags for this VMA (e.g., MADV_HUGEPAGE, MADV_NOHUGEPAGE)
> > +	 *
> > +	 * Return:
> > +	 * - RECLAIMER_CURRENT: Direct reclaim by the current task if THP
> > +	 *   allocation fails.
> > +	 * - RECLAIMER_KSWAPD: Wake kswapd to reclaim memory if THP allocation fails.
> > +	 * - RECLAIMER_BOTH: Both the current task and kswapd will perform the reclaim.
> > +	 * - 0: No reclaim will be attempted.
> > +	 */
> > +	int (*reclaimer)(bool vma_madvised);
> > +};
> > +
> > +static struct bpf_thp_ops bpf_thp;
> > +
> > +int bpf_thp_allocator(unsigned long vm_flags, unsigned long tva_flags)
> > +{
> > +	int allocator;
> > +
> > +	/* No BPF program is attached */
> > +	if (!(transparent_hugepage_flags & (1<
> > +		return THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT;
> > +
> > +	if (current_is_khugepaged())
> > +		return THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT;
> > +	if (!bpf_thp.allocator)
> > +		return THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT;
> > +
> > +	allocator = bpf_thp.allocator(vm_flags, tva_flags);
> > +	if (!allocator)
> > +		return 0;
>
> The check seems redundant. Is it?

Right, thanks for pointing it out.
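To double-check why the check can go, here is a minimal userspace sketch of the allocator path with the "if (!allocator) return 0;" lines dropped. It is only a model under stated assumptions: model_thp_allocator(), model_allocator_fn, THP_ALLOC_BOTH and the cb_* callbacks are hypothetical names, the TRANSPARENT_HUGEPAGE_BPF_ATTACHED bit is modeled as a bool parameter, the BPF callback as a plain function pointer, and the current_is_khugepaged() shortcut is left out. A return value of 0 has no bits outside the valid mask, so it already falls through to "return allocator;" unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values copied from include/linux/huge_mm.h in the patch. */
#define THP_ALLOC_KHUGEPAGED (1 << 1)
#define THP_ALLOC_CURRENT    (1 << 2)
/* Hypothetical shorthand, not defined by the patch. */
#define THP_ALLOC_BOTH       (THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT)

/* Hypothetical stand-in for the BPF allocator callback type. */
typedef int (*model_allocator_fn)(unsigned long vm_flags, unsigned long tva_flags);

/*
 * Hypothetical userspace model of bpf_thp_allocator() without the
 * redundant zero check. 'attached' stands in for the
 * TRANSPARENT_HUGEPAGE_BPF_ATTACHED bit and 'cb' for the BPF program.
 */
static int model_thp_allocator(bool attached, model_allocator_fn cb,
			       unsigned long vm_flags, unsigned long tva_flags)
{
	int allocator;

	/* Nothing attached, or no callback: allow both allocation paths. */
	if (!attached || !cb)
		return THP_ALLOC_BOTH;

	allocator = cb(vm_flags, tva_flags);
	/* Unknown bits fall back to the default; 0 passes through as-is. */
	if (allocator & ~THP_ALLOC_BOTH)
		return THP_ALLOC_BOTH;
	return allocator;
}

/* Sample callbacks standing in for BPF programs. */
static int cb_disallow(unsigned long vm_flags, unsigned long tva_flags)
{
	(void)vm_flags;
	(void)tva_flags;
	return 0;
}

static int cb_invalid(unsigned long vm_flags, unsigned long tva_flags)
{
	(void)vm_flags;
	(void)tva_flags;
	return 1; /* bit 0 is not a valid THP_ALLOC_* value */
}

static int cb_current(unsigned long vm_flags, unsigned long tva_flags)
{
	(void)vm_flags;
	(void)tva_flags;
	return THP_ALLOC_CURRENT;
}
```

Driving it from a small test main, a callback returning 0 yields 0, an out-of-mask value yields THP_ALLOC_BOTH, and THP_ALLOC_CURRENT is returned unchanged, which matches the behavior with the extra check in place.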
>
> > +	/* invalid return value */
> > +	if (allocator & ~(THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT))
> > +		return THP_ALLOC_KHUGEPAGED | THP_ALLOC_CURRENT;
> > +	return allocator;
> > +}
> > +
> > +gfp_t bpf_thp_gfp_mask(bool vma_madvised)
> > +{
> > +	int reclaimer;
> > +
> > +	if (!(transparent_hugepage_flags & (1<
> > +		return 0;
> > +
> > +	if (!bpf_thp.reclaimer)
> > +		return 0;
> > +
> > +	reclaimer = bpf_thp.reclaimer(vma_madvised);
> > +	switch (reclaimer) {
> > +	case RECLAIMER_CURRENT:
> > +		return GFP_TRANSHUGE | __GFP_NORETRY;
> > +	case RECLAIMER_KSWAPD:
> > +		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
> > +	case RECLAIMER_BOTH:
> > +		return GFP_TRANSHUGE | __GFP_KSWAPD_RECLAIM | __GFP_NORETRY;
> > +	default:
> > +		return 0;
> > +	}
> > +}
> > +
> > +static bool bpf_thp_ops_is_valid_access(int off, int size,
> > +					enum bpf_access_type type,
> > +					const struct bpf_prog *prog,
> > +					struct bpf_insn_access_aux *info)
> > +{
> > +	return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
> > +}
> > +
> > +static const struct bpf_func_proto *
> > +bpf_thp_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > +{
> > +	return bpf_base_func_proto(func_id, prog);
> > +}
> > +
> > +static const struct bpf_verifier_ops thp_bpf_verifier_ops = {
> > +	.get_func_proto = bpf_thp_get_func_proto,
> > +	.is_valid_access = bpf_thp_ops_is_valid_access,
> > +};
> > +
> > +static int bpf_thp_reg(void *kdata, struct bpf_link *link)
> > +{
> > +	struct bpf_thp_ops *ops = kdata;
> > +
> > +	/* TODO: add support for multiple attaches */
> > +	if (test_and_set_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED,
> > +			     &transparent_hugepage_flags))
> > +		return -EOPNOTSUPP;
>
> I think returning -EBUSY if the struct_ops is already attached is a
> better choice

Makes sense. Thanks for the suggestion.
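For reference, a minimal userspace sketch of the suggested change, assuming C11 atomics as a stand-in for the kernel's test_and_set_bit()/clear_bit(). model_thp_reg(), model_thp_unreg(), model_test_and_set_bit(), model_clear_bit(), model_thp_flags and MODEL_BPF_ATTACHED_BIT are hypothetical names, and the bit index is arbitrary. A second attach observes the bit already set and gets -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical bit index; in the patch this is TRANSPARENT_HUGEPAGE_BPF_ATTACHED. */
#define MODEL_BPF_ATTACHED_BIT 6

/* Hypothetical stand-in for transparent_hugepage_flags (zero-initialized). */
static atomic_ulong model_thp_flags;

/* Userspace analogue of test_and_set_bit(): returns the old bit value. */
static int model_test_and_set_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Userspace analogue of clear_bit(). */
static void model_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Registration as suggested in review: an already-attached slot yields -EBUSY. */
static int model_thp_reg(void)
{
	if (model_test_and_set_bit(MODEL_BPF_ATTACHED_BIT, &model_thp_flags))
		return -EBUSY;
	/* The real code would install the allocator/reclaimer callbacks here. */
	return 0;
}

static void model_thp_unreg(void)
{
	model_clear_bit(MODEL_BPF_ATTACHED_BIT, &model_thp_flags);
}
```

-EBUSY tells userspace that the single attach slot is currently occupied and that a retry after detaching can succeed, whereas -EOPNOTSUPP reads as if the operation were not supported at all.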
>
> > +	bpf_thp.allocator = ops->allocator;
> > +	bpf_thp.reclaimer = ops->reclaimer;
> > +	return 0;
> > +}
> > +
> > +static void bpf_thp_unreg(void *kdata, struct bpf_link *link)
> > +{
> > +	clear_bit(TRANSPARENT_HUGEPAGE_BPF_ATTACHED, &transparent_hugepage_flags);
> > +	bpf_thp.allocator = NULL;
> > +	bpf_thp.reclaimer = NULL;
> > +}
> > +
> > +static int bpf_thp_check_member(const struct btf_type *t,
> > +				const struct btf_member *member,
> > +				const struct bpf_prog *prog)
> > +{
> > +	return 0;
> > +}
> > +
>
> [...]
>
> > +static int bpf_thp_init_member(const struct btf_type *t,
> > +			       const struct btf_member *member,
> > +			       void *kdata, const void *udata)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int bpf_thp_init(struct btf *btf)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int allocator(unsigned long vm_flags, unsigned long tva_flags)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int reclaimer(bool vma_madvised)
> > +{
> > +	return 0;
> > +}
> > +
> > +static struct bpf_thp_ops __bpf_thp_ops = {
> > +	.allocator = allocator,
> > +	.reclaimer = reclaimer,
> > +};
> > +
> > +static struct bpf_struct_ops bpf_bpf_thp_ops = {
> > +	.verifier_ops = &thp_bpf_verifier_ops,
> > +	.init = bpf_thp_init,
> > +	.check_member = bpf_thp_check_member,
>
> nit. check_member doesn't need to be defined if it does not do anything.

I will remove it.

-- 
Regards
Yafang