References: <20250213033556.9534-1-alexei.starovoitov@gmail.com> <20250213033556.9534-7-alexei.starovoitov@gmail.com>
From: Alexei Starovoitov
Date: Tue, 18 Feb 2025 18:38:24 -0800
Subject: Re: [PATCH bpf-next v8 6/6] bpf: Use try_alloc_pages() to allocate pages for bpf needs.
To: Vlastimil Babka
Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton,
 Peter Zijlstra, Sebastian Sewior, Steven Rostedt, Hou Tao,
 Johannes Weiner, Shakeel Butt, Michal Hocko, Matthew Wilcox,
 Thomas Gleixner, Jann Horn, Tejun Heo, linux-mm, Kernel Team

On Tue, Feb 18, 2025 at 7:36 AM Vlastimil Babka wrote:
>
> On 2/13/25 04:35, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > Use try_alloc_pages() and free_pages_nolock() for BPF needs
> > when context doesn't allow using normal alloc_pages.
> > This is a prerequisite for further work.
> >
> > Signed-off-by: Alexei Starovoitov
> > ---
> >  include/linux/bpf.h  |  2 +-
> >  kernel/bpf/arena.c   |  5 ++---
> >  kernel/bpf/syscall.c | 23 ++++++++++++++++++++---
> >  3 files changed, 23 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f3f50e29d639..e1838a341817 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2348,7 +2348,7 @@ int generic_map_delete_batch(struct bpf_map *map,
> >  struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
> >  struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
> >
> > -int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
> > +int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
> >                         unsigned long nr_pages, struct page **page_array);
> >  #ifdef CONFIG_MEMCG
> >  void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
> > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> > index 0975d7f22544..8ecc62e6b1a2 100644
> > --- a/kernel/bpf/arena.c
> > +++ b/kernel/bpf/arena.c
> > @@ -287,7 +287,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
> >                 return VM_FAULT_SIGSEGV;
> >
> >         /* Account into memcg of the process that created bpf_arena */
> > -       ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page);
> > +       ret = bpf_map_alloc_pages(map, NUMA_NO_NODE, 1, &page);
> >         if (ret) {
> >                 range_tree_set(&arena->rt, vmf->pgoff, 1);
> >                 return VM_FAULT_SIGSEGV;
> > @@ -465,8 +465,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
> >         if (ret)
> >                 goto out_free_pages;
> >
> > -       ret = bpf_map_alloc_pages(&arena->map, GFP_KERNEL | __GFP_ZERO,
> > -                                 node_id, page_cnt, pages);
> > +       ret = bpf_map_alloc_pages(&arena->map, node_id, page_cnt, pages);
> >         if (ret)
> >                 goto out;
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index c420edbfb7c8..a7af8d0185d0 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -569,7 +569,24 @@ static void bpf_map_release_memcg(struct bpf_map *map)
> >  }
> >  #endif
> >
> > -int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
> > +static bool can_alloc_pages(void)
> > +{
> > +       return preempt_count() == 0 && !irqs_disabled() &&
> > +               !IS_ENABLED(CONFIG_PREEMPT_RT);
> > +}
> > +
>
> I see this is new since v6 and wasn't yet discussed (or I missed it?)

It was in v1:
https://lore.kernel.org/bpf/20241116014854.55141-1-alexei.starovoitov@gmail.com/
See Peter's comments.
In this version I open-coded preemptible(), since that is more accurate,
and disabled the detection on PREEMPT_RT.

> I wonder how reliable these preempt/irq_disabled checks are for correctness
> purposes, e.g. we don't have CONFIG_PREEMPT_COUNT enabled always?

I believe the above doesn't produce false positives.
It's not exhaustive and might change as we learn more and tune it.
Hence I moved it to be bpf-specific, so we can iterate quickly,
instead of putting it in linux/gfp.h. That also takes into account
Sebastian's comment that normal kernel code should know its
calling context.

> As long
> as the callers of bpf_map_alloc_pages() know the context and pass gfp
> accordingly, can't we use i.e. gfpflags_allow_blocking() to determine if
> try_alloc_pages() should be used or not?

bpf infra has only a very coarse knowledge of the context.
There are two categories: sleepable or not.
In sleepable context GFP_KERNEL is allowed, but that category is very
narrow and represents a tiny slice of use cases compared to non-sleepable.
try_alloc_pages() is for the latter.
netconsole has a similar problem/challenge.
It doesn't know the context in which it will be called.
Currently it's just doing GFP_ATOMIC and praying.
This is something to fix eventually, when slab is taught
about gfpflags_allow_blocking.
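
To make the check discussed above easier to reason about, here is a
minimal userspace sketch of the decision it encodes. It is not the kernel
code: struct ctx_model, enum alloc_path, model_can_alloc_pages() and
model_pick_path() are hypothetical names invented for illustration, and
the three fields only stand in for preempt_count(), irqs_disabled() and
IS_ENABLED(CONFIG_PREEMPT_RT) from the quoted hunk. How the real
bpf_map_alloc_pages() uses the result is not shown in this part of the
thread, so the mapping to an allocation path is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the state the quoted check reads. */
struct ctx_model {
	unsigned int preempt_count; /* models preempt_count() */
	bool irqs_disabled;         /* models irqs_disabled() */
	bool preempt_rt;            /* models IS_ENABLED(CONFIG_PREEMPT_RT) */
};

/* Hypothetical labels for the two allocation paths discussed above. */
enum alloc_path {
	ALLOC_PATH_NORMAL, /* context allows the normal page allocator */
	ALLOC_PATH_TRY,    /* context does not; use try_alloc_pages() */
};

/* Same boolean expression as the quoted can_alloc_pages(). */
static bool model_can_alloc_pages(const struct ctx_model *c)
{
	assert(c != NULL);
	return c->preempt_count == 0 && !c->irqs_disabled && !c->preempt_rt;
}

/* Assumed dispatch: fall back to the try path whenever the check fails. */
static enum alloc_path model_pick_path(const struct ctx_model *c)
{
	return model_can_alloc_pages(c) ? ALLOC_PATH_NORMAL : ALLOC_PATH_TRY;
}
```

In this model the check is deliberately one-sided: any single hint of an
atomic context (non-zero preempt count, IRQs off, or a PREEMPT_RT build)
selects the try path, and the normal path is taken only when all three
are clear.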