From: Alexei Starovoitov
Date: Wed, 11 Dec 2024 18:14:55 -0800
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation
To: Vlastimil Babka
Cc: Sebastian Andrzej Siewior, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Hou Tao, Johannes Weiner, shakeel.butt@linux.dev, Michal Hocko, Matthew Wilcox, Thomas Gleixner, Tejun Heo, linux-mm, Kernel Team
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com> <20241210023936.46871-2-alexei.starovoitov@gmail.com> <20241210090136.DGfYLmeo@linutronix.de>

On Wed, Dec 11, 2024 at 12:39 AM Vlastimil Babka wrote:
>
> On 12/10/24 22:53, Alexei Starovoitov wrote:
> > On Tue, Dec 10, 2024 at 1:01 AM Sebastian Andrzej Siewior
> > wrote:
> >>
> >> On 2024-12-09 18:39:31 [-0800], Alexei Starovoitov wrote:
> >> > From: Alexei Starovoitov
> >> >
> >> > Tracing BPF programs execute from tracepoints and kprobes where running
> >> > context is unknown, but they need to request additional memory.
> >> > The prior workarounds were using pre-allocated memory and BPF specific
> >> > freelists to satisfy such allocation requests. Instead, introduce
> >> > __GFP_TRYLOCK flag that makes page allocator accessible from any context.
> >> > It relies on percpu free list of pages that rmqueue_pcplist() should be
> >> > able to pop the page from. If it fails (due to IRQ re-entrancy or list
> >> > being empty) then try_alloc_pages() attempts to spin_trylock zone->lock
> >> > and refill percpu freelist as normal.
> >> > BPF program may execute with IRQs disabled and zone->lock is sleeping in RT,
> >> > so trylock is the only option.
> >>
> >> The __GFP_TRYLOCK flag looks reasonable given the challenges for BPF
> >> where it is not known how much memory will be needed and what the
> >> calling context is.
> >
> > Exactly.
> >
> >> I hope it does not spread across the kernel where
> >> people do ATOMIC in preempt/ IRQ-off on PREEMPT_RT and then once they
> >> learn that this does not work, add this flag to the mix to make it work
> >> without spending some time on reworking it.
> >
> > We can call it __GFP_BPF to discourage any other usage,
> > but that seems like an odd "solution" to code review problem.
>
> Could we perhaps not expose the flag to public headers at all, and keep it
> only as an internal detail of try_alloc_pages_noprof()?

Public headers?
To pass an additional bit via gfp flags into alloc_pages,
gfp_types.h has to be touched.

If you mean moving try_alloc_pages() into mm/page_alloc.c and
adding another argument to __alloc_pages_noprof, then it's not pretty.
It already has a 'gfp_t gfp' argument, and that should be used to pass
the intent.

We don't have to add __GFP_TRYLOCK at all if we go with
the memalloc_nolock_save() approach.
So I started looking at it,
but immediately hit trouble with bits.
There are 5 bits left in PF_ and 3 already used for mm needs.
That doesn't look sustainable long term.
How about we alias the nolock concept with PF_MEMALLOC_PIN?

As far as I could trace, PF_MEMALLOC_PIN clears __GFP_MOVABLE and nothing else.

The same bit plus the lack of __GFP_KSWAPD_RECLAIM in gfp flags
would mean nolock mode in alloc_pages,
while PF_MEMALLOC_PIN alone would mean nolock in free_pages
and deeper inside memcg paths and such.

Thoughts? Too hacky?
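
To make the aliasing idea easier to poke at, here is a rough userspace model of it. This is not kernel code and not a proposed patch. The MODEL_* bit values and the model_*() helper names are made up for illustration and do not match the real definitions. It only encodes the three rules stated above: the task flag clears the movable bit, the task flag plus a missing kswapd-reclaim bit means nolock on the allocation side, and the task flag alone means nolock on the free side.

```c
#include <stdbool.h>

/*
 * Illustrative bit values only. The real PF_* and __GFP_* definitions
 * live in the kernel headers and use different values.
 */
#define MODEL_PF_MEMALLOC_PIN     (1u << 0) /* stands in for PF_MEMALLOC_PIN */
#define MODEL_GFP_KSWAPD_RECLAIM  (1u << 0) /* stands in for __GFP_KSWAPD_RECLAIM */
#define MODEL_GFP_MOVABLE         (1u << 1) /* stands in for __GFP_MOVABLE */

/* Effect of the task flag as traced in the email: clear the movable bit. */
static inline unsigned int model_gfp_context(unsigned int task_flags,
                                             unsigned int gfp)
{
    if (task_flags & MODEL_PF_MEMALLOC_PIN)
        gfp &= ~MODEL_GFP_MOVABLE;
    return gfp;
}

/*
 * Hypothetical allocation-side check: the task flag is set AND the
 * caller did not ask for kswapd reclaim.
 */
static inline bool model_alloc_is_nolock(unsigned int task_flags,
                                         unsigned int gfp)
{
    return (task_flags & MODEL_PF_MEMALLOC_PIN) != 0 &&
           (gfp & MODEL_GFP_KSWAPD_RECLAIM) == 0;
}

/*
 * Hypothetical free-side check: no gfp mask is available on the free
 * path, so the task flag alone decides.
 */
static inline bool model_free_is_nolock(unsigned int task_flags)
{
    return (task_flags & MODEL_PF_MEMALLOC_PIN) != 0;
}
```

One thing the model makes visible: under these rules every free done while the task flag is set would be treated as nolock, including frees from ordinary pinning callers, and any allocation in that scope that omits the kswapd-reclaim bit would be treated as nolock as well. Whether that overlap is acceptable is the open question.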