Date: Wed, 11 Dec 2024 11:19:12 +0100
From: Michal Hocko
To: Alexei Starovoitov
Cc: Matthew Wilcox, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
 Andrew Morton, Peter Zijlstra, Vlastimil Babka, Sebastian Sewior,
 Steven Rostedt, Hou Tao, Johannes Weiner, shakeel.butt@linux.dev,
 Thomas Gleixner, Tejun Heo, linux-mm, Kernel Team
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for
 opportunistic page allocation
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com>
 <20241210023936.46871-2-alexei.starovoitov@gmail.com>

On Tue 10-12-24 14:06:32, Alexei Starovoitov wrote:
> On Tue, Dec 10, 2024 at 1:05 AM Michal Hocko wrote:
> >
> > On Tue 10-12-24 05:31:30, Matthew Wilcox wrote:
> > > On Mon, Dec 09, 2024 at 06:39:31PM -0800, Alexei Starovoitov wrote:
> > > > +	if (preemptible() &&
> > > > +	    !rcu_preempt_depth())
> > > > +		return alloc_pages_node_noprof(nid,
> > > > +				GFP_NOWAIT | __GFP_ZERO,
> > > > +				order);
> > > > +	return alloc_pages_node_noprof(nid,
> > > > +			__GFP_TRYLOCK | __GFP_NOWARN | __GFP_ZERO,
> > > > +			order);
> > >
> > > [...]
> > >
> > > > @@ -4009,7 +4018,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> > > >  	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
> > > >  	 */
> > > >  	alloc_flags |= (__force int)
> > > > -		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
> > > > +		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM | __GFP_TRYLOCK));
> > >
> > > It's not quite clear to me that we need __GFP_TRYLOCK to implement this.
> > > I was originally wondering if this wasn't a memalloc_nolock_save() /
> > > memalloc_nolock_restore() situation (akin to memalloc_nofs_save/restore),
> > > but I wonder if we can simply do:
> > >
> > > 	if (!preemptible() || rcu_preempt_depth())
> > > 		alloc_flags |= ALLOC_TRYLOCK;
> >
> > preemptible() is unusable without CONFIG_PREEMPT_COUNT, but I do agree
> > that __GFP_TRYLOCK is not really a preferred way to go forward, for 3
> > reasons.
> >
> > First, I do not really like the name as it tells what it does rather than
> > how it should be used. This is a general pattern of many gfp flags,
> > unfortunately, and historically it has turned out to be error prone. If a
> > gfp flag is really needed then something like __GFP_ANY_CONTEXT should be
> > used. Whether the current implementation requires try_lock for
> > zone->lock or other changes is an implementation detail, but the user
> > should have a clear understanding that allocation is allowed from any
> > context (NMI, IRQ or otherwise atomic contexts).
>
> __GFP_ANY_CONTEXT would make sense if we wanted to make it available
> for all kernel users. In this case I agree with Sebastian.
> This is a bpf-specific feature, since it doesn't know the context.
> All other kernel users should pick GFP_KERNEL or ATOMIC or NOWAIT.
> Exposing GFP_ANY_CONTEXT to all may lead to sloppy code in drivers
> and elsewhere.

I do not think we want a special allocation mode for a single user. Not
only is there no way to enforce that this remains a BPF-specific feature,
it is also not really a good idea to have a single-user feature in the
allocator.

> > Is there any reason why GFP_ATOMIC cannot be extended to support new
> > contexts? This allocation mode is already documented to be usable from
> > atomic contexts except from NMI and raw_spinlocks. But is it feasible to
> > extend the current implementation to use only trylock on zone->lock if
> > called from in_nmi() to reduce unexpected failures on contention for
> > existing users?
>
> No. in_nmi() doesn't help. It's the lack of reentrance of slab and page
> allocator that is an issue.
> The page allocator might grab zone lock. In !RT it will disable irqs.
> In RT it will stay sleepable. Both paths will be calling other
> kernel code including tracepoints, potential kprobes, etc
> and a bpf prog may be attached somewhere.
> If it calls alloc_page() it may deadlock on zone->lock.
> pcpu lock is thankfully trylock already.
> So !irqs_disabled() part of preemptible() guarantees that
> zone->lock won't deadlock in !RT.
> And rcu_preempt_depth() case just steers bpf into try lock only path in RT.
> Since there is no way to tell whether it's safe to call
> sleepable spin_lock(&zone->lock).

OK I see!

> > We
> > already have a precedent in the form of __alloc_pages_bulk which is a
> > special case allocator mode living outside of the page allocator path.
> > It seems that it covers most of your requirements except the fallback
> > to the regular allocation path AFAICS. Is this something you could
> > piggy back on?
>
> __alloc_pages_bulk() has all the same issues. It takes locks.
> Also it doesn't support GFP_ACCOUNT which is a show stopper.
> All bpf allocations are going through memcg.

OK, this requirement was not clear until I reached the later patches in
the series (now).
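
To make the reentrancy concern above concrete (a bpf prog attached
somewhere under zone->lock calling back into the allocator), here is a
small userspace sketch. Everything in it is hypothetical: struct toy_zone,
toy_trylock(), toy_unlock(), toy_alloc_trylock() and toy_nested_alloc()
are made-up names, and a C11 atomic_flag merely stands in for zone->lock.
It is not kernel code and not what the patch implements; it only
illustrates why a trylock-only path fails cleanly where a blocking lock
would self-deadlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a zone: one lock plus a free-page counter. */
struct toy_zone {
	atomic_flag lock;	/* plays the role of zone->lock */
	int free_pages;
};

/* Try to take the lock exactly once; never spins or sleeps. */
static bool toy_trylock(struct toy_zone *z)
{
	return !atomic_flag_test_and_set_explicit(&z->lock,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_zone *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/* Opportunistic allocation: fail rather than wait for the lock. */
static bool toy_alloc_trylock(struct toy_zone *z)
{
	bool ok = false;

	if (!toy_trylock(z))
		return false;	/* lock busy, possibly held by our own caller */
	if (z->free_pages > 0) {
		z->free_pages--;
		ok = true;
	}
	toy_unlock(z);
	return ok;
}

/*
 * Simulate a hook that runs while the lock is already held and calls back
 * into the allocator. A blocking lock at this point would never return.
 */
static bool toy_nested_alloc(struct toy_zone *z)
{
	bool outer = toy_trylock(z);
	bool nested;

	assert(outer);		/* single-threaded demo: lock must be free */
	(void)outer;		/* silence unused warning under NDEBUG */
	nested = toy_alloc_trylock(z);
	toy_unlock(z);
	return nested;
}
```

With one free page, toy_nested_alloc() returns false and leaves the
counter untouched, while a later toy_alloc_trylock() from an unlocked
context succeeds. The caller has to treat failure as a normal outcome,
which is the "opportunistic" part of the subject line.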
-- 
Michal Hocko
SUSE Labs