Date: Tue, 10 Dec 2024 12:25:04 -0800
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Michal Hocko
Cc: Alexei Starovoitov, Matthew Wilcox, bpf@vger.kernel.org, andrii@kernel.org,
 memxor@gmail.com, akpm@linux-foundation.org, peterz@infradead.org,
 vbabka@suse.cz, bigeasy@linutronix.de, rostedt@goodmis.org,
 houtao1@huawei.com, hannes@cmpxchg.org, tglx@linutronix.de, tj@kernel.org,
 linux-mm@kvack.org, kernel-team@fb.com
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com> <20241210023936.46871-2-alexei.starovoitov@gmail.com>

On Tue, Dec 10, 2024 at 10:05:22AM +0100, Michal Hocko wrote:
> On Tue 10-12-24 05:31:30, Matthew Wilcox wrote:
> > On Mon, Dec 09, 2024 at 06:39:31PM -0800, Alexei Starovoitov wrote:
> > > +	if (preemptible() && !rcu_preempt_depth())
> > > +		return alloc_pages_node_noprof(nid,
> > > +					       GFP_NOWAIT | __GFP_ZERO,
> > > +					       order);
> > > +	return alloc_pages_node_noprof(nid,
> > > +				       __GFP_TRYLOCK | __GFP_NOWARN | __GFP_ZERO,
> > > +				       order);
> >
> > [...]
> >
> > > @@ -4009,7 +4018,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> > >  	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
> > >  	 */
> > >  	alloc_flags |= (__force int)
> > > -		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
> > > +		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM | __GFP_TRYLOCK));
> >
> > It's not quite clear to me that we need __GFP_TRYLOCK to implement this.
> > I was originally wondering if this wasn't a memalloc_nolock_save() /
> > memalloc_nolock_restore() situation (akin to memalloc_nofs_save/restore),
> > but I wonder if we can simply do:
> >
> > 	if (!preemptible() || rcu_preempt_depth())
> > 		alloc_flags |= ALLOC_TRYLOCK;
>
> preemptible() is unusable without CONFIG_PREEMPT_COUNT, but I do agree
> that __GFP_TRYLOCK is not really a preferred way to go forward, for
> three reasons.
>
> First, I do not really like the name, as it tells what it does rather
> than how it should be used. This is a general pattern of many gfp flags
> unfortunately, and historically it has turned out error prone.
> If a gfp flag is really needed then something like __GFP_ANY_CONTEXT
> should be used. Whether the current implementation uses try_lock for
> zone->lock or requires other changes is an implementation detail; what
> matters is that the user has a clear understanding that the allocation
> is allowed from any context (NMI, IRQ, or otherwise atomic contexts).
>
> Second, is there any reason why GFP_ATOMIC cannot be extended to
> support new contexts?

GFP_ATOMIC has access to memory reserves. I see GFP_NOWAIT as a better
fit, and if someone wants access to the reserves they can use __GFP_HIGH
with GFP_NOWAIT.

> This allocation mode is already documented to be usable from atomic
> contexts except NMI and raw_spinlocks. But is it feasible to extend the
> current implementation to use only trylock on zone->lock if called from
> in_nmi(), to reduce unexpected failures on contention for existing
> users?

I think this is the question we (MM folks) need to answer, not the
users.

> Third, do we even want such a strong guarantee in the generic page
> allocator path, making it even more complex and harder to maintain?

I think the alternative would be a higher maintenance cost, i.e.
everyone creating their own layer/solution/caching on top of the page
allocator, which I think we agree we want to avoid (Vlastimil's LSFMM
talk).

> We already have a precedent in the form of __alloc_pages_bulk, which is
> a special-case allocator mode living outside of the page allocator
> path. It seems that it covers most of your requirements except the
> fallback to the regular allocation path AFAICS. Is this something you
> could piggyback on?

BPF already has bpf_mem_alloc(), and IIUC this series is an effort to
unify things and have a single solution.