From: Alexei Starovoitov
Date: Tue, 10 Dec 2024 14:06:32 -0800
Subject: Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation
To: Michal Hocko
Cc: Matthew Wilcox, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Vlastimil Babka, Sebastian Sewior, Steven Rostedt, Hou Tao, Johannes Weiner, shakeel.butt@linux.dev, Thomas Gleixner, Tejun Heo, linux-mm, Kernel Team
References: <20241210023936.46871-1-alexei.starovoitov@gmail.com> <20241210023936.46871-2-alexei.starovoitov@gmail.com>

On Tue, Dec 10, 2024 at 1:05 AM Michal Hocko wrote:
>
> On Tue 10-12-24 05:31:30, Matthew Wilcox wrote:
> > On Mon, Dec 09, 2024 at 06:39:31PM -0800, Alexei Starovoitov wrote:
> > > +	if (preemptible() && !rcu_preempt_depth())
> > > +		return alloc_pages_node_noprof(nid,
> > > +					       GFP_NOWAIT | __GFP_ZERO,
> > > +					       order);
> > > +	return alloc_pages_node_noprof(nid,
> > > +				       __GFP_TRYLOCK | __GFP_NOWARN | __GFP_ZERO,
> > > +				       order);
> >
> > [...]
> >
> > > @@ -4009,7 +4018,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> > >  	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
> > >  	 */
> > >  	alloc_flags |= (__force int)
> > > -		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
> > > +		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM | __GFP_TRYLOCK));
> >
> > It's not quite clear to me that we need __GFP_TRYLOCK to implement this.
> > I was originally wondering if this wasn't a memalloc_nolock_save() /
> > memalloc_nolock_restore() situation (akin to memalloc_nofs_save/restore),
> > but I wonder if we can simply do:
> >
> > 	if (!preemptible() || rcu_preempt_depth())
> > 		alloc_flags |= ALLOC_TRYLOCK;
>
> preemptible is unusable without CONFIG_PREEMPT_COUNT but I do agree that
> __GFP_TRYLOCK is not really a preferred way to go forward. For 3
> reasons.
>
> First I do not really like the name as it tells what it does rather than
> how it should be used. This is a general pattern of many gfp flags
> unfortunately and historically it has turned out error prone. If a gfp
> flag is really needed then something like __GFP_ANY_CONTEXT should be
> used. If the current implementation requires to use try_lock for
> zone->lock or other changes is not an implementation detail but the user
> should have a clear understanding that allocation is allowed from any
> context (NMI, IRQ or otherwise atomic contexts).

__GFP_ANY_CONTEXT would make sense if we wanted to make it available
for all kernel users. In this case I agree with Sebastian.
This is a bpf-specific feature, since bpf doesn't know the context it runs in.
All other kernel users should pick GFP_KERNEL or ATOMIC or NOWAIT.
Exposing GFP_ANY_CONTEXT to all may lead to sloppy code in drivers
and elsewhere.

> Is there any reason why GFP_ATOMIC cannot be extended to support new
> contexts?
> This allocation mode is already documented to be usable from
> atomic contexts except from NMI and raw_spinlocks. But is it feasible to
> extend the current implementation to use only trylock on zone->lock if
> called from in_nmi() to reduce unexpected failures on contention for
> existing users?

No. in_nmi() doesn't help. It's the lack of reentrance of the slab and
page allocators that is the issue.
The page allocator might grab zone lock. In !RT it will disable irqs.
In RT it will stay sleepable. Both paths will be calling other
kernel code including tracepoints, potential kprobes, etc.,
and a bpf prog may be attached somewhere.
If it calls alloc_page() it may deadlock on zone->lock.
The pcpu lock is thankfully trylock already.
So the !irqs_disabled() part of preemptible() guarantees that zone->lock
won't deadlock in !RT.
And the rcu_preempt_depth() case just steers bpf into the trylock-only
path in RT, since there is no way to tell whether it's safe to call
the sleepable spin_lock(&zone->lock).

>
> Third, do we even want such a strong guarantee in the generic page
> allocator path and make it even more complex and harder to maintain?

I'm happy to add myself as R: or M: for the trylock bits,
since that will be a fundamental building block for bpf.

> We
> already have a precedent in form of __alloc_pages_bulk which is a special
> case allocator mode living outside of the page allocator path. It seems
> that it covers most of your requirements except the fallback to the
> regular allocation path AFAICS. Is this something you could piggy back
> on?

__alloc_pages_bulk() has all the same issues. It takes locks.
Also it doesn't support __GFP_ACCOUNT, which is a show stopper.
All bpf allocations are going through memcg.
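The reentrancy argument above (a bpf prog attached to a tracepoint or kprobe that fires while zone->lock is held, then calls back into the page allocator) can be illustrated with a toy userspace model in C. This is a sketch under stated assumptions, not kernel code: every name in it (zone_lock, trace_hook, toy_alloc_page(), use_trylock_path(), struct toy_ctx) is hypothetical, the "free list" is just a counter, and the model is single-threaded. use_trylock_path() only follows the rough shape of the patch's check, with preemptible() modeled as "preempt count zero and irqs enabled"; it does not model PREEMPT_RT sleeping locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy userspace model, NOT kernel code. All names are hypothetical.
 * zone_lock stands in for zone->lock; trace_hook stands in for a
 * tracepoint/kprobe with an attached BPF program that allocates.
 */
static atomic_bool zone_lock = false;
static int free_pages = 4;          /* toy free list: just a counter */
static void (*trace_hook)(void);    /* invoked while zone_lock is held */

static bool zone_trylock(void)
{
	/* Succeeds only if the lock was free before the exchange. */
	return !atomic_exchange(&zone_lock, true);
}

static void zone_lock_spin(void)
{
	/* Like spin_lock(): spins forever if this context already holds it. */
	while (!zone_trylock())
		;
}

static void zone_unlock(void)
{
	assert(atomic_load(&zone_lock));
	atomic_store(&zone_lock, false);
}

/* Returns true if a page was taken from the toy free list. */
static bool toy_alloc_page(bool trylock_only)
{
	bool ok;

	if (trylock_only) {
		if (!zone_trylock())
			return false;   /* contended or reentered: fail, no deadlock */
	} else {
		zone_lock_spin();
	}

	if (trace_hook)
		trace_hook();           /* e.g. a tracepoint firing under the lock */

	ok = free_pages > 0;
	if (ok)
		free_pages--;
	zone_unlock();
	return ok;
}

/* Hypothetical caller context used to pick a path. */
struct toy_ctx {
	bool irqs_disabled;
	int preempt_count;
	int rcu_preempt_depth;
};

/* Normal path only if "preemptible" and not in an RCU read-side section. */
static bool use_trylock_path(const struct toy_ctx *c)
{
	bool preemptible = c->preempt_count == 0 && !c->irqs_disabled;

	return !preemptible || c->rcu_preempt_depth > 0;
}

static bool nested_ok = true;

/* Hypothetical BPF program that allocates from inside the hook. */
static void bpf_prog_that_allocates(void)
{
	/* In !RT, zone->lock is held with irqs disabled. */
	struct toy_ctx ctx = { .irqs_disabled = true };

	trace_hook = NULL;              /* keep the toy from recursing again */
	nested_ok = toy_alloc_page(use_trylock_path(&ctx));
}
```

Setting trace_hook = bpf_prog_that_allocates and calling toy_alloc_page(false) makes the outer allocation succeed while the nested one returns false. Had the nested call used trylock_only == false, zone_lock_spin() would never return, which is the self-deadlock described above; the trylock path reports failure instead and leaves the outer allocation intact.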