From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8946AC52D7C for ; Sun, 18 Aug 2024 06:27:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 930DD8D00DA; Sun, 18 Aug 2024 02:27:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8DDF58D00B8; Sun, 18 Aug 2024 02:27:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 780018D00DA; Sun, 18 Aug 2024 02:27:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 5B2C58D00B8 for ; Sun, 18 Aug 2024 02:27:56 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 82C111A0A64 for ; Sun, 18 Aug 2024 06:27:55 +0000 (UTC) X-FDA: 82464385710.12.A527F7C Received: from mail-vk1-f169.google.com (mail-vk1-f169.google.com [209.85.221.169]) by imf08.hostedemail.com (Postfix) with ESMTP id AEE25160002 for ; Sun, 18 Aug 2024 06:27:53 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=EK3L+qBn; spf=pass (imf08.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.169 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723962435; a=rsa-sha256; cv=none; b=IsbndmDq6EnheX6vSsgCHBV5jz9wdOt/YyaLbNcj1HOikKy6xNipDqkMGTQIPM8wueBcD8 IArRm83RSdP19krUsAaYY/MlhXYUs8X0HIbKqjYhYYzIoKZ4cBfXnR7uvTwypeIY/XMDB1 YEbHjOTj9we4cRYsSXAwCA9FQLEOdqE= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=EK3L+qBn; spf=pass (imf08.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.221.169 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723962435; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JgUO+C1zcLo56ERlZysfiwaIZDpxEmG9haOWGH4e9sU=; b=iJBUPZukPZpuE4cEBxjDr25ekUHzZ3B7yEEAugGsolCg3pBOhvzp006zAR4MFwYaGjgR+A f/21iQ/hgdzFLER5NPtQD2ckACDET/GSQf8AGDAq6xmUqLxyw0uVgmfqNuf1lpyS6LoFOd x83lIAuWLMBNmFKiydLDMBKfrkncexQ= Received: by mail-vk1-f169.google.com with SMTP id 71dfb90a1353d-4f6d35d59ccso1005351e0c.1 for ; Sat, 17 Aug 2024 23:27:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723962473; x=1724567273; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=JgUO+C1zcLo56ERlZysfiwaIZDpxEmG9haOWGH4e9sU=; b=EK3L+qBnjmjkJur78H8Tb8wYlDhLFIIlL5gs/+VgPYb4KKxVqjld6Sys3TsTm5sUEe ESWPxBuc8kWn90liJJAQcrG8C0frCElrgtGzuGTjTeJy7hKGSwojj/A7K5XSMGVgW1V6 fevTWtECRCxoW9wDYmjaQ0dQmmgvNmA375pvrZ14hGZ3y/fEaVFkOrG1V40eyA29VeAS plgk0Ne7O0k6/dEGv+dPEvyuJyAWO/i/8s4vS/CEy0XZ3GFectugJlYLKxWUMaLFBmyl U5Na98YIU1Srnqkpjs1eWKKeScqLYU0680ay2p7FgrOtGcdURoWiMS1ai2moNwJUQr9a Rrew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723962473; x=1724567273; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JgUO+C1zcLo56ERlZysfiwaIZDpxEmG9haOWGH4e9sU=; b=q1jJEKnC7xn1mY/6c0R9qntime/I4KR5kEfTzeYndh7x3yY6SSzxDh9Uom9tyPujeK /PD0X36L9Le+I0XP8QVNTQe7TC6p+DkNqTLSV08uqhLCWOM1WqWTPLAq1Y9FXDPmq4Jm STZ7JXMME7OIFlq8K2pRXivq3Qgh6vaC2ckXHwiaFCE6RAGKkBrlu8o0443g6sz8LVkW R2ffa1T0z2O3u80eN7D4I8k+isuRkS9Sh3jFv025aL2E6oZXgJL3APnQk41BVc+rZAxf m0swMANhEwtv0vkhKLPBTJpcEj5Ktj1rl3yt+tfQ2yZE+d1y/3SXHmfZaFDEjRp0ZqCs kwDA== X-Forwarded-Encrypted: i=1; AJvYcCXaB+5RFndNbDVYCChaafjsKMA8KBwrf8cXoAFJ68AoQGjAB9AEIESt7BQNBlXDpoxrU6RMzcICo6VJeWy7E5lZneg= X-Gm-Message-State: AOJu0YyLO/wRBTGL4OFP5Mzlmkaw5or8NWzO1sAqhcCgbGkoCrmUTT3D Y8/wl+n3sAKJz4P02Q56KWzrn5r3mAVC2ii3xCcN0rVQg2wTSEEJDE5kIRFi9Z1b9UWNaITT/+4 AEpv+8ow+w9oxhvtzBk0UXNxIZ9s= X-Google-Smtp-Source: AGHT+IE0az6Fv9KhSH8dZbqFWRVPh5tHPlz3b8uITSy7NR/ejGXgglXgk7rO4mWUrhmALYMSWmI1Rdg+8t4bDTfgS/g= X-Received: by 2002:a05:6122:d99:b0:4f6:ca2:ad0 with SMTP id 71dfb90a1353d-4fc6c5b6980mr6322928e0c.1.1723962472556; Sat, 17 Aug 2024 23:27:52 -0700 (PDT) MIME-Version: 1.0 References: <20240817062449.21164-1-21cnbao@gmail.com> <20240817062449.21164-5-21cnbao@gmail.com> In-Reply-To: From: Barry Song <21cnbao@gmail.com> Date: Sun, 18 Aug 2024 18:27:41 +1200 Message-ID: Subject: Re: [PATCH v3 4/4] mm: prohibit NULL deference exposed for unsupported non-blockable __GFP_NOFAIL To: Yafang Shao Cc: akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com, cl@linux.com, hailong.liu@oppo.com, hch@infradead.org, iamjoonsoo.kim@lge.com, mhocko@suse.com, penberg@kernel.org, rientjes@google.com, roman.gushchin@linux.dev, torvalds@linux-foundation.org, urezki@gmail.com, v-songbaohua@oppo.com, vbabka@suse.cz, virtualization@lists.linux.dev, Lorenzo Stoakes , Kees Cook , =?UTF-8?Q?Eugenio_P=C3=A9rez?= , Jason Wang , Maxime Coquelin , "Michael S. Tsirkin" , Xuan Zhuo Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: muu7dsdbd38jt1qtfmi3y6gtfkoakrtt X-Rspamd-Queue-Id: AEE25160002 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1723962473-637181 X-HE-Meta: U2FsdGVkX19ow+A4vaGGgBj01Csx2V5ulxk/6nUap2ly/AZBjjQEheFpMe6w19nLPQBo1dw+lUEhCxt06mJzFOMd9u6THL3t18DVh8bcaxXMi2L0YwVo3pM9e4J2HB1rqW2WpCTgKjkDAHqdnIBt8GkAZSRQYdKUY3+V6GvgncUXjkhyQWW8uJI7FM9bgb80yJqE5JcXXz+CmSAkgs9rRHqtWA6Sn5q9pT3Oj6Jo/EWODQZRrzQACSszFfTM3uNPuCJ5pD2Q1CwtbouR2y1CHYu9h8sgUV/0ei8RATlQo42wfDJifCF9hLa/uvwZ68ORMDLw8eo+mEnvKt7i0afPObxlFCIdxyMYTBS4rbNyqC9SWKYFHxLRMR7vNRqZeS05jk0xYgK6/Oo8xz5I7qzw37jTbH9gXbfskvAC9Kd/JVzvuhEB+xxki41YZr7KS1o9VI32lKf3M1Ou7tD73tr9N3V97eF+2jTGjxySy+7f70qcAV1oLAJNUa/de9+hZgGzSLcCPA9u9Pl8PAPwzL2Y6+BDRyxTdR1qgMW+WZpG6UkMwiWxLKwbTRJYxopEJxv81eURk5PwPCWYMcHZaCpvGxp8LcokMmKL+8loANh24vNox7ZDYQzrB8DTIR9C9bYYeaAOWkYT66xalHEInE393qvzo0RTas8G33HNeXROWopTPhUHI173wjyGmJQ8k6clt4otauWqm24GjEvfoXATH13zbsxt9QIGprKdrlnUBX6nL7N+Q3uvluGhAe2gtpgcauYNgKJIIqIevv7Aoo70zhm/je/AV/JcDjaWXev4XiK7jPrs2yGNrXyDy9XI0dcx18FNu2z2vFv1GaXv3MOw+WWWSMmjOaN2iLC6ToaSTyUo4npRwFD2a3ZSQ60D2/gK5iqhc1MoheZGekLLltmxBktL8FRi5xL9BFSng/QaGHSNtQ2kBYwae0RmjwZyykBXQbvEHXdsX6YQheCPbhc Lgs6S6Rg tSXrxB8DcDQHAbkJ3rXlC92lDrHDIoeeonqnWylBPd2gRWl2YMYtjnEDAWWH7LlTTql5PD/QqTbjlWFkOC0cEMNa/CEWV+OADEWbQEyPAKJeaOhCZXoTsmqvKOvzMndHYgkbSE7gerl/LUHcnFfAxLXYUxEzym9ccPq7jY6JBw2k6oEZpjfUMSkdsQi8JtQ6hWDhd2KIBJZ/APK4xIwfAdg/VKPH455aG46pfwkOnLDlSlbWQlJX83r9NtxDMaJrNCft0jyzMcAU6+T4vZY4Y4r0LZ2zauDGBfUxSKsyeEDctJpUjF1QO/Cj9uTEvFajBHcufuaIv9Z1/rZ+rKNP++jmZDQ1c29IG0M6oRAJmHKg8plH+1Y3MDtE0/AoL5k4AHZIzsPdp/4gPjoOrZ4MnI8HUPQ40FqPJ2guFzyDUqGoovqslGTc9a1T6wvZE1ttNEmTVDjgH+QoBU/aC1h+mOamIJ/vxAR17Hcvg2p85n8vj79dumXRBDmjusl02+Wuz51U+YF0LtIqhEba093RxwmGnj6i8zlUtBrRXyrCO87au1qGvE0aQFbtZmM7qKkT+H0uDNWUt1PSKC/O+AIRfZ7dVZsKUheX0N55l0prLPMyChql5yCcs/2/Xrj5bSv6CvUd6yF8+Siti3FFwbViiLInKi4XYnQsf5kOw01idMVyV6gWG739HHOy8Mszt0fJ8Fc4dJ3hO0J28kj6FAoTrh8VaJuxTGsOfu77jyGEPu1xgm4am1RCTRHmu5q0fBDEbTuIh+LgLMhCgi3tz+zR4J/oqX4vCoQ5N8NzScu2m5SJY4b0yKAeiIspMYI4SPmpG7YVVbK+5WDAjk7DWN3d16N1TKhT0lvPPVcNw/xjFaQmecKdGpLsOSLY8HBdrchh5nOv6gY8L2d3/57QATGPqpCKS227cbwwtVUtCsnKFS8KF0bOwGlAs5V0tGaUb16OnvJaH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sun, Aug 18, 2024 at 5:51=E2=80=AFPM Yafang Shao = wrote: > > On Sun, Aug 18, 2024 at 11:48=E2=80=AFAM Barry Song <21cnbao@gmail.com> w= rote: > > > > On Sun, Aug 18, 2024 at 2:55=E2=80=AFPM Yafang Shao wrote: > > > > > > On Sat, Aug 17, 2024 at 2:25=E2=80=AFPM Barry Song <21cnbao@gmail.com= > wrote: > > > > > > > > From: Barry Song > > > > > > > > When users allocate memory with the __GFP_NOFAIL flag, they might > > > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kin= d of > > > > non-blockable __GFP_NOFAIL is not supported and is pointless. If w= e > > > > attempt and still fail to allocate memory for these users, we have = two > > > > choices: > > > > > > > > 1. We could busy-loop and hope that some other direct reclamati= on or > > > > kswapd rescues the current process. However, this is unreliable > > > > and could ultimately lead to hard or soft lockups, > > > > > > That can occur even if we set both __GFP_NOFAIL and > > > __GFP_DIRECT_RECLAIM, right? So, I don't believe the issue is related > > > to setting __GFP_DIRECT_RECLAIM; rather, it stems from the flawed > > > design of __GFP_NOFAIL itself. > > > > the point of GFP_NOFAIL is that it won't fail and its user won't check > > the return value. without direct_reclamation, it is sometimes impossibl= e. > > but with direct reclamation, users constantly wait and finally they can > > So, what exactly is the difference between 'constantly waiting' and > 'busy looping'? Could you please clarify? Also, why can't we > 'constantly wait' when __GFP_DIRECT_RECLAIM is not set? I list two options in changelog 1: busy loop 2. bug_on. I am actually fine with either one. either one is better than the existing code. but returning null in the current code is definitely wrong. 1 somehow has the attempt to make __GFP_NOFAIL without direct_reclamation legal. so it is a bit suspicious going in the wrong direction. busy-loop is that you are not reclaiming memory you are not sleeping. cpu is constantly working and busy, so it might result in a lockup, either soft lockup or hard lockup. with direct_reclamation, wait is the case you can sleep. it is not holding cpu, not a busy loop. in rare case, users might end in endless wait, but it matches the doc of __GFP_NOFAIL, never return till memory is gotten (the current code is implemented in this way unless users incorrectly combine __GFP_NOFAIL with aotmic/nowait etc.) note, long-term we won't expose __GFP_NOFAIL any more. we will only expose GFP_NOFAIL which enforces Blockable. I am quite busy on other issues, so this won't happen in a short time. > > > get memory. if you read the doc of __GFP_NOFAIL you will find it. > > it is absolutely clearly documented. > > > > > > > > > which might not > > > > be well supported by some architectures. > > > > > > > > 2. We could use BUG_ON to trigger a reliable system crash, avoi= ding > > > > exposing NULL dereference. > > > > > > > > Neither option is ideal, but both are improvements over the existin= g code. > > > > This patch selects the second option because, with the introduction= of > > > > scoped API and GFP_NOFAIL=E2=80=94capable of enforcing direct recla= mation for > > > > nofail users(which is in my plan), non-blockable nofail allocations= will > > > > no longer be possible. > > > > > > > > Signed-off-by: Barry Song > > > > Cc: Michal Hocko > > > > Cc: Uladzislau Rezki (Sony) > > > > Cc: Christoph Hellwig > > > > Cc: Lorenzo Stoakes > > > > Cc: Christoph Lameter > > > > Cc: Pekka Enberg > > > > Cc: David Rientjes > > > > Cc: Joonsoo Kim > > > > Cc: Vlastimil Babka > > > > Cc: Roman Gushchin > > > > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > > Cc: Linus Torvalds > > > > Cc: Kees Cook > > > > Cc: "Eugenio P=C3=A9rez" > > > > Cc: Hailong.Liu > > > > Cc: Jason Wang > > > > Cc: Maxime Coquelin > > > > Cc: "Michael S. Tsirkin" > > > > Cc: Xuan Zhuo > > > > --- > > > > mm/page_alloc.c | 10 +++++----- > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index d2c37f8f8d09..fb5850ecd3ae 100644 > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -4399,11 +4399,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsi= gned int order, > > > > */ > > > > if (gfp_mask & __GFP_NOFAIL) { > > > > /* > > > > - * All existing users of the __GFP_NOFAIL are block= able, so warn > > > > - * of any new users that actually require GFP_NOWAI= T > > > > + * All existing users of the __GFP_NOFAIL are block= able > > > > + * otherwise we introduce a busy loop with inside t= he page > > > > + * allocator from non-sleepable contexts > > > > */ > > > > - if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)= ) > > > > - goto fail; > > > > + BUG_ON(!can_direct_reclaim); > > > > > > I'm not in favor of using BUG_ON() here, as many call sites already > > > handle the return value of __GFP_NOFAIL. > > > > > > > it is not correct to handle the return value of __GFP_NOFAIL. > > if you check the ret, don't use __GFP_NOFAIL. > > If so, you have many code changes to make in the linux kernel ;) > Please list those code using __GFP_NOFAIL and check the result might fail, we should get them fixed. This is insane. NOFAIL means no fail. > > > > > If we believe BUG_ON() is necessary, why not place it at the beginnin= g > > > of __alloc_pages_slowpath() instead of after numerous operations, > > > which could potentially obscure the issue? > > > > to some extent I agree with you. but the point here is that we might > > want to avoid this check in the hot path. so basically, we check when > > we have to check. in 99%+ case, this check can be avoided. > > It's on the slow path, but that's not the main point here. I actually recommended the approach, we can do an earlier check in the hotp= ath. somehow, in the previous discussion, people didn't like it. > > -- > Regards > Yafang Thanks barry