References: <59e90825-4efa-4384-8286-06c0235304dc@redhat.com>
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 27 Aug 2024 19:50:23 +1200
Subject: Re: [PATCH v3 0/4] mm: clarify nofail memory allocation
To: Vlastimil Babka
Cc: Linus Torvalds, David Hildenbrand, Michal Hocko, Yafang Shao,
	akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com,
	cl@linux.com, hailong.liu@oppo.com, hch@infradead.org,
	iamjoonsoo.kim@lge.com, penberg@kernel.org, rientjes@google.com,
	roman.gushchin@linux.dev, urezki@gmail.com, v-songbaohua@oppo.com,
	virtualization@lists.linux.dev
On Tue, Aug 27, 2024 at 7:38 PM Vlastimil Babka wrote:
>
> On 8/27/24 09:15, Barry Song wrote:
> > On Tue, Aug 27, 2024 at 12:10 AM Vlastimil Babka wrote:
> >>
> >> On 8/22/24 11:34, Linus Torvalds wrote:
> >> > On Thu, 22 Aug 2024 at 17:27, David Hildenbrand wrote:
> >> >>
> >> >> To me, that implies that if you pass in MAX_ORDER+1 the VM will "retry
> >> >> infinitely".
> >> >> Whether that implies just OOPSing or actually being in a busy loop,
> >> >> I don't care. It could effectively happen with MAX_ORDER as well, as
> >> >> stated. But certainly not BUG_ON.
> >> >
> >> > No BUG_ON(), but also no endless loop.
> >> >
> >> > Just return NULL for bogus users. Really. Give a WARN_ON_ONCE() to
> >> > make it easy to find offenders, and then let them deal with it.
> >>
> >> Right now we give the WARN_ON_ONCE() (for !can_direct_reclaim) only when
> >> we're about to actually return NULL, so the memory has to be depleted
> >> already. To make it easier to find the offenders much more reliably, we
> >> should consider doing it sooner, but also not add unnecessary overhead to
> >> allocator fastpaths just because of the potentially buggy users. So either
> >> always in __alloc_pages_slowpath(), which should be often enough (unless the
> >> system never needs to wake up kswapd to reclaim) but with negligible enough
> >> overhead, or on every allocation but only with e.g. CONFIG_DEBUG_VM?
> >
> > We already have a WARN_ON for order > 1 in rmqueue(). We might extend
> > the condition there to include checking the flags as well?
>
> Ugh, wasn't aware, well spotted. So it means there at least shouldn't be
> existing users of __GFP_NOFAIL with order > 1 :)
>
> But also the check is in the hotpath, even before trying the pcplists, so we
> could move it to __alloc_pages_slowpath() while extending it?

Agreed. I don't think it is reasonable to check the order and flags in two
different places, especially since rmqueue() already pays for the
gfp_flags & __GFP_NOFAIL test and the order > 1 check. We can at least
extend the current check as an improvement, though I still believe Michal's
suggestion of implementing OOPS_ON is the better approach to pursue, as it
doesn't crash the entire system while still ensuring the problematic
process is terminated.
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 7dcb0713eb57..b5717c6569f9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3071,8 +3071,11 @@ struct page *rmqueue(struct zone *preferred_zone,
> >  	/*
> >  	 * We most definitely don't want callers attempting to
> >  	 * allocate greater than order-1 page units with __GFP_NOFAIL.
> > +	 * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
> > +	 * which can result in a lockup.
> >  	 */
> > -	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> > +	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) &&
> > +		     (order > 1 || !(gfp_flags & __GFP_DIRECT_RECLAIM)));
> >
> >  	if (likely(pcp_allowed_order(order))) {
> >  		page = rmqueue_pcplist(preferred_zone, zone, order,
> >
> >>
> >> > Don't take it upon yourself to say "we have to deal with any amount of
> >> > stupidity".
> >> >
> >> > The MM layer is not some slave to users. The MM layer is one of the
> >> > most core pieces of code in the kernel, and as such the MM layer is
> >> > damn well in charge.
> >> >
> >> > Nobody has the right to say "I will not deal with allocation
> >> > failures". The MM should not bend over backwards over something like
> >> > that.
> >> >
> >> > Seriously. Get a spine already, people. Tell random drivers that claim
> >> > that they cannot deal with errors to just f-ck off.
> >> >
> >> > And you don't do it by looping forever, and you don't do it by killing
> >> > the kernel. You do it by ignoring their bullying tactics.
> >> >
> >> > Then you document the *LIMITED* cases where you actually will try forever.
> >> >
> >> > This discussion has gone on for too damn long.
> >> >
> >> >              Linus
> >>

Thanks
Barry