From: Yafang Shao <laoar.shao@gmail.com>
Date: Thu, 22 Aug 2024 22:22:44 +0800
Subject: Re: [PATCH v3 0/4] mm: clarify nofail memory allocation
To: Barry Song <21cnbao@gmail.com>
Cc: Linus Torvalds, David Hildenbrand, akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com, cl@linux.com, hailong.liu@oppo.com, hch@infradead.org, iamjoonsoo.kim@lge.com, mhocko@suse.com, penberg@kernel.org, rientjes@google.com, roman.gushchin@linux.dev, urezki@gmail.com, v-songbaohua@oppo.com, vbabka@suse.cz, virtualization@lists.linux.dev

On Thu, Aug 22, 2024 at 2:37 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 12:41 AM Yafang Shao wrote:
> >
> > On Tue, Aug 20, 2024 at 12:05 AM Linus Torvalds
> > wrote:
> > >
> > > On Mon, 19 Aug 2024 at 06:02, David Hildenbrand wrote:
> > > >
> > > > > If we must still fail a nofail allocation, we should trigger a BUG rather
> > > > > than exposing NULL dereferences to callers who do not check the return
> > > > > value.
> > > >
> > > > I am not convinced that BUG_ON is the right tool here to save the world,
> > > > but I see how we arrived here.
> > >
> > > I think the thing to do is to just add a
> > >
> > >      WARN_ON_ONCE((flags & __GFP_NOFAIL) && bad_nofail_alloc(order, flags));
> > >
> > > or similar, where that bad_nofail_alloc() checks that the allocation
> > > order is small and that the flags are sane for a NOFAIL allocation.
> > >
> > > Because no, BUG_ON() is *never* the answer. The answer is to make sure
> > > nobody ever sets NOFAIL in situations where the allocation can fail
> > > and there is no way forward.
> > >
> > > A BUG_ON() will quite likely just make things worse. You're better off
> > > with a WARN_ON() and letting the caller just oops.
> > >
> > > Honestly, I'm perfectly fine with just removing that stupid useless
> > > flag entirely. The flag goes back to 2003 and was introduced in
> > > 2.5.69, and was meant to be for very particular uses that otherwise
> > > just looped waiting for memory.
> > >
> > > Back in 2.5.69, there was exactly one user: the jbd journal code, that
> > > did a buffer head allocation with GFP_NOFAIL. By 2.6.0 that had
> > > expanded by another user in XFS, and even that one had a comment
> > > saying that it needed to be narrowed down. And in fact, by the 2.6.12
> > > release, that XFS use had been removed, but the jbd journal had grown
> > > another jbd_kmalloc case for transaction data. So at the beginning of
> > > the git archives, we had exactly *one* user (with two places).
> > >
> > > *THAT* is the kind of use that the flag was meant for: small
> > > allocations required to make forward progress in writeout during
> > > memory pressure.
> > >
> > > It has then expanded and is now a problem.
The cases using GFP_NOFAIL > > > for things like vmalloc() - which is by definition not a small > > > allocation - should be just removed as outright bugs. > > > > One potential approach could be to rename GFP_NOFAIL to > > GFP_NOFAIL_FOR_SMALL_ALLOC, specifically for smaller allocations, and > > to clear this flag for larger allocations. However, the challenge lies > > in determining what constitutes a 'small' allocation. > > I'm not entirely sure if our concern is with higher order or larger size. I believe both should be considered. Since the higher-order task might be easier to address, starting with that seems like the more straightforward approach. > Higher > order might pose a problem, but larger size(not too large) isn't > always an issue. > Allocating 100 * 4KiB pages is possibly easier than allocating a single > 128KB folio. > > Are we trying to limit the physical size or the physical order? If the co= ncern > is order, vmalloc manages __GFP_NOFAIL by mapping order-0 pages. If the > concern is higher order, this sounds reasonable. but it seems the buddy > system already has code to trigger a warning even for order > 1: To avoid potential livelock, it may be wise to drop this flag for higher-order allocations as well. Following Linus's suggestion, we could start by removing it for "> PAGE_ALLOC_COSTLY_ORDER". > > struct page *rmqueue(struct zone *preferred_zone, > struct zone *zone, unsigned int order, > gfp_t gfp_flags, unsigned int alloc_flags, > int migratetype) > { > struct page *page; > > /* > * We most definitely don't want callers attempting to > * allocate greater than order-1 page units with __GFP_NOFAIL. > */ > WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); This line was added by Michal in commit 0f352e5392c8 ("mm: remove __GFP_NOFAIL is deprecated comment"), but it appears that Michal has since reconsidered his stance. ;) -- Regards Yafang