From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90E9AC531DF for ; Thu, 22 Aug 2024 06:37:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 22332940017; Thu, 22 Aug 2024 02:37:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1ACB094000B; Thu, 22 Aug 2024 02:37:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 026CB940017; Thu, 22 Aug 2024 02:37:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id D329894000B for ; Thu, 22 Aug 2024 02:37:24 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 817F5804C9 for ; Thu, 22 Aug 2024 06:37:24 +0000 (UTC) X-FDA: 82478924808.04.E8D5FEF Received: from mail-ua1-f46.google.com (mail-ua1-f46.google.com [209.85.222.46]) by imf10.hostedemail.com (Postfix) with ESMTP id B68CCC001F for ; Thu, 22 Aug 2024 06:37:22 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=fV0A2FPM; spf=pass (imf10.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.46 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724308561; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1AuSCScEbylWk/6QSNhda4EgGXfhf6ylHfAJ9C6jPnc=; b=plTyZtH2LwlDqmTpjswwGkYJjk1K1Bzap4tMb3t4rWmeYFDgmI2C1zvAhFyG6FfjXyy8ND 7nhW9AKtlm1eu+DvH06wML5vbVwFq4kJxEy5lrUCuDOyeNU9E9QPy8NqYqxvqdHn/f9HEn WBffHPMwMkNB5Nb42/bJIArVnzGONi8= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724308561; a=rsa-sha256; cv=none; b=C8mQsxjUCE5+yf8I4VqqdAa9aE2uzI9S//9mt3cORvZuWVtnovR9hP1w+l+OxJddnTJkBy HwK629NjeKKEhbfY37lE508+q5RWB/A3bc8MO2cbSXPLw2rxizDVP7YTxvsdraXhe1ht0V QDGwmUhGrKKyopinlAp1H9CmXCBO/Ik= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=fV0A2FPM; spf=pass (imf10.hostedemail.com: domain of 21cnbao@gmail.com designates 209.85.222.46 as permitted sender) smtp.mailfrom=21cnbao@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-ua1-f46.google.com with SMTP id a1e0cc1a2514c-8431330e82eso181741241.3 for ; Wed, 21 Aug 2024 23:37:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724308642; x=1724913442; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1AuSCScEbylWk/6QSNhda4EgGXfhf6ylHfAJ9C6jPnc=; b=fV0A2FPMQ2Hmio2SJE9iEUHjhXwCl71SvT7+oQBUyVR+bxqWBKDxhB7hjPIpo4R0kQ qJeByQpZ9m0DLKao1CbbuO2904eVL2lmytysdioElZzoO410/gQaiDrIihe1s0PdvYr2 /AlFLJGyD0pUhIP6p4SLkSfyxZWa45cI0iRZoDOtycCxF2oEb3snhC3lXUAuO9abJWgF XlJGG6C/peOnotViRg/WUX4UxaPIXKaw0HTabMYqNXxyFOQW4aV7GwMoUqiUTf8AeXFX tSgEut+GCRo34NAZinYri4CVp38RWg+A/W4Wo2Su3V2N9RMkyNxOzq//XYYtYlV1pM8p HRFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724308642; x=1724913442; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1AuSCScEbylWk/6QSNhda4EgGXfhf6ylHfAJ9C6jPnc=; b=tN9viYyFRrJ+ZwElkAVqpUdHUTzPDCbD7IQyUKWknLR9RixUwbjV7ZQEhGkvcM7twy oCQG3DxwEoEXs/QWffrknVlvmF2kkoP/Oz3Vx+TAondpeQreWoghbhdBi54imLbSqzNw UfCo9c8h/rP1ZsxrcVjKN7IuQSUxGR1a+x6T4qYg/YlDP3z0VpWK0dOu4NSh2C2A9CKL zV2gJhMGkcyx5ospgdzp9kK4sv9sGwRSxR8yLZ+N4wFcqDrdpDCO0lG37d3RntdpvK66 AJ8+pIt3nAg1ktf7IL063om9QBDNq8WxymCkH2Act4UlkXIVNm3uqgxjgsywIZXKt8XW xv1g== X-Forwarded-Encrypted: i=1; AJvYcCU+G01LONCVzAjOgH8f3S7pxMwU608QO9YkkIuhU3AKEgXZ6wVkqeG/EeBw9lgt4IU+eaBBzW2r8w==@kvack.org X-Gm-Message-State: AOJu0YzgPOn9nNX2I4jfntLXZOkXDeYGPENO6tvEH7En5FR46iNQkzQQ tuZfVyWKBaMLrQdGNSD5Av7lNBIzMknOGE2lYu5Z7/RoV0uH7ZD90j07d0Vl54pYrcuhrJbk8K/ XwDwr/gPG0C+BF6RiwIf38YI7rO0= X-Google-Smtp-Source: AGHT+IFIQ1RumAQJ42/U2rDSoTKvx25U4gHqv/JJFxcqywi9lEKn/prxBiaXWNwAVRltwlGJVJWpFdew9Ort2xw3K8A= X-Received: by 2002:a05:6102:6cb:b0:497:70db:ffcf with SMTP id ada2fe7eead31-498e724ad59mr776170137.19.1724308641592; Wed, 21 Aug 2024 23:37:21 -0700 (PDT) MIME-Version: 1.0 References: <20240817062449.21164-1-21cnbao@gmail.com> <7050deab-e99c-4c83-b7b9-b5dad42f4e95@redhat.com> In-Reply-To: From: Barry Song <21cnbao@gmail.com> Date: Thu, 22 Aug 2024 18:37:10 +1200 Message-ID: Subject: Re: [PATCH v3 0/4] mm: clarify nofail memory allocation To: Yafang Shao Cc: Linus Torvalds , David Hildenbrand , akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com, cl@linux.com, hailong.liu@oppo.com, hch@infradead.org, iamjoonsoo.kim@lge.com, mhocko@suse.com, penberg@kernel.org, rientjes@google.com, roman.gushchin@linux.dev, urezki@gmail.com, v-songbaohua@oppo.com, vbabka@suse.cz, virtualization@lists.linux.dev Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: xgshq5cjoqcoy6zt84y6him1s6ie79k5 X-Rspamd-Queue-Id: B68CCC001F X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1724308642-614118 X-HE-Meta: U2FsdGVkX18gEnsux9hTpnO66oAqtK5OkeUuWLPWb2BuXaFee+ELXLJtBuxeGvGV8zq8TFr2nRq62KpH+D/8RJdKYmvmgfqtgkyeWFQckfpAjNGBVioTrD3jmBcUd9orhWwlr66FeT1tGSsCg2Z48It7pM+ioLbE694FJdZ+CZgWskC3/gEOCFeRP9oexAp3BIUGhLClu9HlEQ0zaHgSSXkJCcr39Bs2z6HTmN5dobF5sFqM9iOn2gMf772ikMdSCtDq4uxMFPACDY16XXkg6GwwhwbOJPFfPUaWyZ89mcikkxF0PDkdvKx5gvC2b4O56HjvQCZe172TOUudogS5NaGDtl/Xq4ONsAot0jIHRmOM7TIYNZay6HmG8GzK2hONuCeJftWfX/u+Ycc8zhcFfo1yZSCZBJXLBq/XgUAVdCqWpC7YCix15O0ATL0ILpZet0/1taPOfbmDkjHUIT1+Ja1NWnGau8InERuP/ICoTtuoLHC+lSULTsQJjMDm1oZEr4b7VvPX6vYsg8YCAtnps3shlbwqk1Jt7C5iqpgvr+PaiGQdNqogrtb+D2i1a0W8oG3CyRVpc7HVQ0Ypm6ntFjcWc40uNBCdBZIJ20Phv+HdrA7yDiveRth+oE8fxalk5DBWA+2K5cSC4zbNMjsmo+AhfqTR/z5EX+1nsL4n552/ntOxTDF7qEEMVqh7voaqKxp7iKnbrvMMCprN2dmLDzllBcWK9JSkZY/W+n4A64iYObVsAd6Mc69Mf5Q8V26s+nxNhNt7XIHfzivrs3Q7Bb1rE/0npe2k9WWrI8StvS9ja2rQbmW5Go5NWxcjNca86oGX+nQYF24qXuYTNYz8PNNKHuRhn8pTSLEEBp6Lfb9qGEgqzlJL4gKQUAzKvlHsSl9QgylIWvUT3sKN9Y/ixzA2J/OefrfYaI6VYGzQs0oOH6bKDbM21oj5EpeUkRCwN5fgaFt4f0reqqR92ra XgJZ6rH5 x4d+cyDQgJ9lyDV0YbgtIoonfbxZlhukCtarhwlWy/VWWd7FbGGlC4GKGoC8t0p0jiC4bTrYrA39nE9fog5ZYY8ksQquggQmVdoNXJmUjTv0lzQ8PDUbrsl4iscJvYEve4jUjNu0asLVp0uoPGMH+inPlQsQpjB260tT/1cg6C/NZ9OyF8g1Il3ItzREmo5f/YK6fi719SLrIf0tlUkz6rCXsoKgMhCjuozBiuTYQ3ajHINR2zLMei7T6fyrWwoJpVS2FZzcCL8Ks4tDOWit1Plz7L5FjENkDvXEO3S2yqrK+Ft+a2zT8D9OgE2nrBSC9suznvo0rLBNzgoyK3HNp/1j818zMbuXvA2beyD6W/3U3IQUjrKpz6YWAmxKFrP2VbRbZ3egN4UP8gUKa4Uq2JE1sDCogQssCiVGZgsLqXCGcFDhEs1sWQEo3JG6A1e0FrvtxpYzyUPmxdF1v8t90iSO8W0EZUptT2RgyfjvckE/nQtBcAXenffNfHXywgLuUpUIu5Ne2+Uqa0lCbguKf9IEj9w== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Aug 22, 2024 at 12:41=E2=80=AFAM Yafang Shao = wrote: > > On Tue, Aug 20, 2024 at 12:05=E2=80=AFAM Linus Torvalds > wrote: > > > > On Mon, 19 Aug 2024 at 06:02, David Hildenbrand wrot= e: > > > > > > > > If we must still fail a nofail allocation, we should trigger a BUG = rather > > > > than exposing NULL dereferences to callers who do not check the ret= urn > > > > value. > > > > > > I am not convinced that BUG_ON is the right tool here to save the wor= ld, > > > but I see how we arrived here. > > > > I think the thing to do is to just add a > > > > WARN_ON_ONCE((flags & __GFP_NOFAIL) && bad_nofail_alloc(oder, flag= s)); > > > > or similar, where that bad_nofail_alloc() checks that the allocation > > order is small and that the flags are sane for a NOFAIL allocation. > > > > Because no, BUG_ON() is *never* the answer. The answer is to make sure > > nobody ever sets NOFAIL in situations where the allocation can fail > > and there is no way forward. > > > > A BUG_ON() will quite likely just make things worse. You're better off > > with a WARN_ON() and letting the caller just oops. > > > > Honestly, I'm perfectly fine with just removing that stupid useless > > flag entirely. The flag goes back to 2003 and was introduced in > > 2.5.69, and was meant to be for very particular uses that otherwise > > just looped waiting for memory. > > > > Back in 2.5.69, there was exactly one user: the jbd journal code, that > > did a buffer head allocation with GFP_NOFAIL. By 2.6.0 that had > > expanded by another user in XFS, and even that one had a comment > > saying that it needed to be narrowed down. And in fact, by the 2.6.12 > > release, that XFS use had been removed, but the jbd journal had grown > > another jbd_kmalloc case for transaction data. So at the beginning of > > the git archives, we had exactly *one* user (with two places). > > > > *THAT* is the kind of use that the flag was meant for: small > > allocations required to make forward progress in writeout during > > memory pressure. > > > > It has then expanded and is now a problem. The cases using GFP_NOFAIL > > for things like vmalloc() - which is by definition not a small > > allocation - should be just removed as outright bugs. > > One potential approach could be to rename GFP_NOFAIL to > GFP_NOFAIL_FOR_SMALL_ALLOC, specifically for smaller allocations, and > to clear this flag for larger allocations. However, the challenge lies > in determining what constitutes a 'small' allocation. I'm not entirely sure if our concern is with higher order or larger size. H= igher order might pose a problem, but larger size(not too large) isn't always an issue. Allocating 100 * 4KiB pages is possibly easier than allocating a single 128KB folio. Are we trying to limit the physical size or the physical order? If the conc= ern is order, vmalloc manages __GFP_NOFAIL by mapping order-0 pages. If the concern is higher order, this sounds reasonable. but it seems the buddy system already has code to trigger a warning even for order > 1: struct page *rmqueue(struct zone *preferred_zone, struct zone *zone, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags, int migratetype) { struct page *page; /* * We most definitely don't want callers attempting to * allocate greater than order-1 page units with __GFP_NOFAIL. */ WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); if (likely(pcp_allowed_order(order))) { page =3D rmqueue_pcplist(preferred_zone, zone, order, migratetype, alloc_flags); if (likely(page)) goto out; } .... } > > > > > Note that we had this comment back in 2010: > > > > * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the cal= ler > > * cannot handle allocation failures. This modifier is deprecated and = no new > > * users should be added. > > > > and then it was softened in 2015 to the current > > > > * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the cal= ler > > * cannot handle allocation failures. New users should be evaluated car= efully > > ... > > > > and clearly that "evaluated carefully" actually never happened, so the > > new comment is just garbage. > > > > I wonder how many modern users of GFP_NOFAIL are simply due to > > over-eager allocation failure injection testing, and then people added > > GFP_NOFAIL just because it shut up the mindless random allocation > > failures. > > > > I mean, we have a __GFP_NOFAIL in rhashtable_init() - which can > > actually return an error just fine, but there was this crazy worry > > about the IPC layer initialization failing: > > > > https://lore.kernel.org/all/20180523172500.anfvmjtumww65ief@linux-n8= 05/ > > > > Things like that, where people just added mindless "theoretical > > concerns" issues, or possibly had some error injection module that > > inserted impossible failures. > > > > I do NOT want those things to become BUG_ON()'s. It's better to just > > return NULL with a "bogus GFP_NOFAIL" warning, and have the oops > > happen in the actual bad place that did an invalid allocation. > > > > Because the blame should go *there*, and it should not even remotely > > look like "oh, the MM code failed". No. The caller was garbage. > > > > So no. No MM BUG_ON code. > > > > Linus > > > > > -- > Regards > Yafang Thanks Barry