Date: Sun, 1 Sep 2024 13:35:29 +1000
From: Dave Chinner
To: Yafang Shao
Cc: Kent Overstreet, Michal Hocko, Matthew Wilcox, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Chinner
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations
References: <20240828140638.3204253-1-kent.overstreet@linux.dev>

On Fri, Aug 30, 2024 at 05:14:28PM +0800, Yafang Shao wrote:
> On Thu, Aug 29, 2024 at 10:29 PM Dave Chinner wrote:
> >
> > On Thu, Aug 29, 2024 at 07:55:08AM -0400, Kent Overstreet wrote:
> > > Ergo, if you're not absolutely sure that a GFP_NOFAIL use is safe
> > > according to call path and allocation size, you still need to be
> > > checking for failure - in the same way that you shouldn't be using
> > > BUG_ON() if you cannot prove that the condition won't occur in real world
> > > usage.
> >
> > We've been using __GFP_NOFAIL semantics in XFS heavily for 30 years
> > now. This was the default Irix kernel allocator behaviour (it had a
> > forwards progress guarantee and would never fail allocation unless
> > told it could do so). We've been using the same "guaranteed not to
> > fail" semantics on Linux since the original port started 25 years
> > ago via open-coded loops.
> >
> > IOWs, __GFP_NOFAIL semantics have been production tested for a
> > couple of decades on Linux via XFS, and nobody here can argue that
> > XFS is unreliable or crashes in low memory scenarios. __GFP_NOFAIL
> > as it is used by XFS is reliable and lives up to the "will not fail"
> > guarantee that it is supposed to have.
> >
> > Fundamentally, __GFP_NOFAIL came about to replace the callers doing
> >
> > do {
> >         p = kmalloc(size);
> > } while (!p);
> >
> > so that they blocked until memory allocation succeeded. The call
> > sites do not check for failure, because -failure never occurs-.
> >
> > The MM devs want to have visibility of these allocations - they may
> > not like them, but having __GFP_NOFAIL means it's trivial to audit
> > all the allocations that use these semantics. IOWs, __GFP_NOFAIL
> > was created with an explicit guarantee that it -will not fail- for
> > normal allocation contexts so it could replace all the open-coded
> > will-not-fail allocation loops.
> >
> > Given this guarantee, we recently removed these historic allocation
> > wrapper loops from XFS, and replaced them with __GFP_NOFAIL at the
> > allocation call sites. There's nearly a hundred memory allocation
> > locations in XFS that are tagged with __GFP_NOFAIL.
> >
> > If we're now going to have the "will not fail" guarantee taken away
> > from __GFP_NOFAIL, then we cannot use __GFP_NOFAIL in XFS. Nor can
> > it be used anywhere else that a "will not fail" guarantee is
> > required.
> >
> > Put simply: __GFP_NOFAIL will be rendered completely useless if it
> > can fail due to external scoped memory allocation contexts. This
> > will force us to revert all __GFP_NOFAIL allocations back to
> > open-coded will-not-fail loops.
> >
> > This is not a step forwards for anyone.
>
> Hello Dave,
>
> I've noticed that XFS has increasingly replaced kmem_alloc() with
> __GFP_NOFAIL. For example, in kernel 4.19.y, there are 0 instances of
> __GFP_NOFAIL under fs/xfs, but in kernel 6.1.y, there are 41
> occurrences. In kmem_alloc(), there's an explicit
> memalloc_retry_wait() to throttle the allocator under heavy memory
> pressure, which aligns with your filesystem design. However, using
> __GFP_NOFAIL removes this throttling mechanism, potentially causing
> issues when the system is under heavy memory load. I'm concerned that
> this shift might not be a beneficial trend.

AIUI, the memory allocation looping has back-offs already built into
it when memory reserves are exhausted and/or reclaim is congested.

e.g.:

get_page_from_freelist()
  (zone below watermark)
  node_reclaim()
    __node_reclaim()
      shrink_node()
        reclaim_throttle()

And the call to reclaim_throttle() will do the equivalent of
memalloc_retry_wait() (a 2ms sleep).

> We have been using XFS for our big data servers for years, and it has
> consistently performed well with older kernels like 4.19.y. However,
> after upgrading all our servers from 4.19.y to 6.1.y over the past two
> years, we have frequently encountered livelock issues caused by memory
> exhaustion. To mitigate this, we've had to limit the RSS of
> applications, which isn't an ideal solution and represents a worrying
> trend.

If userspace uses all of memory all the time, then the best the
kernel can do is slowly limp along. Preventing userspace from
overcommitting memory to the point of OOM is the only way to avoid
these "userspace wants more memory than the machine physically
has" sorts of issues. i.e. this is not a problem that the kernel
code can solve short of randomly killing userspace applications...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
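
For illustration only, the open-coded "will not fail" loop with a
retry back-off discussed in this thread can be sketched in userspace C.
This is a rough sketch, not kernel code: try_alloc(), retry_wait(),
alloc_nofail() and fail_budget are hypothetical names invented for the
example, malloc() stands in for the real allocator, and the 2ms sleep
simply mirrors the figure quoted above.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical knob: number of upcoming allocation attempts that fail. */
static int fail_budget;

/* Hypothetical stand-in for an allocator that can fail under pressure. */
static void *try_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/* Hypothetical back-off between attempts: sleep for about 2ms. */
static void retry_wait(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 2 * 1000 * 1000 };

	nanosleep(&ts, NULL);
}

/*
 * Open-coded "will not fail" loop: keep retrying, backing off between
 * attempts, until the allocation succeeds. The caller never sees NULL;
 * if memory never becomes available, this never returns.
 * *retries reports how many attempts failed before success.
 */
static void *alloc_nofail(size_t size, unsigned int *retries)
{
	void *p;

	assert(size > 0);
	*retries = 0;
	while (!(p = try_alloc(size))) {
		(*retries)++;
		retry_wait();
	}
	return p;
}
```

With fail_budget set to 3, a call such as alloc_nofail(64, &retries)
backs off three times before returning a valid pointer. The call site
has no failure branch, which is the property the thread is arguing
about.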