From: Yafang Shao
Date: Fri, 30 Aug 2024 17:14:28 +0800
Subject: Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations
To: Dave Chinner
Cc: Kent Overstreet, Michal Hocko, Matthew Wilcox,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Dave Chinner
References: <20240828140638.3204253-1-kent.overstreet@linux.dev>

On Thu, Aug 29, 2024 at 10:29 PM Dave Chinner wrote:
>
> On Thu, Aug 29, 2024 at 07:55:08AM -0400, Kent Overstreet wrote:
> > Ergo, if you're not absolutely sure that a GFP_NOFAIL use is safe
> > according to call path and allocation size, you still need to be
> > checking for failure - in the same way that you shouldn't be using
> > BUG_ON() if you cannot prove that the condition won't occur in real world
> > usage.
>
> We've been using __GFP_NOFAIL semantics in XFS heavily for 30 years
> now. This was the default Irix kernel allocator behaviour (it had a
> forwards progress guarantee and would never fail allocation unless
> told it could do so). We've been using the same "guaranteed not to
> fail" semantics on Linux since the original port started 25 years
> ago via open-coded loops.
>
> IOWs, __GFP_NOFAIL semantics have been production tested for a
> couple of decades on Linux via XFS, and nobody here can argue that
> XFS is unreliable or crashes in low memory scenarios. __GFP_NOFAIL
> as it is used by XFS is reliable and lives up to the "will not fail"
> guarantee that it is supposed to have.
>
> Fundamentally, __GFP_NOFAIL came about to replace the callers doing
>
>         do {
>                 p = kmalloc(size);
>         } while (!p);
>
> so that they blocked until memory allocation succeeded. The call
> sites do not check for failure, because -failure never occurs-.
>
> The MM devs want to have visibility of these allocations - they may
> not like them, but having __GFP_NOFAIL means it's trivial to audit
> all the allocations that use these semantics. IOWs, __GFP_NOFAIL
> was created with an explicit guarantee that it -will not fail- for
> normal allocation contexts so it could replace all the open-coded
> will-not-fail allocation loops.
>
> Given this guarantee, we recently removed these historic allocation
> wrapper loops from XFS, and replaced them with __GFP_NOFAIL at the
> allocation call sites. There's nearly a hundred memory allocation
> locations in XFS that are tagged with __GFP_NOFAIL.
>
> If we're now going to have the "will not fail" guarantee taken away
> from __GFP_NOFAIL, then we cannot use __GFP_NOFAIL in XFS. Nor can
> it be used anywhere else that a "will not fail" guarantee is
> required.
>
> Put simply: __GFP_NOFAIL will be rendered completely useless if it
> can fail due to external scoped memory allocation contexts. This
> will force us to revert all __GFP_NOFAIL allocations back to
> open-coded will-not-fail loops.
>
> This is not a step forwards for anyone.

Hello Dave,

I've noticed that XFS has increasingly replaced kmem_alloc() with
__GFP_NOFAIL. For example, in kernel 4.19.y, there are 0 instances of
__GFP_NOFAIL under fs/xfs, but in kernel 6.1.y, there are 41
occurrences. In kmem_alloc(), there's an explicit
memalloc_retry_wait() to throttle the allocator under heavy memory
pressure, which aligns with your filesystem design. However, using
__GFP_NOFAIL removes this throttling mechanism, potentially causing
issues when the system is under heavy memory load. I'm concerned that
this shift might not be a beneficial trend.

We have been using XFS for our big data servers for years, and it has
consistently performed well with older kernels like 4.19.y. However,
after upgrading all our servers from 4.19.y to 6.1.y over the past two
years, we have frequently encountered livelock issues caused by memory
exhaustion. To mitigate this, we've had to limit the RSS of
applications, which isn't an ideal solution and represents a worrying
trend.

-- 
Regards
Yafang