Date: Mon, 12 Feb 2024 15:35:33 +1100
From: Dave Chinner
To: Kent Overstreet
Cc: "Vlastimil Babka (SUSE)", Michal Hocko, Matthew Wilcox,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-nvme@lists.infradead.org, Kent Overstreet
Subject: Re: [LSF/MM/BPF TOPIC] Removing GFP_NOFS

On Sun, Feb 11, 2024 at 09:06:33PM -0500, Kent Overstreet wrote:
> On Mon, Feb 12, 2024 at 12:20:32PM +1100, Dave Chinner wrote:
> > On Thu, Feb 08, 2024 at 08:55:05PM +0100, Vlastimil Babka (SUSE) wrote:
> > > On 2/8/24 18:33, Michal Hocko wrote:
> > > > On Thu 08-02-24 17:02:07, Vlastimil Babka (SUSE) wrote:
> > > >> On 1/9/24 05:47, Dave Chinner wrote:
> > > >> > On Thu, Jan 04, 2024 at 09:17:16PM +0000, Matthew Wilcox wrote:
> > > >>
> > > >> Your points and Kent's proposal of scoped GFP_NOWAIT [1] suggest to me this
> > > >> isn't just about converting to the scoped
> > > >> apis, but also how they should be improved.
> > > >
> > > > Scoped GFP_NOFAIL context is slightly easier from the semantic POV than
> > > > scoped GFP_NOWAIT as it doesn't add a potentially unexpected failure
> > > > mode. It is still tricky to deal with GFP_NOWAIT requests inside the
> > > > NOFAIL scope because that makes it a non failing busy wait for an
> > > > allocation if we need to insist on scope NOFAIL semantic.
> > > >
> > > > On the other hand we can define the behavior similar to what you
> > > > propose with RETRY_MAYFAIL resp. NORETRY. Existing NOWAIT users should
> > > > better handle allocation failures regardless of the external allocation
> > > > scope.
> > > >
> > > > Overriding that scoped NOFAIL semantic with RETRY_MAYFAIL or NORETRY
> > > > resembles the existing PF_MEMALLOC and GFP_NOMEMALLOC semantic and I do
> > > > not see an immediate problem with that.
> > > >
> > > > Having more NOFAIL allocations is not great but if you need to
> > > > emulate those by implementing the nofail semantic outside of the
> > > > allocator then it is better to have those retries inside the allocator
> > > > IMO.
> > >
> > > I see potential issues in scoping both the NOWAIT and NOFAIL
> > >
> > > - NOFAIL - I'm assuming Dave is adding __GFP_NOFAIL to xfs allocations or
> > > adjacent layers where he knows they must not fail for his transaction. But
> > > could the scope affect also something else underneath that could fail
> > > without the failure propagating in a way that it affects xfs?
> >
> > Memory allocation failures below the filesystem (i.e. in the IO
> > path) will fail the IO, and if that happens for a read IO within
> > a transaction then it will have the same effect as XFS failing a
> > memory allocation. i.e. it will shut down the filesystem.
> >
> > The key point here is the moment we go below the filesystem we enter
> > into a new scoped allocation context with a guaranteed method of
> > returning errors: NOIO and bio errors.
>
> Hang on, you're conflating NOIO to mean something completely different -
> NOIO means "don't recurse in reclaim", it does _not_ mean anything about
> what happens when the allocation fails,

Yes, I know that's what NOIO means. I'm not conflating it with anything
else.

> and in particular it definitely
> does _not_ mean that failing the allocation is going to result in an IO
> error.

Exactly. FS level NOFAIL contexts simply do not apply to NOIO context
functionality. NOIO contexts require different mechanisms to guarantee
forwards progress under memory pressure. They work pretty well, and we
don't want or need to perturb them by having them inherit filesystem
level NOFAIL semantics.

i.e. architecturally speaking, NOIO is a completely separate allocation
domain to NOFS.

> That's because in general most code in the IO path knows how to make
> effective use of biosets and mempools (which may take some work! you
> have to ensure that you're always able to make forward progress when
> memory is limited, and in particular that you don't double allocate from
> the same mempool if you're blocking the first allocation from
> completing/freeing).

Yes, I understand this, and that's my point: NOIO context tends to be
able to use mempools and other mechanisms to prevent memory allocation
failure, not NOFAIL.

The IO layers are request based, and that enables one-in, one-out
allocation pools that can guarantee single IO progress. That's all the
IO layers need to guarantee to the filesystems so that forwards progress
can always be made under memory pressure.

However, filesystems cannot guarantee "one in, one out" allocation
behaviour.
A transaction can require a largely unbounded number of memory
allocations to succeed to make progress through to completion, and so
things like mempools -cannot be used- to prevent memory allocation
failures whilst providing a forwards progress guarantee.

Hence a NOFAIL scope is useful at the filesystem layer for filesystem
objects to ensure forwards progress under memory pressure, but it is
completely unnecessary once we transition to the IO layer, where
forwards progress guarantees ensure memory allocation failures don't
impede progress.

IOWs, we only need NOFAIL at the NOFS layers, not at the NOIO layers.
The entry points to the block layer should transition the task to NOIO
context and restore the previous context on exit. Then it becomes
relatively trivial to apply context based filtering of allocation
behaviour....

> > i.e. NOFAIL scopes are not relevant outside the subsystem that sets
> > it. Hence we likely need helpers to clear and restore NOFAIL when
> > we cross allocation context boundaries, e.g. as we cross from
> > filesystem to block layer in the IO stack via submit_bio(). Maybe
> > they should be doing something like:
> >
> > 	nofail_flags = memalloc_nofail_clear();
>
> NOFAIL is not a scoped thing at all, period; it is very much a
> _callsite_ specific thing, and it depends on whether that callsite has a
> fallback.

*cough*

As I've already stated, NOFAIL allocation has been scoped in XFS for the
past 20 years. Every memory allocation inside a transaction *must* be
NOFAIL unless otherwise specified because memory allocation failure
inside a dirty transaction is a fatal error. However, that scoping has
never been passed to the NOIO contexts below the filesystem - it's
scoped purely within the filesystem itself and doesn't pass on to other
subsystems the filesystem calls into.

> The most obvious example being, as mentioned previously, mempools.

Yes, they require one-in, one-out guarantees to avoid starvation and
ENOMEM situations.
As we've known since mempools were invented, these guarantees cannot be
provided by most filesystems.

> > > - NOWAIT - as said already, we need to make sure we're not turning an
> > > allocation that relied on too-small-to-fail into a null pointer exception or
> > > BUG_ON(!page).
> >
> > Agreed. NOWAIT is removing allocation failure constraints and I
> > don't think that can be made to work reliably. Error injection
> > cannot prove the absence of errors and so we can never be certain
> > the code will always operate correctly and not crash when an
> > unexpected allocation failure occurs.
>
> You saying we don't know how to test code?

Yes, that's exactly what I'm saying.

I'm also saying that designing algorithms that aren't fail safe is poor
design. If you get it wrong and nothing bad can happen as a result, then
the design is fine.

But if the result of missing something accidentally is that the system
is guaranteed to crash when that is hit, then failure is guaranteed and
no amount of testing will prevent that failure from occurring.

And we suck at testing, so we absolutely need to design fail safe
algorithms and APIs...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
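The context transition discussed above (a filesystem transaction running with NOFS plus a scoped NOFAIL, and a block-layer entry point that clears NOFAIL, enters NOIO, and restores both on exit) can be sketched as a small user-space model. It is loosely patterned on the save-and-restore style of the kernel's memalloc_nofs_save()/memalloc_noio_save() and on the GFP masking done by current_gfp_context(); the scoped NOFAIL flag, every CTX_*/REQ_* value, and the helper names (ctx_set(), ctx_clear(), ctx_restore(), effective_req(), submit_io_model()) are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>

/* Hypothetical per-task context flags (a stand-in for current->flags). */
#define CTX_NOFS	0x1u
#define CTX_NOIO	0x2u
#define CTX_NOFAIL	0x4u	/* the proposed scoped NOFAIL; not a kernel flag */

/* Hypothetical request bits, loosely mirroring __GFP_IO, __GFP_FS, __GFP_NOFAIL. */
#define REQ_IO		0x1u
#define REQ_FS		0x2u
#define REQ_NOFAIL	0x4u
#define REQ_KERNEL	(REQ_IO | REQ_FS)

static unsigned int task_flags;

/* Set a context flag; return its previous state for a later ctx_restore(). */
static unsigned int ctx_set(unsigned int flag)
{
	unsigned int old = task_flags & flag;

	task_flags |= flag;
	return old;
}

/* Clear a context flag; return its previous state for a later ctx_restore(). */
static unsigned int ctx_clear(unsigned int flag)
{
	unsigned int old = task_flags & flag;

	task_flags &= ~flag;
	return old;
}

/* Put back only the bit(s) in 'flag', so nested scopes unwind correctly. */
static void ctx_restore(unsigned int flag, unsigned int saved)
{
	assert((saved & ~flag) == 0);
	task_flags = (task_flags & ~flag) | saved;
}

/* Filter an allocation request through the current task context. */
static unsigned int effective_req(unsigned int req)
{
	if (task_flags & CTX_NOIO)
		req &= ~(REQ_IO | REQ_FS);
	else if (task_flags & CTX_NOFS)
		req &= ~REQ_FS;
	if (task_flags & CTX_NOFAIL)
		req |= REQ_NOFAIL;
	return req;
}

/*
 * Hypothetical block-layer entry point: drop the filesystem's NOFAIL scope,
 * enter NOIO, and return what an allocation in the IO path would see.
 */
static unsigned int submit_io_model(void)
{
	unsigned int nofail = ctx_clear(CTX_NOFAIL);
	unsigned int noio = ctx_set(CTX_NOIO);
	unsigned int req = effective_req(REQ_KERNEL);

	/* Restore in reverse order on exit from the IO layer. */
	ctx_restore(CTX_NOIO, noio);
	ctx_restore(CTX_NOFAIL, nofail);
	return req;
}
```

Under these assumptions, a REQ_KERNEL request made inside the transaction comes out as REQ_IO | REQ_NOFAIL, the same request made inside submit_io_model() loses the IO, FS and NOFAIL bits, and the transaction's NOFS | NOFAIL state is intact once the IO path returns. Restoring only the saved bit, rather than unconditionally clearing it on exit, is what lets such scopes nest.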
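The one-in, one-out argument about mempools can be illustrated with a deliberately simplified user-space model. As a simplification of the kernel's mempool_alloc()/mempool_free(), it assumes that under memory pressure the backing allocator always fails, allocations fall back to a fixed reserve, and freed elements refill the reserve first; where a real mempool_alloc() would sleep waiting for a free, the model returns false instead. struct model_pool, POOL_RESERVE and the pool_*()/run_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_RESERVE 4	/* hypothetical min_nr for the model pool */

struct model_pool {
	int min_nr;		/* reserve size promised at creation */
	int reserved;		/* elements currently sitting in the reserve */
	bool memory_pressure;	/* if true, the backing allocator always fails */
};

static void pool_init(struct model_pool *p, int min_nr)
{
	assert(min_nr >= 0);
	p->min_nr = min_nr;
	p->reserved = min_nr;
	p->memory_pressure = false;
}

/* Returns false where a real mempool_alloc() would sleep waiting for a free. */
static bool pool_alloc(struct model_pool *p)
{
	if (!p->memory_pressure)
		return true;		/* backing allocator succeeded */
	if (p->reserved > 0) {
		p->reserved--;		/* fall back to the reserve */
		return true;
	}
	return false;
}

/* Freed elements refill the reserve before going back to the allocator. */
static void pool_free(struct model_pool *p)
{
	if (p->reserved < p->min_nr)
		p->reserved++;
}

/* IO-style user: each request takes one element and frees it on completion. */
static int run_requests(struct model_pool *p, int nr_requests)
{
	int done = 0;

	for (int i = 0; i < nr_requests; i++) {
		if (!pool_alloc(p))
			break;
		pool_free(p);		/* IO completion: one in, one out */
		done++;
	}
	return done;
}

/*
 * Transaction-style user: every object is held until commit, so nothing is
 * returned to the pool while the transaction is still allocating.
 * Returns how many of nr_allocs were obtained before the pool ran dry.
 */
static int run_transaction(struct model_pool *p, int nr_allocs)
{
	int held = 0;

	while (held < nr_allocs && pool_alloc(p))
		held++;

	/*
	 * If held < nr_allocs, a real mempool_alloc() would now sleep waiting
	 * for a free that only this transaction could perform. The model
	 * instead releases everything it holds and reports the shortfall.
	 */
	for (int i = 0; i < held; i++)
		pool_free(p);
	return held;
}
```

With a reserve of four and the backing allocator failing, 1000 one-in, one-out requests all complete, while a transaction that must hold ten objects until commit obtains only four before it would block on a free that only it could perform. That is the general form of the "don't double allocate from the same mempool" hazard Kent mentions, and it mirrors the point above that mempools cannot give a filesystem transaction a forwards progress guarantee.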