From: Kent Overstreet <kent.overstreet@linux.dev>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
Yafang Shao <laoar.shao@gmail.com>,
jack@suse.cz, Christian Brauner <brauner@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Paul Moore <paul@paul-moore.com>,
James Morris <jmorris@namei.org>,
"Serge E. Hallyn" <serge@hallyn.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-bcachefs@vger.kernel.org,
linux-security-module@vger.kernel.org,
linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
Date: Sun, 1 Sep 2024 21:35:30 -0400 [thread overview]
Message-ID: <nawltogcoffous3zv4kd2eerrrwhihbulz7pi2qyfjvslp6g3f@j3qkqftra2qm> (raw)
In-Reply-To: <ZtUFaq3vD+zo0gfC@dread.disaster.area>
On Mon, Sep 02, 2024 at 10:23:06AM GMT, Dave Chinner wrote:
> On Thu, Aug 29, 2024 at 09:32:52AM -0400, Kent Overstreet wrote:
> > And more than that, this is coming from you saying "We didn't have
> > to handle memory allocation failures in IRIX, why can't we be like
> > IRIX? All those error paths are a pain to test, why can't we get
> > rid of them?"
> >
> You're not listening, Kent. We are not eliding error paths because
> they aren't (or cannot be) tested.
>
> It's a choice between whether a transient error (e.g. ENOMEM) should
> be considered a fatal error or not. The architectural choice that
> was made for XFS back in the 1990s was that the filesystem should
> never fail when transient errors occur. The choice was to wait for
> the transient error to go away and then continue on. The rest of the
> filesystem was build around these fundamental behavioural choices.
<snipped>
> Hence failing an IO and shutting down the filesystem because there
> are transient errors occuring in either the storage or the OS was
> absolutely the wrong thing to be doing. It still is the wrong thing
> to be doing - we want to wait until the transient error has
> progressed to being classified as a permanent error before we take
> drastic action like denying service to the filesystem.
Two different things are getting conflated here.
I'm saying forcefully that __GFP_NOFAIL isn't a good design decision,
and it's bad for system behaviour when we're thrashing, because you,
willy and others have been talking openly about making something like
that the norm, and that concerns me.
I'm _not_ saying that you should have to rearchitect your codebase to
deal with transient -ENOMEMs... that's clearly not going to happen.
But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
in the case of bugs, because that's going to be an improvement w.r.t.
system robustness, in exactly the same way we don't use BUG_ON() if it's
something that we can't guarantee won't happen in the wild - we WARN()
and try to handle the error as best we can.
If we could someday get PF_MEMALLOC_NORECLAIM annotated everywhere we're
not in a sleepable context (or more likely, teach kmalloc() about
preempt count) - that turns a whole class of "scheduling while atomic"
bugs into recoverable errors. That'd be a good thing.
In XFS's case, the only thing you'd ever want to do on failure of a
__GFP_NOFAIL allocation is do an emergency shutdown - the code is buggy,
and those bugs tend to be quickly found and fixed (even quicker when the
system stays usable and the user can collect a backtrace).
> > Except that's bullshit; at the very least any dynamically sized
> > allocation _definitely_ has to have an error path that's tested, and if
> > there's questions about the context a code path might run in, that
> > that's another reason.
>
> We know the context we run __GFP_NOFAIL allocations in - transaction
> context absolutely requires a task context because we take sleeping
> locks, submit and wait on IO, do blocking memory allocation, etc. We
> also know the size of the allocations because we've bounds checked
> everything before we do an allocation.
*nod*
> Hence this argument of "__GFP_NOFAIL aboslutely requires error
> checking because an invalid size or wonrg context might be used"
> is completely irrelevant to XFS. If you call into the filesytsem
> from an atomic context, you've lost long before we get to memory
> allocation because filesystems take sleeping locks....
Yeah, if you know the context of your allocation and you have bounds on
the allocation size that's all fine - that's your call.
_As a general policy_, I will say that even __GFP_NOFAIL allocations
should have error paths - becuase lots of allocations have sizes that
userspace can control, so it becomes a security issue and we need to be
careful what we tell people.
> > GFP_NOFAIL is the problem here, and if it's encouraging this brain
> > damaged "why can't we just get rid of error paths?" thinking, then it
> > should be removed.
> >
> > Error paths have to exist, and they have to be tested.
>
> __GFP_NOFAIL is *not the problem*, and we are not "avoiding error
> handling". Backing off, looping and trying again is a valid
> mechanism for handling transient failure conditions. Having a flag
> that tells the allocator to "backoff, loop and try again" is a
> perfectly good way of providing a generic error handling mechanism.
But we do have to differentiate between transient allocation failures
and failures that are a result of bugs. Because just looping means a
buggy allocation call becomes, at best, a hung task you can't kill.
next prev parent reply other threads:[~2024-09-02 1:35 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-26 8:47 [PATCH 0/2] get rid of PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-26 8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-26 13:11 ` Matthew Wilcox
2024-08-26 16:48 ` Michal Hocko
2024-08-26 19:39 ` Kent Overstreet
2024-08-26 19:41 ` Matthew Wilcox
2024-08-26 19:42 ` Kent Overstreet
2024-08-26 19:47 ` Matthew Wilcox
2024-08-26 19:54 ` Kent Overstreet
2024-08-26 19:44 ` Kent Overstreet
2024-08-26 19:58 ` Michal Hocko
2024-08-26 20:00 ` Kent Overstreet
2024-08-26 20:27 ` Michal Hocko
2024-08-26 20:43 ` Kent Overstreet
2024-08-26 21:10 ` Kent Overstreet
2024-08-27 6:01 ` Michal Hocko
2024-08-27 6:40 ` Kent Overstreet
2024-08-27 6:58 ` Michal Hocko
2024-08-27 7:05 ` Kent Overstreet
2024-08-27 7:35 ` Michal Hocko
2024-08-26 19:52 ` kernel test robot
2024-08-26 20:53 ` kernel test robot
2024-08-27 2:23 ` kernel test robot
2024-08-27 6:15 ` [PATCH 1/2 v2] " Michal Hocko
2024-08-27 12:29 ` Christoph Hellwig
2024-08-28 4:09 ` Dave Chinner
2024-08-29 10:02 ` Kent Overstreet
2024-08-29 13:12 ` Dave Chinner
2024-08-29 13:22 ` Kent Overstreet
2024-08-29 13:32 ` Kent Overstreet
2024-08-29 14:03 ` Yafang Shao
2024-09-02 0:23 ` Dave Chinner
2024-09-02 1:35 ` Kent Overstreet [this message]
2024-09-02 8:41 ` Michal Hocko
2024-09-02 8:52 ` Kent Overstreet
2024-09-02 9:39 ` Michal Hocko
2024-09-02 9:51 ` Kent Overstreet
2024-09-02 14:07 ` Jonathan Corbet
2024-09-04 18:01 ` Shuah Khan
2024-11-20 20:34 ` Kent Overstreet
2024-11-20 21:12 ` Shuah Khan
2024-11-20 21:20 ` Kent Overstreet
2024-11-20 21:37 ` Shuah Khan
2024-11-20 22:21 ` Shuah Khan
2024-11-20 22:39 ` Kent Overstreet
2024-11-20 22:55 ` Kent Overstreet
2024-11-20 23:21 ` Kent Overstreet
2024-11-20 23:47 ` Theodore Ts'o
2024-11-20 23:57 ` Kent Overstreet
2024-11-21 0:10 ` Kent Overstreet
2024-11-21 4:25 ` Christoph Hellwig
2024-11-21 4:53 ` Kent Overstreet
2024-11-21 23:53 ` Shuah Khan
2024-11-22 6:51 ` Kent Overstreet
2024-11-22 12:06 ` Christoph Hellwig
2024-11-21 21:32 ` Martin Steigerwald
2024-11-22 9:47 ` Geert Uytterhoeven
2024-11-21 5:51 ` Kent Overstreet
2024-11-21 8:43 ` review process (was: underalated stuff) Michal Hocko
2024-11-21 9:03 ` Kent Overstreet
2024-11-21 20:17 ` [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Simona Vetter
2024-11-21 21:26 ` Martin Steigerwald
2024-11-22 21:48 ` Dan Williams
2024-11-22 22:02 ` Kent Overstreet
2024-08-29 9:37 ` [PATCH 1/2] " Jan Kara
2024-08-26 8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-26 13:48 ` Yafang Shao
2024-08-26 16:54 ` Michal Hocko
2024-08-26 13:59 ` Matthew Wilcox
2024-08-26 16:51 ` Michal Hocko
2024-08-26 17:49 ` Matthew Wilcox
2024-08-26 19:18 ` Michal Hocko
2024-08-26 19:20 ` Matthew Wilcox
2024-08-28 4:11 ` Dave Chinner
2024-08-29 21:45 ` Vlastimil Babka
2024-08-26 19:04 ` Kent Overstreet
2024-08-27 12:29 ` Christoph Hellwig
2025-10-01 11:29 [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Stephen Whitten
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nawltogcoffous3zv4kd2eerrrwhihbulz7pi2qyfjvslp6g3f@j3qkqftra2qm \
--to=kent.overstreet@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jmorris@namei.org \
--cc=laoar.shao@gmail.com \
--cc=linux-bcachefs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-security-module@vger.kernel.org \
--cc=mhocko@kernel.org \
--cc=mhocko@suse.com \
--cc=paul@paul-moore.com \
--cc=serge@hallyn.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox